zhoulii commented on pull request #19152: URL: https://github.com/apache/flink/pull/19152#issuecomment-1074936494
> I found that after changing to new source, tpcds runs slower than before. This is probably mainly because the new csv source is slower than the legacy `CsvTableSource`. It only took 20 min before, and now it takes 30 min. I think we need to wait for [FLINK-26760](https://issues.apache.org/jira/browse/FLINK-26760) to have a conclusion before merging this pr.

I agree. The legacy csv source parses CSV data in [CsvInputFormat.java#L87](https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java#L87), while the new csv source parses it in [CsvReaderFormat.java#L193](https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvReaderFormat.java#L193), and the two approaches are quite different. Maybe we can reuse the parse method of `CsvInputFormat` in `CsvReaderFormat`.
