If you already have an idea of how to proceed, maybe I can take care of opening a PR using commons-csv or whatever library you prefer.

On 10 Mar 2017 22:07, "Fabian Hueske" <fhue...@gmail.com> wrote:

Hi Flavio,

Flink's CsvInputFormat was originally meant to be an efficient way to parse structured text files and dates back to the very early days of the project (probably 2011 or so). It was never meant to be compliant with the RFC specification and initially didn't support many features like quoting, quote escaping, etc. Some of these were added later, but others were not.

I agree that the requirements for the CsvInputFormat have changed as more people are using the project, and that a standard-compliant parser would be desirable. We could definitely look into using an existing library for the parsing, but it would still need to be integrated with the way that Flink's InputFormats work. For instance, your approach isn't standard-compliant either, because TextInputFormat is not aware of quotes and would break records with quoted record delimiters (FLINK-6016 [1]).

I would be OK with having a less efficient format which is not based on the current implementation but which is standard-compliant. IMO that would be a very useful contribution.

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-6016

2017-03-10 11:28 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Hi to all,
>
> I want to discuss something about CSV parsing with the dev group.
> Since I started using Flink with CSVs I have always run into small
> problems here and there, and the new tickets about CSV parsing seem to
> confirm that this part is still problematic.
> In my production jobs I gave up on Flink's CSV parsing in favour of Apache
> commons-csv and it works great. It's perfectly configurable and robust.
> A working example is available at [1].
>
> So why not use that library directly and contribute back (if needed)
> to another Apache library if improvements are required to speed up the
> parsing? Have you ever tried to compare the performance of the two parsers?
>
> Best,
> Flavio
>
> [1]
> https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java
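
To make the point about quoted record delimiters (FLINK-6016) concrete, here is a minimal sketch in plain Java. It does not use Flink's TextInputFormat or commons-csv; the class and method names (QuotedRecordSplitDemo, naiveSplit, quoteAwareSplit) are hypothetical and only illustrate the difference between splitting on every newline and splitting only on newlines outside double quotes. It assumes "\n" as the record delimiter and '"' as the quote character, and it is not a full RFC-compliant parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuotedRecordSplitDemo {

    /** Splits on every newline and ignores quotes, so a quoted newline cuts a record in two. */
    public static List<String> naiveSplit(String text) {
        return Arrays.asList(text.split("\n"));
    }

    /** Splits on a newline only when it is outside a double-quoted field. */
    public static List<String> quoteAwareSplit(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state stays correct.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                // Unquoted newline: the current record is complete.
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            // Last record when the input has no trailing newline.
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // One header line plus one record whose second field contains a newline.
        String data = "id,comment\n1,\"line one\nline two\"\n";
        System.out.println(naiveSplit(data).size());      // prints 3
        System.out.println(quoteAwareSplit(data).size()); // prints 2
    }
}
```

The naive split reports three records for a file that holds only two, which is the kind of breakage described above for a line-based input format feeding a CSV parser. Any replacement format would need this quote awareness at the point where records are delimited, not only inside the field parser.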