I created FLINK-1168 for this feature request.

2014-10-16 11:28 GMT+02:00 Fabian Hueske <[email protected]>:
> I don't think that multi-char field delimiters would cause a performance
> problem. The data needs to be parsed anyway.
> Only in cases where the delimiter has a prefix that occurs often in the
> regular data could it have a major impact.
>
> Fabian
>
> 2014-10-15 16:07 GMT+02:00 Martin Neumann <[email protected]>:
>
>> Would changing it cost performance?
>> If not, I think it would be a good change to make, since it allows one to
>> (ab)use the csv reader to load structured text files (for example by
>> putting keywords as delimiters).
>>
>> Being able to put a regular expression there would be even nicer, but
>> maybe it should end up in its own InputFormat then.
>>
>> cheers Martin
>>
>> On Wed, Oct 15, 2014 at 3:47 PM, Stephan Ewen <[email protected]> wrote:
>>
>>> Hi!
>>>
>>> The reason is the current way the csv parsers work. They are pushed into
>>> the byte stream parsing and are restricted to recognizing one-char
>>> delimiters. It is possible to change that, but it would be a bit of work.
>>>
>>> Stephan
>>>
>>> On Wed, Oct 15, 2014 at 3:36 PM, Martin Neumann <[email protected]> wrote:
>>>
>>>> Hej,
>>>>
>>>> A lot of my inputs are csv files, so I use the CsvInputFormat a lot.
>>>> What I find kind of odd is that the line delimiter is a String but the
>>>> field delimiter is a Character.
>>>>
>>>> *see:* new CsvInputFormat<Tuple2<String,String>>(new
>>>> Path(pVecPath),"\n",'\t',String.class,String.class)
>>>>
>>>> Is there a reason for this? I'm currently working with a file that has
>>>> a more complex field delimiter, so I had to write a mapper to read from
>>>> StringInputFormat.
>>>>
>>>> cheers Martin
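The workaround Martin mentions (a mapper over a line-based input that splits each line itself) can be sketched in plain Java. This is only an illustration of the splitting step, not Flink API: the `"||"` delimiter, the class name, and the `splitFields` helper are assumptions for the example, and `Pattern.quote` is used so the multi-char delimiter is treated literally rather than as a regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MultiCharSplit {

    // Hypothetical multi-char field delimiter that the single-char
    // CsvInputFormat field delimiter cannot express.
    static final String DELIM = "||";

    // What a mapper downstream of a line-based input format would do:
    // split one text line into fields on the multi-char delimiter.
    // Pattern.quote escapes regex metacharacters ('|' is one), and the
    // -1 limit keeps trailing empty fields.
    static String[] splitFields(String line) {
        return line.split(Pattern.quote(DELIM), -1);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("foo||bar||baz");
        System.out.println(Arrays.toString(fields)); // [foo, bar, baz]
    }
}
```

A regex-based delimiter, as Martin suggests later in the thread, would drop the `Pattern.quote` call; the cost Fabian alludes to only matters when the delimiter's prefix is common in the data, because the parser must then back up after partial matches.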
