Would changing it cost performance? If not, I think it would be a good change to make, since it would allow one to (ab)use the CSV reader to load structured text files (for example by using keywords as delimiters).
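To make the use case concrete, here is a minimal sketch of the kind of splitting a mapper over raw text lines has to do today. It is plain Java and does not touch any Flink API; the class name KeywordSplitter and its methods are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: splits one line of text
// on a multi-character field delimiter.
final class KeywordSplitter {

    private final Pattern delimiter;

    private KeywordSplitter(Pattern delimiter) {
        this.delimiter = delimiter;
    }

    // Treat the delimiter as a literal string (for example a keyword),
    // so regex metacharacters in it have no special meaning.
    static KeywordSplitter literal(String delimiter) {
        return new KeywordSplitter(Pattern.compile(Pattern.quote(delimiter)));
    }

    // Treat the delimiter as a regular expression.
    static KeywordSplitter regex(String expression) {
        return new KeywordSplitter(Pattern.compile(expression));
    }

    // A limit of -1 keeps trailing empty fields instead of dropping them.
    List<String> split(String line) {
        return Arrays.asList(delimiter.split(line, -1));
    }
}
```

A mapper would call something like KeywordSplitter.literal("::KEY::").split(line) on each line and build the tuple from the resulting fields. This runs on already-decoded strings, so it says nothing about what the same logic would cost inside the byte-stream parser.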
Being able to put a regular expression there would be even nicer but maybe it should end up in its own InputFormat then.

cheers
Martin

On Wed, Oct 15, 2014 at 3:47 PM, Stephan Ewen <[email protected]> wrote:

> Hi!
>
> The reason is the current way the csv parsers work. They are pushed into
> the byte stream parsing and are restricted to recognize one char
> delimiters. It is possible to change that, but would be a bit of work.
>
> Stephan
>
> On Wed, Oct 15, 2014 at 3:36 PM, Martin Neumann <[email protected]>
> wrote:
>
> > Hej,
> >
> > A lot of my inputs are csv files so I use the CsvInputFormat a lot. What I
> > find kind of odd that the Line delimiter is a String but the Field
> > delimiter is a Character.
> >
> > *see:* new CsvInputFormat<Tuple2<String,String>>(new
> > Path(pVecPath),"\n",'\t',String.class,String.class)
> >
> > Is there a reason for this? I'm currently working with a file that has a
> > more complex field delimiter so I had to write a mapper to read from
> > StringInputFormat.
> >
> > cheers Martin
