I created FLINK-1168 for this feature request.

2014-10-16 11:28 GMT+02:00 Fabian Hueske <[email protected]>:
> I don't think that multi-char field delimiters would cause a performance
> problem. The data needs to be parsed anyway.
> Only in cases where the delimiter has a prefix that occurs often in the
> regular data could it have a major impact.
>
> Fabian
>
> 2014-10-15 16:07 GMT+02:00 Martin Neumann <[email protected]>:
>
>> Would changing it cost performance?
>> If not, I think it would be a good change to make, since it allows one to
>> (ab)use the csv reader to load structured text files (for example by
>> putting keywords as delimiters).
>>
>> Being able to put a regular expression there would be even nicer, but
>> maybe it should end up in its own InputFormat then.
>>
>> cheers Martin
>>
>> On Wed, Oct 15, 2014 at 3:47 PM, Stephan Ewen <[email protected]> wrote:
>>
>>> Hi!
>>>
>>> The reason is the current way the csv parsers work. They are pushed into
>>> the byte stream parsing and are restricted to recognizing one-char
>>> delimiters. It is possible to change that, but it would be a bit of work.
>>>
>>> Stephan
>>>
>>> On Wed, Oct 15, 2014 at 3:36 PM, Martin Neumann <[email protected]> wrote:
>>>
>>>> Hej,
>>>>
>>>> A lot of my inputs are csv files, so I use the CsvInputFormat a lot.
>>>> What I find kind of odd is that the line delimiter is a String but the
>>>> field delimiter is a Character.
>>>>
>>>> *see:* new CsvInputFormat<Tuple2<String,String>>(new
>>>> Path(pVecPath),"\n",'\t',String.class,String.class)
>>>>
>>>> Is there a reason for this? I'm currently working with a file that has
>>>> a more complex field delimiter, so I had to write a mapper to read from
>>>> StringInputFormat.
>>>>
>>>> cheers Martin
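The workaround Martin mentions (a mapper over a line-based input that splits each line itself) can be sketched in plain Java. This is only an illustration of the splitting step, not Flink API: the `"||"` delimiter, the class name, and the `splitFields` helper are assumptions for the example, and `Pattern.quote` is used so the multi-char delimiter is treated literally rather than as a regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MultiCharSplit {

    // Hypothetical multi-char field delimiter that the single-char
    // CsvInputFormat field delimiter cannot express.
    static final String DELIM = "||";

    // What a mapper downstream of a line-based input format would do:
    // split one text line into fields on the multi-char delimiter.
    // Pattern.quote escapes regex metacharacters ('|' is one), and the
    // -1 limit keeps trailing empty fields.
    static String[] splitFields(String line) {
        return line.split(Pattern.quote(DELIM), -1);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("foo||bar||baz");
        System.out.println(Arrays.toString(fields)); // [foo, bar, baz]
    }
}
```

A regex-based delimiter, as Martin suggests later in the thread, would drop the `Pattern.quote` call; the cost Fabian alludes to only matters when the delimiter's prefix is common in the data, because the parser must then back up after partial matches.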
