On 14 March 2012 22:54, Emmanuel Bourg <ebo...@apache.org> wrote:
> On 14/03/2012 23:35, sebb wrote:
>
>
>> It's not too late to change to using Character.
>
>
> The issue is the parser: this will probably degrade performance, unless
> the fields are made package-private and accessed directly by the parser.

I don't see why, so long as you fetch the values once at the start.
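
As a rough sketch of what I mean (the class and method names below are
hypothetical, not the current CSVFormat/CSVParser code): the nullable
Character values are unboxed once when the parser is constructed, so the
per-character loop only ever compares primitive chars and booleans.

```java
// Hypothetical sketch; these are not the actual Commons CSV classes.
final class FormatSketch {
    final char delimiter;
    final Character encapsulator; // null means "disabled"
    final Character escape;       // null means "disabled"

    FormatSketch(char delimiter, Character encapsulator, Character escape) {
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
        this.escape = escape;
    }
}

final class ParserSketch {
    private final char delimiter;
    private final boolean encapsulating;
    private final char encapsulator;
    private final boolean escaping;
    private final char escape;

    ParserSketch(FormatSketch format) {
        // Unbox once, up front; the loop below never touches a Character.
        delimiter = format.delimiter;
        encapsulating = format.encapsulator != null;
        encapsulator = encapsulating ? format.encapsulator.charValue() : '\0';
        escaping = format.escape != null;
        escape = escaping ? format.escape.charValue() : '\0';
    }

    // Counts the fields on one line; stands in for the real parsing loop.
    int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escaping && c == escape) {
                i++; // skip the escaped character
            } else if (encapsulating && c == encapsulator) {
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }
}
```

The only extra cost over the sentinel approach is one boolean test per
special character, and that only when the char comparison is reached.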

There's a theoretical problem with using a valid char value as a
disabled indicator, at least with the parser as it stands.
It assumes that the disabled char cannot occur in a file; that is not
strictly true, so it could detect an escape where there is none.

The example string "pu\\ufffeblic" is parsed as pu<BS>lic when using
CSVFormat.TDF.withUnicodeEscapesInterpreted(true) - i.e. the disabled
char is treated as an escape, and \b = <BS> (backspace).

This could be avoided by checking isEscaping in the parser; similarly
for the other chars that can be disabled.
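
A minimal sketch of the difference, assuming the disabled marker is the
char \ufffe as in the example above. The class, the method names and the
small escape table are made up for illustration; this is not the actual
parser code.

```java
// Hypothetical sketch; not the actual Commons CSV parser.
final class EscapeCheckSketch {
    // Assumed value of the "disabled" marker.
    static final char DISABLED = '\ufffe';

    // Sentinel-only test: any char equal to 'escape' starts an escape
    // sequence, even when 'escape' is the DISABLED marker itself.
    static String unescapeNaive(String in, char escape) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == escape && i + 1 < in.length()) {
                out.append(translate(in.charAt(++i)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Guarded test: check the "is escaping" flag before comparing chars.
    static String unescapeGuarded(String in, char escape) {
        boolean escaping = escape != DISABLED;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (escaping && c == escape && i + 1 < in.length()) {
                out.append(translate(in.charAt(++i)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Tiny illustrative escape table; anything else maps to itself.
    static char translate(char c) {
        switch (c) {
            case 'b': return '\b';
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            default:  return c;
        }
    }
}
```

With escaping disabled, the naive version turns the marker followed by
'b' into a single control character, while the guarded version returns
the input unchanged.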

>
>
>> It's not possible currently to create a format with encapsulator,
>> commentStart and escape all null, except by knowing the value of
>> DISABLED.
>
>
> The solution might be to introduce a quoting mode to the format. I planned
> to add this for the printer, but it can be useful for the parser too. This
> mode would have 3 states:
> - NEVER: Don't use quotes, even if the encapsulator is defined

I think it would be confusing to have a format with an encapsulator
that is not used.

> - ALWAYS: Always enclose values in quotes
> - REQUIRED: Enclose the values only if necessary
>
> Thus the quotes could be disabled with:
>
> CSVFormat.DEFAULT.withEncapsulation(NEVER)

Or provide a constant format with all 3 disabled.

But it would still be simpler to be able to override each and every
char independently; using Character is going to be the simplest way to
achieve that.
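
Something along these lines, as a sketch only. The class name, the PLAIN
constant and the with*/is* methods are hypothetical and the defaults are
picked for illustration; null means "disabled", so no caller needs to
know a magic DISABLED value.

```java
// Hypothetical sketch of a Character-based format; not the real CSVFormat.
final class CharFormatSketch {
    // Illustrative defaults: comma delimiter, double-quote encapsulator.
    static final CharFormatSketch DEFAULT =
            new CharFormatSketch(',', '"', null, null);

    // Constant format with encapsulator, commentStart and escape disabled.
    static final CharFormatSketch PLAIN =
            DEFAULT.withEncapsulator(null).withCommentStart(null).withEscape(null);

    private final char delimiter;
    private final Character encapsulator; // null = disabled
    private final Character commentStart; // null = disabled
    private final Character escape;       // null = disabled

    private CharFormatSketch(char delimiter, Character encapsulator,
                             Character commentStart, Character escape) {
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
        this.commentStart = commentStart;
        this.escape = escape;
    }

    // Each wither returns a new instance and changes exactly one char.
    CharFormatSketch withEncapsulator(Character c) {
        return new CharFormatSketch(delimiter, c, commentStart, escape);
    }

    CharFormatSketch withCommentStart(Character c) {
        return new CharFormatSketch(delimiter, encapsulator, c, escape);
    }

    CharFormatSketch withEscape(Character c) {
        return new CharFormatSketch(delimiter, encapsulator, commentStart, c);
    }

    boolean isEncapsulating()     { return encapsulator != null; }
    boolean isCommentingEnabled() { return commentStart != null; }
    boolean isEscaping()          { return escape != null; }
}
```

Any char can then be switched off with e.g. withEscape(null), and every
possible char value stays available as real data.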

Probably still need to make output encapsulation switchable, but
that's a different matter.

>
>
>> Likewise, encapsulator and escape are not mutually exclusive.
>> Should they be?
>
>
> I wish they were, but MySQL actually produces files with both quotes and
> escaped characters (delimiters and line feeds). I'm reviewing other RDBMSs
> to see what formats we can expect in the wild.

OK.

> Emmanuel Bourg
>
>
