On 14 March 2012 22:54, Emmanuel Bourg <ebo...@apache.org> wrote:
> On 14/03/2012 23:35, sebb wrote:
>
>> It's not too late to change to using Character.
>
> The issue is the parser, this will probably degrade the performance. Unless
> the fields are made package private and accessed directly by the parser.

I don't see why, so long as you fetch the values once at the start.

There's a theoretical problem with using a valid char value as a
disabled indicator, at least with the parser as it stands.
It assumes that the disabled char cannot occur in a file; that is not
strictly true, so it could detect an escape where there is none.

The example string "pu\\ufffeblic" is parsed as pu<BS>lic when using
CSVFormat.TDF.withUnicodeEscapesInterpreted(true) - i.e. the disabled
char is treated as an escape, and \b = <BS> (backspace).

This could be avoided by checking isEscaping in the parser; similarly
for the other chars that can be disabled.

>> It's not possible currently to create a format with encapsulator,
>> commentStart and escape all null, except by knowing the value of
>> DISABLED.
>
> The solution might be to introduce a quoting mode to the format. I planned
> to add this for the printer, but it can be useful for the parser too. This
> mode would have 3 states:
> - NEVER: Don't use quotes, even if the encapsulator is defined

I think it would be confusing to have a format with an encapsulator
that is not used.

> - ALWAYS: Always enclose values into quotes
> - REQUIRED: Enclose the values only if necessary
>
> Thus the quotes could be disabled with:
>
> CSVFormat.DEFAULT.withEncapsulation(NEVER)

Or provide a constant format with all 3 disabled.

But it would still be simpler to be able to override each and every
char independently; using Character is going to be the simplest way to
achieve that.

Probably still need to make output encapsulation switchable, but
that's a different matter.

>> Likewise, encapsulator and escape are not mutually exclusive.
>> Should they be?
>
> I wish it was, but MySQL actually produces files with quotes and escaped
> characters (delimiter and line feeds). I'm reviewing other RDBMS to see what
> formats we can expect in the wild.

OK.
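
Going back to the Character / isEscaping point above, here is a rough
sketch of what I mean. It is not the actual CSVFormat or parser code:
FormatSketch and unescape() are made-up names for illustration, and
the set of escape letters handled is an assumption. The idea is that
null means "disabled", the parser asks isEscaping() first, and the
Character is unboxed once before the loop, so the hot path still only
compares primitive chars.

```java
/** Hypothetical stand-in for CSVFormat; not the real Commons CSV class. */
final class FormatSketch {
    private final char delimiter;
    private final Character encapsulator; // null = disabled
    private final Character commentStart; // null = disabled
    private final Character escape;       // null = disabled

    FormatSketch(char delimiter, Character encapsulator,
                 Character commentStart, Character escape) {
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
        this.commentStart = commentStart;
        this.escape = escape;
    }

    boolean isEscaping() {
        return escape != null;
    }

    /** Resolves escape sequences, but only if an escape char is configured. */
    String unescape(String input) {
        if (!isEscaping()) {
            // No sentinel char is involved, so nothing in the input
            // can be mistaken for an escape.
            return input;
        }
        final char esc = escape; // unbox once, outside the loop
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == esc && i + 1 < input.length()) {
                char next = input.charAt(++i);
                switch (next) {
                    case 'b': out.append('\b'); break; // backspace
                    case 't': out.append('\t'); break;
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    default:  out.append(next); // e.g. an escaped delimiter
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With new FormatSketch('\t', null, null, null), a value containing
U+FFFE followed by 'b' comes back unchanged, because the parser never
looks for an escape at all. It also makes an "all three disabled"
format trivial to build without knowing any DISABLED value.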
> Emmanuel Bourg