[
https://issues.apache.org/jira/browse/CSV-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105438#comment-15105438
]
cornel creanga commented on CSV-168:
------------------------------------
There are more rules, I'll copy paste below some sentences from the MYSQL
manual.
{quote}
"Handling of NULL values varies according to the FIELDS and LINES options in
use:
For the default FIELDS and LINES values, NULL is written as a field value of \N
for output, and a field value of \N is read as NULL for input (assuming that
the ESCAPED BY character is “\”).
If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as
its value is read as a NULL value. This differs from the word NULL enclosed
within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.
If FIELDS ESCAPED BY is empty, NULL is written as the word NULL.
With fixed-row format (which is used when FIELDS TERMINATED BY and FIELDS
ENCLOSED BY are both empty), NULL is written as an empty string. This causes
both NULL values and empty strings in the table to be indistinguishable when
written to the file because both are written as empty strings. If you need to
be able to tell the two apart when reading the file back in, you should not use
fixed-row format." - http://dev.mysql.com/doc/refman/5.7/en/load-data.html
{quote}
Briefly:
* if you configure a value for FIELDS ENCLOSED BY you can use the word NULL if
you want. If MYSQL will find the actual "NULL" string it will enclose it in
order to distinguish it from the NULL value
* if you do not configure a value for FIELDS ENCLOSED BY (default behaviour)
the null value will be written as <separator>N (and the default separator is
"\"). If MYSQL will find the actual "<separator>N" string it will escape it to
"<separator><separator>N":
Regarding your concern: at the first glance I do not know if somebody would
like to have it escaped - in this case how to distinguish during a read
operation between the actual text '\N' and the null value? - however I might be
wrong and miss some use cases. Adding an escapeNull flag could work too.
Note that right now even if you want to pursue the other choice (quote the
values and use NULL for representing the values) you would run into issues
because the CSVParser will quote the NULL too (eg try a test with CSVFormat
format =
CSVFormat.MYSQL.withQuote('"').withNullString("NULL").withQuoteMode(QuoteMode.NON_NUMERIC)
.
I've attached some code too:
{code}
String[] s = new String[]{"NULL",null};
CSVFormat format =
CSVFormat.MYSQL.withQuote('"').withNullString("NULL").withQuoteMode(QuoteMode.NON_NUMERIC);
StringWriter writer = new StringWriter();
CSVPrinter printer = new CSVPrinter(writer, format);
printer.print(s[0]);printer.print(s[1]);printer.close();
System.out.println(writer.toString());
s = new String[]{"\\N",null};
format = CSVFormat.MYSQL.withNullString("\\N");
writer = new StringWriter();
printer = new CSVPrinter(writer, format);
printer.print(s[0]);printer.print(s[1]);printer.close();
System.out.println(writer.toString());
s = new String[]{"",null};
format = CSVFormat.MYSQL;
writer = new StringWriter();
printer = new CSVPrinter(writer, format);
printer.print(s[0]);printer.print(s[1]);printer.close();
System.out.println(writer.toString());
{code}
> CsvFormat.nullString should not be escaped
> ------------------------------------------
>
> Key: CSV-168
> URL: https://issues.apache.org/jira/browse/CSV-168
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.2
> Reporter: cornel creanga
> Priority: Minor
>
> Hello,
> Use case: I'm generating MySQL dump files (text format) - for more details
> check this - http://dev.mysql.com/doc/refman/5.7/en/select-into.html.
> Issue: The value null is represented as "\N". Also by default the escape char
> is '\N'. The CsvPrinter.printAndEscape method will convert this value into
> "\\N".
> I suggest to modify the CsvPrinter in order to not escape the nullString
> value - it should be written as it is. I can create a pull request if you
> want.
> I consider it a minor issue because it can be mitigated by making sure that
> the escape character is not a part of the nullString - however in my case it
> means that the LOAD commands should be modified accordingly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)