[ https://issues.apache.org/jira/browse/CSV-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285409#comment-16285409 ]
Gary Gregory edited comment on CSV-219 at 12/10/17 11:23 PM:
-------------------------------------------------------------
Our quoting seems off IMO. Why not simply do:
{noformat}
diff --git a/src/main/java/org/apache/commons/csv/CSVFormat.java b/src/main/java/org/apache/commons/csv/CSVFormat.java
index 58948fd..dc7588b 100644
--- a/src/main/java/org/apache/commons/csv/CSVFormat.java
+++ b/src/main/java/org/apache/commons/csv/CSVFormat.java
@@ -1186,10 +1186,7 @@
     } else {
         char c = value.charAt(pos);
-        // RFC4180 (https://tools.ietf.org/html/rfc4180) TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
-        if (newRecord && (c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E)) {
-            quote = true;
-        } else if (c <= COMMENT) {
+        if (c <= COMMENT) {
             // Some other chars at the start of a value caused the parser to fail, so for now
             // encapsulate if we start in anything less than '#'. We are being conservative
             // by including the default comment char too.
diff --git a/src/test/java/org/apache/commons/csv/CSVPrinterTest.java b/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
index ae7aae2..5a09627 100644
--- a/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
+++ b/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
@@ -1033,11 +1033,20 @@
     }
     @Test
-    public void testRfc4180QuoteSingleChar() throws IOException {
+    public void testDontQuoteEuroFirstChar() throws IOException {
         final StringWriter sw = new StringWriter();
         try (final CSVPrinter printer = new CSVPrinter(sw, CSVFormat.RFC4180)) {
             printer.printRecord(EURO_CH, "Deux");
-            assertEquals("\"" + EURO_CH + "\",Deux" + recordSeparator, sw.toString());
+            assertEquals(EURO_CH + ",Deux" + recordSeparator, sw.toString());
+        }
+    }
+
+    @Test
+    public void testQuoteCommaFirstChar() throws IOException {
+        final StringWriter sw = new StringWriter();
+        try (final CSVPrinter printer = new CSVPrinter(sw, CSVFormat.RFC4180)) {
+            printer.printRecord(",");
+            assertEquals("\",\"" + recordSeparator, sw.toString());
         }
     }
{noformat}
I do not see why the first field should be quoted just because the first
character of a record is not in TEXTDATA.
Thoughts from others? With the above patch, all tests pass.
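
To make the effect of the patch concrete, here is a minimal standalone sketch of the first-character rule before and after the change. It is not the actual CSVFormat code: the class and method names are hypothetical, COMMENT is assumed to be '#', and only the two branches shown in the diff are modeled. A leading comma is not caught by either first-character branch after the patch, so it is presumably quoted by a later check that the diff does not show.

```java
final class FirstCharQuotingSketch {

    // Assumption: the default comment marker is '#', as the patch comment suggests.
    static final char COMMENT = '#';

    /** RFC 4180 TEXTDATA = %x20-21 / %x23-2B / %x2D-7E. */
    static boolean isTextData(final char c) {
        return (c >= 0x20 && c <= 0x21) || (c >= 0x23 && c <= 0x2B) || (c >= 0x2D && c <= 0x7E);
    }

    /** First-character branches as they appear in the removed and kept lines of the diff. */
    static boolean quotesBeforePatch(final char c, final boolean newRecord) {
        return (newRecord && !isTextData(c)) || c <= COMMENT;
    }

    /** First-character branch that remains once the TEXTDATA check is removed. */
    static boolean quotesAfterPatch(final char c) {
        return c <= COMMENT;
    }

    public static void main(final String[] args) {
        // Euro sign, Hiragana A, a plain letter, the comment marker, a double quote.
        final char[] samples = {'\u20AC', '\u3042', 'A', '#', '"'};
        for (final char c : samples) {
            System.out.printf("U+%04X before=%b after=%b%n",
                    (int) c, quotesBeforePatch(c, true), quotesAfterPatch(c));
        }
    }
}
```

In this sketch, any character above 0x7E at the start of a record (the euro sign, or a CJK character such as the one in the original report) triggers quoting only under the pre-patch rule, while characters at or below '#' are quoted under both.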
was (Author: garydgregory):
Our quoting seems off IMO. Why not simply do:
{noformat}
diff --git a/src/main/java/org/apache/commons/csv/CSVFormat.java b/src/main/java/org/apache/commons/csv/CSVFormat.java
index 58948fd..dc7588b 100644
--- a/src/main/java/org/apache/commons/csv/CSVFormat.java
+++ b/src/main/java/org/apache/commons/csv/CSVFormat.java
@@ -1186,10 +1186,7 @@ public final class CSVFormat implements Serializable {
     } else {
         char c = value.charAt(pos);
-        // RFC4180 (https://tools.ietf.org/html/rfc4180) TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
-        if (newRecord && (c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E)) {
-            quote = true;
-        } else if (c <= COMMENT) {
+        if (c <= COMMENT) {
             // Some other chars at the start of a value caused the parser to fail, so for now
             // encapsulate if we start in anything less than '#'. We are being conservative
             // by including the default comment char too.
diff --git a/src/test/java/org/apache/commons/csv/CSVPrinterTest.java b/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
index ae7aae2..dde7c19 100644
--- a/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
+++ b/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
@@ -1037,7 +1037,7 @@ public class CSVPrinterTest {
         final StringWriter sw = new StringWriter();
         try (final CSVPrinter printer = new CSVPrinter(sw, CSVFormat.RFC4180)) {
             printer.printRecord(EURO_CH, "Deux");
-            assertEquals("\"" + EURO_CH + "\",Deux" + recordSeparator, sw.toString());
+            assertEquals(EURO_CH + ",Deux" + recordSeparator, sw.toString());
         }
     }
{noformat}
I do not see why the first field should be quoted just because the first
character of a record is not in TEXTDATA.
Thoughts from others? With the above patch, all tests pass.
> The behavior of quote char using is not similar as Excel does when the first
> string contains CJK char(s)
> --------------------------------------------------------------------------------------------------------
>
> Key: CSV-219
> URL: https://issues.apache.org/jira/browse/CSV-219
> Project: Commons CSV
> Issue Type: Bug
> Components: Printer
> Affects Versions: 1.5
> Reporter: Zhang Hongda
> Attachments: diff.patch
>
>
> When CSVFormat.EXCEL is used to print a CSV file, its quoting behavior
> differs from that of Microsoft Excel when the first string contains
> Chinese, Japanese or Korean (CJK) characters.
> e.g.
> There are 3 data members in a record, with Japanese chars: "あ", "い", "う":
> Microsoft Excel outputs:
> あ,い,う
> Apache Commons CSV outputs:
> "あ",い,う