[jira] [Comment Edited] (SPARK-26968) option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation

M. Le Bihan (JIRA) Mon, 25 Feb 2019 06:25:17 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-26968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776903#comment-16776903
 ]


M. Le Bihan edited comment on SPARK-26968 at 2/25/19 2:24 PM:
--------------------------------------------------------------

It's still a problem.
I see no equivalent in Univocity that produces the result I expect, which is:

String values surrounded by quotes,
but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot 
easily apply its default type conversions.

 
{code:java}
"codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
"03142","LENAX",267,43{code}
This issue should be classified as a regression if Univocity is unable to do 
this, because it was possible before. The issue can be closed once this result 
can be produced again.
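
To make the expected behaviour concrete, here is a minimal sketch in plain Java of the quoting rule described above: quote every value except numeric ones. It is independent of Spark and Univocity. The class name NonNumericQuoting and the method formatRow are hypothetical names chosen for illustration, not an existing API, and the rule "numeric means instance of Number" is an assumption of this sketch.

```java
import java.util.StringJoiner;

public class NonNumericQuoting {
    // Hypothetical helper, not part of Spark or Univocity: quotes every
    // value except instances of Number, doubling embedded quote characters.
    static String formatRow(Object... values) {
        StringJoiner row = new StringJoiner(",");
        for (Object value : values) {
            if (value instanceof Number) {
                // Numeric values are written as-is, without quotes.
                row.add(value.toString());
            } else {
                // Everything else is quoted; inner quotes are doubled.
                String text = String.valueOf(value).replace("\"", "\"\"");
                row.add("\"" + text + "\"");
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Header row: all column names are strings, so all are quoted.
        System.out.println(formatRow("codeCommuneCR", "nomCommuneCR",
                "populationCR", "resultatComptable"));
        // Data row: the two String columns are quoted, the two ints are not.
        System.out.println(formatRow("03142", "LENAX", 267, 43));
    }
}
```

Run on the sample row, this prints the two lines shown in the block above, with 267 and 43 left unquoted.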

 

Please don't close this issue too early.

 

P.S.: In addition, I don't understand why Databricks keeps the previous 
CSV system, as shown on the master branch [on line 504 of this unit 
test|https://github.com/databricks/spark-csv/blob/master/src/test/scala/com/databricks/spark/csv/CsvSuite.scala],
which still uses NON_NUMERIC and checks its results, while it has been 
replaced by _Univocity_ in spark_core or spark_sql without checking that the 
replacement can still produce all the same results as before.


was (Author: mlebihan):
It's still a problem.
I see no equivalent in Univocity that produces the result I expect, which is:

String values surrounded by quotes,
but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot 
easily apply its default type conversions.

 
{code:java}
"codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
"03142","LENAX",267,43{code}
This issue should be classified as a regression if Univocity is unable to do 
this, because it was possible before. The issue can be closed once this result 
can be produced again.

 

Please don't close this issue too early.

 

P.S.: In addition, I don't understand why Databricks keeps the previous 
CSV system, as shown on the master branch:

[https://github.com/databricks/spark-csv/blob/master/src/test/scala/com/databricks/spark/csv/CsvSuite.scala]

with the unit test on line 504, while it has been replaced by _Univocity_ in 
spark_core or spark_sql without checking that the replacement can still 
produce all the same results as before.

> option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation
> ---------------------------------------------------------------------
>
>                 Key: SPARK-26968
>                 URL: https://issues.apache.org/jira/browse/SPARK-26968
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: M. Le Bihan
>            Priority: Minor
>
> I have a CSV to write with this schema:
> {code:java}
> StructType s = schema.add("codeCommuneCR", StringType, false);
> s = s.add("nomCommuneCR", StringType, false);
> s = s.add("populationCR", IntegerType, false);
> s = s.add("resultatComptable", IntegerType, false);{code}
> If I don't provide a "_quoteMode_" option, or even if I set it to 
> {{NON_NUMERIC}}, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>   .option("header", "true")
>   .option("quoteMode", "NON_NUMERIC")
>   .option("quote", "\"")
>   .csv("./target/out_200071470.csv");{code}
> the CSV written by {{Spark}} is:
> {code:java}
> codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
> 03142,LENAX,267,43{code}
> If I set the "_quoteAll_" option instead, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>   .option("header", "true")
>   .option("quoteAll", true)
>   .option("quote", "\"")
>   .csv("./target/out_200071470.csv");{code}
> it generates :
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable" 
> "03142","LENAX","267","43"{code}
> It seems that {{.option("quoteMode", "NON_NUMERIC")}} is broken. It 
> should generate:
>  
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX",267,43
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
