[ 
https://issues.apache.org/jira/browse/SPARK-38331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38331:
---------------------------------
    Component/s: SQL
                     (was: Input/Output)

> csv parser exception when quote and escape are both double-quote and a value 
> is just "," and column pruning enabled
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38331
>                 URL: https://issues.apache.org/jira/browse/SPARK-38331
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.1
>            Reporter: Christopher Auston
>            Priority: Minor
>
> Workaround: disable column pruning by setting 
> spark.sql.csv.parser.columnPruning.enabled to false.
> Example PySpark code (run on Databricks):
> {noformat}
> import pyspark
> print(pyspark.version.__version__)
> # enable column pruning (reset to the default value)
> spark.conf.set('spark.sql.csv.parser.columnPruning.enabled', 'true')
> dbutils.fs.put(file='/tmp/example.csv', 
> contents='''"col1","b4_comma","comma","col4"
> "","",",","x"
> ''', overwrite=True)
> df = spark.read.csv(
>     path='/tmp/example.csv'
>     ,inferSchema=True
>     ,header=True
>     ,escape='"'
>     ,multiLine=True
>     ,unescapedQuoteHandling='RAISE_ERROR'
>     ,mode='FAILFAST'
>     )
> ex = None
> try:
>     df.select(df.col1,df.comma).take(1)
> except Exception as e:
>     ex = e
>     
> if ex:
>     print('[pruning] Exception is raised if b4_comma is NOT selected')
>     
> df.select(df.b4_comma, df.comma).take(1)
> print('[pruning] No exception if b4_comma is selected')
> ex = None
> try:
>     df.count()
> except Exception as e:
>     ex = e
>     
> if ex:
>     print('[pruning] Exception raised by count')
> print('\ndisabling pruning\n')
>     
>     
> # disable column pruning
> spark.conf.set('spark.sql.csv.parser.columnPruning.enabled', 'false')
> df.select(df.col1,df.comma).take(1)
> print('[no prune] No exception if b4_comma is NOT selected')
> {noformat}
>  
> Output:
> {noformat}
> 3.1.2
> Wrote 47 bytes.
> [pruning] Exception is raised if b4_comma is NOT selected
> [pruning] No exception if b4_comma is selected
> [pruning] Exception raised by count
> disabling pruning
> [no prune] No exception if b4_comma is NOT selected
> {noformat}
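
For reference, the data row in the example is well-formed when the quote and escape characters are both a double quote: the third field is a quoted value containing only a comma. The following is a minimal sketch that checks this with Python's standard csv module. It does not use Spark's CSV parser, so it only illustrates the parse one would expect and does not reproduce the exception; the names CONTENTS and parse_rows are introduced here for illustration.

```python
import csv
import io

# Same two lines as the example.csv written in the report.
CONTENTS = '"col1","b4_comma","comma","col4"\n"","",",","x"\n'


def parse_rows(text):
    """Parse CSV text using '"' as the quote char and doubled quotes as the escape."""
    reader = csv.reader(io.StringIO(text), quotechar='"', doublequote=True)
    return list(reader)


rows = parse_rows(CONTENTS)
header, record = rows

# Size of the file contents in bytes, then the parsed header and data row.
print(len(CONTENTS.encode("utf-8")))
print(header)
print(record)
```

The byte count matches the "Wrote 47 bytes." line in the reported output, and the data row splits into four fields with a lone comma in the "comma" column, which is what the select on col1 and comma would be expected to return once the parser handles this case.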



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
