[
https://issues.apache.org/jira/browse/SPARK-38331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-38331:
---------------------------------
Component/s: SQL
(was: Input/Output)
> csv parser exception when quote and escape are both double-quote and a value
> is just "," and column pruning enabled
> -------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-38331
> URL: https://issues.apache.org/jira/browse/SPARK-38331
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: Christopher Auston
> Priority: Minor
>
> Workaround: disable column pruning by setting
> spark.sql.csv.parser.columnPruning.enabled to false.
> Example pyspark code (from Databricks):
> {noformat}
> import pyspark
> print(pyspark.version.__version__)
> # enable column pruning (reset default value)
> spark.conf.set('spark.sql.csv.parser.columnPruning.enabled', 'true')
> dbutils.fs.put(file='/tmp/example.csv',
> contents='''"col1","b4_comma","comma","col4"
> "","",",","x"
> ''', overwrite=True)
> df = spark.read.csv(
>     path='/tmp/example.csv'
>     ,inferSchema=True
>     ,header=True
>     ,escape='"'
>     ,multiLine=True
>     ,unescapedQuoteHandling='RAISE_ERROR'
>     ,mode='FAILFAST'
> )
> ex = None
> try:
>     df.select(df.col1,df.comma).take(1)
> except Exception as e:
>     ex = e
>
> if ex:
>     print('[pruning] Exception is raised if b4_comma is NOT selected')
>
> df.select(df.b4_comma, df.comma).take(1)
> print('[pruning] No exception if b4_comma is selected')
> ex = None
> try:
>     df.count()
> except Exception as e:
>     ex = e
>
> if ex:
>     print('[pruning] Exception raised by count')
> print('\ndisabling pruning\n')
>
>
> # disable column pruning
> spark.conf.set('spark.sql.csv.parser.columnPruning.enabled', 'false')
> df.select(df.col1,df.comma).take(1)
> print('[no prune] No exception if b4_comma is NOT selected')
> {noformat}
>
> Output:
> {noformat}
> 3.1.2
> Wrote 47 bytes.
> [pruning] Exception is raised if b4_comma is NOT selected
> [pruning] No exception if b4_comma is selected
> [pruning] Exception raised by count
> disabling pruning
> [no prune] No exception if b4_comma is NOT selected
> {noformat}
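
As a sanity check that the example file itself is well-formed, the same two lines can be parsed outside Spark. The sketch below uses Python's standard csv module, which is not the parser Spark uses, so it only illustrates the fields one would expect when the quote and escape characters are both a double quote. The names CONTENTS and parse_rows are hypothetical and exist only for this illustration.

```python
import csv
import io

# The same two lines that the report writes to /tmp/example.csv.
CONTENTS = '"col1","b4_comma","comma","col4"\n"","",",","x"\n'


def parse_rows(text):
    """Parse CSV text where '"' is the quote character and a doubled
    quote ('""') inside a quoted field stands for a literal quote.

    strict=True makes the csv module raise csv.Error on malformed input
    instead of silently accepting it.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",
        quotechar='"',
        doublequote=True,
        strict=True,
    )
    return list(reader)


rows = parse_rows(CONTENTS)
header, record = rows

# Size of the file contents in bytes, for comparison with the report's output.
print(len(CONTENTS.encode("utf-8")))

# Map each column name to its parsed value.
print(dict(zip(header, record)))
```

Under these assumptions the record parses into four fields, and the value of the comma column is the single character ",". That suggests the input is valid CSV and the failure depends on which columns are selected, not on the data.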