[ 
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32364:
----------------------------------
    Summary: Use CaseInsensitiveMap for DataFrameReader/Writer options  (was: 
`path` argument of DataFrame.load/save should override the existing options)

> Use CaseInsensitiveMap for DataFrameReader/Writer options
> ---------------------------------------------------------
>
>                 Key: SPARK-32364
>                 URL: https://issues.apache.org/jira/browse/SPARK-32364
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>
> Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in 
> DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for 
> the same key `path`, `option()/options()` are non-deterministic because 
> `extraOptions` is a `HashMap`. This issue aims to make load/save always 
> respect its direct path argument and ignore the existing options. This is 
> because the load/save function is independent of users' typos like `paTH` 
> and is designed to be invoked as the last operation. So, load/save should 
> always work consistently and correctly.
> Please note that this doesn't aim to enforce case-insensitivity on 
> `option()/options()` or the `extraOptions` variable because that might be 
> considered a behavior change.
> {code}
> spark.read
>   .option("paTh", "1")
>   .option("PATH", "2")
>   .option("Path", "3")
>   .option("patH", "4")
>   .load("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../1;
> {code}
> Since Apache Spark uses `extraOptions.toMap`, switching to 
> `LinkedHashMap[String, String]` has the same issue: the immutable `Map` 
> returned by `toMap` is not guaranteed to keep insertion order.
> {code}
> val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String]
> extraOptions += ("paTh" -> "1")
> extraOptions += ("PATH" -> "2")
> extraOptions += ("Path" -> "3")
> extraOptions += ("patH" -> "4")
> extraOptions += ("path" -> "5")
> extraOptions.toMap
> // Exiting paste mode, now interpreting.
> extraOptions: scala.collection.mutable.LinkedHashMap[String,String] = 
> Map(paTh -> 1, PATH -> 2, Path -> 3, patH -> 4, path -> 5)
> res0: scala.collection.immutable.Map[String,String] = Map(PATH -> 2, path -> 
> 5, patH -> 4, Path -> 3, paTh -> 1)
> {code}
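
For comparison, here is a minimal sketch of how a case-insensitive option map avoids the ambiguity shown above. `CaseInsensitiveOptions` and `CaseInsensitiveOptionsDemo` are hypothetical names made up for this illustration; this is not Spark's `CaseInsensitiveMap`, and the final `option("path", "5")` call only stands in for the argument passed to `load("5")`. The assumption is that keys are lower-cased on insertion, so spellings that differ only in case collapse into one entry and the last write wins.

```scala
import java.util.Locale

// Hypothetical stand-in for illustration only; not Spark's CaseInsensitiveMap.
// Keys are lower-cased on insertion, so "paTh", "PATH", and "path" share one
// entry and a later write replaces an earlier one.
final case class CaseInsensitiveOptions(entries: Map[String, String] = Map.empty) {
  private def normalize(key: String): String = key.toLowerCase(Locale.ROOT)

  // Returns a new instance with the normalized key set to the given value.
  def option(key: String, value: String): CaseInsensitiveOptions =
    copy(entries = entries + (normalize(key) -> value))

  // Looks a key up regardless of the case used by the caller.
  def get(key: String): Option[String] = entries.get(normalize(key))
}

object CaseInsensitiveOptionsDemo {
  val options: CaseInsensitiveOptions = CaseInsensitiveOptions()
    .option("paTh", "1")
    .option("PATH", "2")
    .option("Path", "3")
    .option("patH", "4")
    .option("path", "5") // stands in for the argument passed to load("5")

  def main(args: Array[String]): Unit = {
    println(options.entries.size) // a single entry remains
    println(options.get("PATH"))  // the last value written, whatever the case
  }
}
```

With this approach the result no longer depends on hash ordering: whichever spelling is written last determines the value, which matches the goal of letting the direct path argument take precedence.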



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

