[
https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-32364:
----------------------------------
Summary: Use CaseInsensitiveMap for DataFrameReader/Writer options (was:
`path` argument of DataFrame.load/save should override the existing options)
> Use CaseInsensitiveMap for DataFrameReader/Writer options
> ---------------------------------------------------------
>
> Key: SPARK-32364
> URL: https://issues.apache.org/jira/browse/SPARK-32364
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> Although we introduced CaseInsensitiveMap and CaseInsensitiveStringMap (in
> DSv2), when a user has multiple options like `path`, `paTH`, and `PATH` for
> the same key `path`, `option()/options()` are non-deterministic because
> `extraOptions` is a `HashMap`. This issue aims to make load/save always
> respect its direct path argument and ignore the existing options, because
> load/save should not depend on user typos like `paTH` and is designed to be
> invoked as the last operation. So, load/save should always work
> consistently and correctly.
> Please note that this doesn't aim to enforce case-insensitivity on
> `option()/options()` or the `extraOptions` variable because that might be
> considered a behavior change.
> {code}
> spark.read
> .option("paTh", "1")
> .option("PATH", "2")
> .option("Path", "3")
> .option("patH", "4")
> .load("5")
> ...
> org.apache.spark.sql.AnalysisException:
> Path does not exist: file:/.../1;
> {code}
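A minimal sketch of the intended "direct path argument wins" behavior, assuming a plain immutable `Map` of options. The helper name `withExplicitPath` and the `compression` option are hypothetical and only illustrate the idea; this is not Spark's actual implementation.

```scala
// Hypothetical helper: drop every case variant of `path` from the
// accumulated options, then add the explicit argument under the key `path`.
def withExplicitPath(options: Map[String, String], path: String): Map[String, String] = {
  val withoutPath = options.filter { case (key, _) => !key.equalsIgnoreCase("path") }
  withoutPath + ("path" -> path)
}

// Options as a user might have set them, including an unrelated key.
val userOptions = Map(
  "paTh" -> "1",
  "PATH" -> "2",
  "Path" -> "3",
  "patH" -> "4",
  "compression" -> "gzip"
)

// Only the explicit argument survives as the path; other options are kept.
val resolved = withExplicitPath(userOptions, "5")
```

With this approach the result no longer depends on the iteration order of the underlying map, since no case variant of `path` remains to compete with the explicit argument.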
> Since Apache Spark uses `extraOptions.toMap`, switching to
> `LinkedHashMap[String, String]` has the same issue: the insertion order is
> lost when converting to an immutable `Map`.
> {code}
> val extraOptions = new scala.collection.mutable.LinkedHashMap[String, String]
> extraOptions += ("paTh" -> "1")
> extraOptions += ("PATH" -> "2")
> extraOptions += ("Path" -> "3")
> extraOptions += ("patH" -> "4")
> extraOptions += ("path" -> "5")
> extraOptions.toMap
> // Exiting paste mode, now interpreting.
> extraOptions: scala.collection.mutable.LinkedHashMap[String,String] =
> Map(paTh -> 1, PATH -> 2, Path -> 3, patH -> 4, path -> 5)
> res0: scala.collection.immutable.Map[String,String] = Map(PATH -> 2, path ->
> 5, patH -> 4, Path -> 3, paTh -> 1)
> {code}
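For comparison, a rough sketch of what case-insensitive option keys could look like, in line with the new summary. It normalizes keys to lower case on insertion so that the last write wins regardless of spelling. This is an illustration under that assumption, not Spark's `CaseInsensitiveMap` class.

```scala
import java.util.Locale

// The same entries as in the example above, in insertion order.
val entries = Seq(
  "paTh" -> "1",
  "PATH" -> "2",
  "Path" -> "3",
  "patH" -> "4",
  "path" -> "5"
)

// Normalize each key to lower case as it is inserted, so a later entry
// overwrites an earlier one no matter how the key was spelled.
val normalized = entries.foldLeft(Map.empty[String, String]) {
  case (acc, (key, value)) => acc + (key.toLowerCase(Locale.ROOT) -> value)
}

// Lookups normalize the key the same way.
def lookup(key: String): Option[String] =
  normalized.get(key.toLowerCase(Locale.ROOT))
```

Because all five spellings collapse to a single key, the map holds exactly one entry and the outcome is determined by insertion order rather than by hash iteration order.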