[ https://issues.apache.org/jira/browse/SPARK-43155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-43155:
---------------------------------
Fix Version/s: (was: 3.5.0)
> DataSourceV2 is hard to be implemented without following V1
> -----------------------------------------------------------
>
> Key: SPARK-43155
> URL: https://issues.apache.org/jira/browse/SPARK-43155
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: PEIYUAN SUN
> Priority: Major
> Labels: features
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> h1. Description
> The current DataSourceV2 interface has become far more complicated than it was
> in Spark 2.x. To implement a data source with DataSourceV2, a user needs to
> learn not only the V2 APIs and interfaces but also DataSourceV1, because V1
> serves as the fallback path.
> h2. Interface Gaps
> There is no easy path, and there are no clear examples, for implementing both
> V1 and V2 for a new data source. For example, the built-in sources in the Spark
> repository, such as ORC, Parquet, and JSON, each provide a `FileFormat`
> implementation for V1, but they cannot practically be followed as templates:
> the SPI is hard-coded to `DefaultSource` instead of dynamically loading a
> user-provided class from outside the Spark repository. The existing data
> sources also do not consistently follow the same V1 pattern, and their custom
> logic is not well decoupled from it.
>
> h2. Loss of a simple layer over different kinds of data sources
> With the original V1 API, a user can easily implement a new wrapper on top of
> ORC or Parquet through the Relation interface. DataSourceV2, by comparison, is
> too low-level and hard to use for this case.
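To make the V1 wrapper case concrete, here is a minimal sketch, assuming Spark 3.x on the classpath. `RelationProvider`, `BaseRelation`, and `TableScan` are Spark's V1 interfaces in `org.apache.spark.sql.sources`; the package `com.example.wrapped` and the class `WrappedParquetRelation` are hypothetical names chosen for illustration. The relation simply delegates reading to the built-in Parquet source, which is where any custom wrapping logic would go.

```scala
// Hypothetical package. Spark's V1 lookup also tries `<format>.DefaultSource`,
// so format("com.example.wrapped") resolves to the DefaultSource class below.
package com.example.wrapped

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.StructType

// Entry point that Spark instantiates for this format.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // `path` is populated by DataFrameReader.load(path).
    val path = parameters.getOrElse("path", sys.error("option 'path' is required"))
    new WrappedParquetRelation(sqlContext, path)
  }
}

// Hypothetical relation that wraps the built-in Parquet reader.
class WrappedParquetRelation(override val sqlContext: SQLContext, path: String)
    extends BaseRelation with TableScan {

  // Custom logic (renames, filters, derived columns) could be applied here.
  private lazy val underlying: DataFrame = sqlContext.read.parquet(path)

  // Expose the wrapped data's schema unchanged.
  override def schema: StructType = underlying.schema

  // Full scan: hand back the wrapped data as an RDD of rows.
  override def buildScan(): RDD[Row] = underlying.rdd
}
```

With this class packaged on the classpath, a read such as `spark.read.format("com.example.wrapped").load("/data/events")` (path hypothetical) would go through `DefaultSource.createRelation`. The whole wrapper is two small classes, which is the kind of thin layer the issue says is hard to reproduce with DataSourceV2.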
>
> h2. No explicit guidance
> The interfaces that provide each piece of functionality are not well
> organized, which forces readers to spend a lot of time studying the commit
> history and existing patterns to understand them.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)