[
https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562114#comment-16562114
]
Wenchen Fan commented on SPARK-24882:
-------------------------------------
[~rdblue] I do agree that creating a catalog via reflection and then using the
catalog to create a `ReadSupport` instance is cleaner. The problem is that we
would then need to make `CatalogSupport` a must-have for data sources instead of
an optional plugin. How about we rename the old `ReadSupport` to
`ReadSupportProvider` for data sources that don't have a catalog? It works like
a dynamic constructor of `ReadSupport`, so that Spark can create a `ReadSupport`
by reflection.
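
To make the "dynamic constructor" idea concrete, here is a rough Java sketch.
Everything in it is an assumption for illustration: the method names
(`createReadSupport`, `description`), the `ExampleSource` class and the
`ReadSupportLoader` helper are hypothetical and are not the actual Spark
interfaces or loading code.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the real ReadSupport.
interface ReadSupport {
    String description();
}

// Hypothetical provider: a "dynamic constructor" for ReadSupport.
interface ReadSupportProvider {
    ReadSupport createReadSupport(Map<String, String> options);
}

// A made-up data source without a catalog.
class ExampleSource implements ReadSupportProvider {
    // Reflection-based loading needs a no-arg constructor.
    public ExampleSource() {}

    @Override
    public ReadSupport createReadSupport(Map<String, String> options) {
        String path = options.getOrDefault("path", "<none>");
        return () -> "example source reading " + path;
    }
}

// Hypothetical loader showing what the engine side could do.
class ReadSupportLoader {
    static ReadSupport load(String className, Map<String, String> options) {
        try {
            // Instantiate the provider by reflection from its class name.
            Object instance = Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
            if (!(instance instanceof ReadSupportProvider)) {
                throw new IllegalArgumentException(
                    className + " is not a ReadSupportProvider");
            }
            // The provider, not reflection, builds the ReadSupport.
            return ((ReadSupportProvider) instance).createReadSupport(options);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

With this shape, only the provider needs a no-arg constructor; the
`ReadSupport` itself can take whatever arguments the source wants.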
For the builder issue, I'm ok with adding a `ScanConfigBuilder` that is mutable
and can mix in the `SupportsPushdownXYZ` traits, so that `ScanConfig` itself
can be immutable. I think this model is simpler: the `ScanConfigBuilder` tracks
all the pushed operators, checks the current status and gives feedback to Spark
about the next operator pushdown. We can design a pure builder-like pushdown
API for `ScanConfigBuilder` later. We need to support pushdown for more
operators before we can evaluate that design, so it seems safer to keep the
pushdown API unchanged for now. What do you think?
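
A minimal Java sketch of the mutable-builder / immutable-config split described
above. It is only an illustration under assumptions: filters are plain strings,
the two mix-in interfaces are simplified stand-ins for the `SupportsPushdownXYZ`
traits, and `ExampleScanConfigBuilder` with its "only equality filters" rule is
made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Immutable result of the pushdown negotiation (hypothetical shape).
final class ScanConfig {
    private final List<String> pushedFilters;
    private final List<String> requiredColumns;

    ScanConfig(List<String> pushedFilters, List<String> requiredColumns) {
        // Defensive copies, so later builder changes cannot leak in.
        this.pushedFilters =
            Collections.unmodifiableList(new ArrayList<>(pushedFilters));
        this.requiredColumns =
            Collections.unmodifiableList(new ArrayList<>(requiredColumns));
    }

    List<String> pushedFilters() { return pushedFilters; }
    List<String> requiredColumns() { return requiredColumns; }
}

interface ScanConfigBuilder {
    ScanConfig build();
}

// Simplified mix-in: the return value is the feedback to the engine,
// i.e. the filters the source could not take.
interface SupportsPushDownFilters extends ScanConfigBuilder {
    List<String> pushFilters(List<String> filters);
}

// Simplified mix-in for column pruning.
interface SupportsPushDownRequiredColumns extends ScanConfigBuilder {
    void pruneColumns(List<String> requiredColumns);
}

// Mutable builder that tracks everything pushed so far.
class ExampleScanConfigBuilder
        implements SupportsPushDownFilters, SupportsPushDownRequiredColumns {
    private final List<String> pushed = new ArrayList<>();
    private final List<String> columns = new ArrayList<>();

    @Override
    public List<String> pushFilters(List<String> filters) {
        List<String> unsupported = new ArrayList<>();
        for (String filter : filters) {
            // Assumption: this toy source handles only equality filters.
            if (filter.contains(" = ")) {
                pushed.add(filter);
            } else {
                unsupported.add(filter);
            }
        }
        return unsupported;
    }

    @Override
    public void pruneColumns(List<String> requiredColumns) {
        columns.clear();
        columns.addAll(requiredColumns);
    }

    @Override
    public ScanConfig build() {
        return new ScanConfig(pushed, columns);
    }
}
```

The existing pushdown calls stay on the builder, and all mutable state is
dropped once `build()` returns the `ScanConfig`.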
> separate responsibilities of the data source v2 read API
> --------------------------------------------------------
>
> Key: SPARK-24882
> URL: https://issues.apache.org/jira/browse/SPARK-24882
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
>
> Data source V2 has been out for a while; see the SPIP
> [here|https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit?usp=sharing].
> We have already migrated most of the built-in streaming data sources to the
> V2 API, and the file source migration is in progress. During the migration,
> we found several problems and want to address them before we stabilize the V2
> API.
> To solve these problems, we need to separate responsibilities in the data
> source v2 read API. For details, please see the attached Google doc:
> https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?usp=sharing