[
https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564362#comment-16564362
]
Ryan Blue edited comment on SPARK-24882 at 7/31/18 9:02 PM:
------------------------------------------------------------
{quote}the problem is then we need to make `CatalogSupport` a must-have for
data sources instead of an optional plugin
{quote}
Data sources are read and write implementations. Catalog support should be a
layer above the read/write implementation, used to provide CTAS and other
table-level operations.
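A minimal Java sketch of that layering follows. It assumes heavily simplified, hypothetical stand-ins ({{Table}}, {{WriteSupport}}, {{TableCatalog}}, {{CreateTableAsSelect}}, and the in-memory classes), not Spark's real interfaces. Every source implements the read/write layer, and only sources that opt in also provide a catalog. CTAS is then composed from the two layers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins; these are NOT Spark's interfaces.
interface Table {
    String name();
}

// Read/write layer: what every data source implements.
interface WriteSupport {
    void write(List<String> rows);
}

// Catalog layer: optional, adds table-level operations such as CREATE TABLE.
interface TableCatalog {
    Table createTable(String name, Map<String, String> properties);
}

// CTAS composed from the two layers: create the table through the catalog,
// then write through the table's own WriteSupport.
final class CreateTableAsSelect {
    static Table run(TableCatalog catalog, String name,
                     Map<String, String> properties, List<String> rows) {
        Table table = catalog.createTable(name, properties);
        if (!(table instanceof WriteSupport)) {
            throw new UnsupportedOperationException("Table is not writable: " + name);
        }
        ((WriteSupport) table).write(rows);
        return table;
    }
}

// Toy in-memory implementations used to exercise the sketch.
final class InMemoryTable implements Table, WriteSupport {
    private final String name;
    final List<String> data = new ArrayList<>();

    InMemoryTable(String name) { this.name = name; }

    @Override public String name() { return name; }
    @Override public void write(List<String> rows) { data.addAll(rows); }
}

final class InMemoryCatalog implements TableCatalog {
    final Map<String, InMemoryTable> tables = new HashMap<>();

    @Override
    public Table createTable(String name, Map<String, String> properties) {
        InMemoryTable table = new InMemoryTable(name);
        tables.put(name, table);
        return table;
    }
}
```

In this shape a source that never implements {{TableCatalog}} still works for plain reads and writes. It only loses CTAS and other table-level operations.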
If you're interested in the anonymous table use case from the email discussion,
I posted a suggestion there to add an {{anonymousTable}} function to
{{DataSourceV2}}. That allows a source instantiated directly through v1-style
reflection to provide a {{Table}} based on an options map. Then that table
would implement {{ReadSupport}} and {{WriteSupport}} as I've suggested in this
thread. That would preserve the ability to instantiate a source directly and
use it, and would center around a {{Table}} that implements the read and write
traits.
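One way the {{anonymousTable}} suggestion could look is sketched below in Java. All names here are hypothetical illustrations, not Spark's API: {{DataSourceV2.anonymousTable}}, {{ExampleSource}}, {{ExampleTable}}, {{SourceLoader}}, and the {{describeScan}}/{{describeWrite}} methods. The source is still loaded by class name through reflection, as in v1. The object that implements {{ReadSupport}} and {{WriteSupport}}, however, is the {{Table}} built from the options map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the API under discussion;
// Spark's real interfaces have different methods and signatures.
interface Table {
    Map<String, String> properties();
}

interface ReadSupport {
    String describeScan();
}

interface WriteSupport {
    String describeWrite();
}

// Proposed hook (hypothetical): a source returns a Table built only from options.
interface DataSourceV2 {
    Table anonymousTable(Map<String, String> options);
}

// The source itself carries no read/write logic; it only creates tables.
final class ExampleSource implements DataSourceV2 {
    @Override
    public Table anonymousTable(Map<String, String> options) {
        return new ExampleTable(options);
    }
}

// The table, not the source, implements the read and write traits.
final class ExampleTable implements Table, ReadSupport, WriteSupport {
    private final Map<String, String> options;

    ExampleTable(Map<String, String> options) {
        // Defensive copy so the table's properties cannot change after creation.
        this.options = Collections.unmodifiableMap(new HashMap<>(options));
    }

    @Override public Map<String, String> properties() { return options; }
    @Override public String describeScan() { return "scan " + options.get("path"); }
    @Override public String describeWrite() { return "write " + options.get("path"); }
}

// v1-style loading: instantiate the source class by name via reflection.
final class SourceLoader {
    static DataSourceV2 load(String className) throws ReflectiveOperationException {
        return (DataSourceV2) Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```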
An alternative to the {{anonymousTable}} method is what I did in the WIP pull
request for CTAS. In that PR, I created two ways to work with {{DataSourceV2}}:
through the existing {{DataSourceV2Relation}} and through a new
{{TableV2Relation}}. The former is for {{DataSourceV2}} instances that implement
the read and write traits, while the latter is for {{Table}} objects that
implement them. Either approach works, though it would be cleaner to use only
{{Table}}.
Thanks for the builder update! Immutability is the most important part, but I'd
still prefer a builder interface with default methods instead of the mix-in
traits.
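The difference between the two builder styles can be sketched in Java. This is a hypothetical illustration ({{ScanBuilder}}, {{Scan}}, {{pushFilters}}, {{pruneColumns}}, and the example builders are assumed names, not Spark's API). Each optional capability is a default method on a single builder interface, rather than a separate mix-in interface the engine must detect. The built {{Scan}} is immutable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch (not Spark's API): optional capabilities are default
// methods on one builder interface instead of separate mix-in interfaces.
interface ScanBuilder {
    // Default: no filters are pushed down; all of them are returned so the
    // engine knows it must still evaluate them after the scan.
    default List<String> pushFilters(List<String> filters) {
        return filters;
    }

    // Default: column pruning is ignored and the source reads every column.
    default ScanBuilder pruneColumns(List<String> requiredColumns) {
        return this;
    }

    // Every source must produce an immutable scan description.
    Scan build();
}

// Immutable result of building: the columns the scan will read.
final class Scan {
    final List<String> columns;

    Scan(List<String> columns) {
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
}

// A minimal source overrides nothing optional.
final class SimpleScanBuilder implements ScanBuilder {
    @Override
    public Scan build() {
        return new Scan(Arrays.asList("a", "b", "c"));
    }
}

// A richer source overrides only the defaults it actually supports.
final class PruningScanBuilder implements ScanBuilder {
    private List<String> columns = Arrays.asList("a", "b", "c");

    @Override
    public ScanBuilder pruneColumns(List<String> requiredColumns) {
        columns = new ArrayList<>(requiredColumns);
        return this;
    }

    @Override
    public Scan build() {
        return new Scan(columns);
    }
}
```

With this shape, the engine can call {{pushFilters}} and {{pruneColumns}} on any builder without {{instanceof}} checks. A source that lacks a capability simply inherits the no-op default.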
> data source v2 API improvement
> ------------------------------
>
> Key: SPARK-24882
> URL: https://issues.apache.org/jira/browse/SPARK-24882
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
>
> Data source V2 has been out for a while; see the SPIP
> [here|https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit?usp=sharing].
> We have already migrated most of the built-in streaming data sources to the
> V2 API, and the file source migration is in progress. During the migration,
> we found several problems and want to address them before we stabilize the V2
> API.
> To solve these problems, we need to separate responsibilities in the data
> source v2 API, isolate the stateful part of the API, and find better names
> for some interfaces. For details, please see the attached Google doc:
> https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?usp=sharing