[
https://issues.apache.org/jira/browse/KUDU-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Henke reassigned KUDU-2490:
---------------------------------
Assignee: Grant Henke
> implement Kudu DataSourceV2 and related classes
> -----------------------------------------------
>
> Key: KUDU-2490
> URL: https://issues.apache.org/jira/browse/KUDU-2490
> Project: Kudu
> Issue Type: Improvement
> Components: spark
> Reporter: Andrew Wong
> Assignee: Grant Henke
> Priority: Major
> Labels: roadmap-candidate
>
> The current Kudu-Spark bindings implement a DefaultSource that extends
> RelationProvider and supplies BaseRelations to Spark. As I understand it, a
> BaseRelation is a physical unit of query execution that represents a set of
> rows. The Kudu BaseRelation (the KuduRelation) implements a couple of traits
> to fit into Spark: PrunedFilteredScan, which allows predicates to be pushed
> into Kudu, and InsertableRelation, which allows writes to be pushed into
> Kudu. An issue with these bindings is that, while they provide interfaces to
> insert/get data, they do not provide interfaces to push details to Spark that
> might be useful for optimizing a Kudu query.
> Among other things, this is inconvenient for any datasource that wants to
> take such optimizations into its own hands. The Spark community appears to
> be revamping its DataSource APIs in the form of DataSourceV2 and, for read
> support, the v2 DataSourceReader.
> This new world order provides a clear path towards implementing various
> optimizations that are unavailable with the current Spark bindings, without
> pushing changes to Spark itself.
> Of note, the v2 DataSourceReader can be extended with
> SupportsReportStatistics, which could allow Kudu to expose statistics to
> Spark without having to rely on HMS (although pushing stats to HMS isn't an
> unreasonable approach either). More traits and details about the API can be
> found
> [here|https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.html].
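To make the statistics idea in the description above more concrete, here is a minimal sketch in Java. It deliberately does not depend on Spark: the Statistics and SupportsReportStatistics interfaces below are local stand-ins whose shape is modeled on the Spark 2.3 v2 reader API as I recall it, so the real method names should be checked against the linked Javadoc. KuduStatsReaderSketch, its constructor arguments, and the "negative means unknown" convention are hypothetical. How Kudu would actually obtain row or size estimates is not decided in this issue.

```java
import java.util.OptionalLong;

// Local stand-in for Spark's v2 Statistics interface (assumption: simplified
// shape, NOT the real org.apache.spark type).
interface Statistics {
    OptionalLong sizeInBytes();
    OptionalLong numRows();
}

// Local stand-in for the SupportsReportStatistics mix-in (assumption: the
// method name follows the Spark 2.3 API as recalled; verify before relying on it).
interface SupportsReportStatistics {
    Statistics getStatistics();
}

// Hypothetical reader that reports table-level estimates to the query planner.
final class KuduStatsReaderSketch implements SupportsReportStatistics {
    private final long estimatedRows;   // negative value means "unknown"
    private final long estimatedBytes;  // negative value means "unknown"

    KuduStatsReaderSketch(long estimatedRows, long estimatedBytes) {
        this.estimatedRows = estimatedRows;
        this.estimatedBytes = estimatedBytes;
    }

    // Map the "negative means unknown" convention onto OptionalLong.
    private static OptionalLong toOptional(long value) {
        return value >= 0 ? OptionalLong.of(value) : OptionalLong.empty();
    }

    @Override
    public Statistics getStatistics() {
        return new Statistics() {
            @Override
            public OptionalLong sizeInBytes() {
                return toOptional(estimatedBytes);
            }

            @Override
            public OptionalLong numRows() {
                return toOptional(estimatedRows);
            }
        };
    }
}
```

A caller standing in for Spark's planner would call getStatistics() and treat an empty OptionalLong as "no estimate available". For instance, a reader built with new KuduStatsReaderSketch(1000L, -1L) reports a row count but no size.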