[
https://issues.apache.org/jira/browse/HUDI-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486696#comment-17486696
]
Alexey Kudinkin commented on HUDI-1296:
---------------------------------------
Discussion on 02/03
{code:java}
1. Integrating w/ Spark DataSource
  - FileFormat reading the HFile as a Base file
  - What interface to MT from Spark DS?
    - Key (encoded column, partition path, filename) and Value (stats, filename, etc)
  - Ideal interface to MT:
    - Filename
      - Already available in the payload
    - Partition Path (?)
      - We can decode on the fly (we have list of partitions, so we can match it with the key)
    - Col A stats
    - Col B stats
    - ...
2. Integrating w/ Data Skipping {code}
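A rough sketch of the two items above, in plain Python rather than Hudi or Spark code. Everything here is an assumption made for illustration: the `encode` scheme (a truncated SHA-256), the `StatsRecord` layout, integer min/max stats, and the helper names `to_file_rows` and `prune` are all hypothetical and are not the actual MT key encoding or any Hudi API. It only shows the shape of the idea: decode the partition path (and column) by matching encoded key components against the known lists, pivot the records into one row per file, then skip files whose min/max range cannot match a filter.

```python
import hashlib
from dataclasses import dataclass


def encode(part: str) -> str:
    """Hypothetical fixed-width encoding of a key component (NOT Hudi's real scheme)."""
    return hashlib.sha256(part.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class StatsRecord:
    """Hypothetical MT record: encoded key parts plus a payload with filename and stats."""
    enc_column: str      # key: encoded column name
    enc_partition: str   # key: encoded partition path
    file_name: str       # payload: filename is already available here
    min_value: int       # payload: column minimum within the file
    max_value: int       # payload: column maximum within the file


def to_file_rows(records, partitions, columns):
    """Pivot per-(column, file) records into one row per file.

    Partition paths and column names are decoded on the fly by encoding the
    known partitions/columns and matching them against the key components.
    """
    partition_by_code = {encode(p): p for p in partitions}
    column_by_code = {encode(c): c for c in columns}
    rows = {}
    for rec in records:
        row = rows.setdefault((rec.enc_partition, rec.file_name), {
            "file_name": rec.file_name,
            "partition_path": partition_by_code[rec.enc_partition],
            "stats": {},
        })
        row["stats"][column_by_code[rec.enc_column]] = (rec.min_value, rec.max_value)
    return list(rows.values())


def prune(rows, column, lo, hi):
    """Return files whose [min, max] for `column` may overlap [lo, hi].

    Files with no stats for the column are kept, since they cannot be ruled out.
    """
    kept = []
    for row in rows:
        stats = row["stats"].get(column)
        if stats is None or (stats[0] <= hi and stats[1] >= lo):
            kept.append(row["file_name"])
    return kept


def record(column, partition, file_name, lo, hi):
    """Build a sample record with encoded key components."""
    return StatsRecord(encode(column), encode(partition), file_name, lo, hi)


# Made-up sample data, for illustration only.
PARTITIONS = ["2022/02/02", "2022/02/03"]
COLUMNS = ["a", "b"]
RECORDS = [
    record("a", "2022/02/02", "f1.parquet", 0, 10),
    record("b", "2022/02/02", "f1.parquet", 100, 200),
    record("a", "2022/02/03", "f2.parquet", 50, 90),
    record("b", "2022/02/03", "f2.parquet", 0, 5),
]

ROWS = to_file_rows(RECORDS, PARTITIONS, COLUMNS)
```

With this sample data, a filter such as `a BETWEEN 40 AND 60` would be evaluated as `prune(ROWS, "a", 40, 60)`, which keeps only the file whose range for `a` overlaps the interval.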
> Implement Spark DataSource using range metadata for file/partition pruning
> --------------------------------------------------------------------------
>
> Key: HUDI-1296
> URL: https://issues.apache.org/jira/browse/HUDI-1296
> Project: Apache Hudi
> Issue Type: Task
> Components: spark
> Affects Versions: 0.9.0
> Reporter: Vinoth Chandar
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.11.0
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)