[ https://issues.apache.org/jira/browse/HUDI-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486696#comment-17486696 ]

Alexey Kudinkin commented on HUDI-1296:
---------------------------------------

Discussion on 02/03
{code:java}
1. Integrating w/ Spark DataSource
    - FileFormat reading the HFile as a Base file
    - What interface to the MT from Spark DS?
        - Key (encoded column, partition path, filename) and Value (stats, filename, etc.)
        - Ideal interface to MT:
             - Filename
               - Already available in the payload
             - Partition Path (?)
                 - We can decode it on the fly (we have the list of partitions, so we can match it against the key)
             - Col A stats
             - Col B stats
             - ...

2. Integrating w/ Data Skipping {code}
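The "ideal interface" in point 1 can be illustrated with a small sketch. It assumes a hypothetical key layout in which the column, partition path, and file name are each encoded as Base64(MD5(text)), giving fixed 24-character segments; this is not necessarily Hudi's actual metadata-table encoding. Because such an encoding cannot be inverted, the partition path is recovered by matching the key segment against the known list of partitions, while the file name is read from the value payload, as the notes suggest. The names `ColumnStatsView`, `ColStats`, `FileStats`, and `toFileStats` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

class ColumnStatsView {
    // Hypothetical key segment encoding: Base64(MD5(text)), always 24 characters.
    static String encode(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical key: encoded column + encoded partition path + encoded file name.
    static String key(String column, String partition, String fileName) {
        return encode(column) + encode(partition) + encode(fileName);
    }

    // Hypothetical value payload: the file name travels with the stats.
    record ColStats(String column, String fileName, long min, long max) {}

    // One row of the interface: file name, partition path, stats per column.
    record FileStats(String fileName, String partitionPath, Map<String, ColStats> columns) {}

    static List<FileStats> toFileStats(Map<String, ColStats> records, List<String> partitions) {
        // The partition segment is a one-way hash, so build a lookup from the known partitions.
        Map<String, String> partitionByCode = new HashMap<>();
        for (String p : partitions) {
            partitionByCode.put(encode(p), p);
        }

        // Group per-column records into one row per (partition, file).
        Map<String, FileStats> byFile = new LinkedHashMap<>();
        for (Map.Entry<String, ColStats> e : records.entrySet()) {
            String partition = partitionByCode.get(e.getKey().substring(24, 48));
            ColStats stats = e.getValue();
            FileStats row = byFile.computeIfAbsent(partition + "/" + stats.fileName(),
                    k -> new FileStats(stats.fileName(), partition, new TreeMap<>()));
            row.columns().put(stats.column(), stats);
        }
        return new ArrayList<>(byFile.values());
    }
}
```

Grouping per file means a Spark DS reader would see one row per file with "Col A stats", "Col B stats", etc., rather than one row per (column, file) key.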

> Implement Spark DataSource using range metadata for file/partition pruning
> --------------------------------------------------------------------------
>
>                 Key: HUDI-1296
>                 URL: https://issues.apache.org/jira/browse/HUDI-1296
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 0.9.0
>            Reporter: Vinoth Chandar
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
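Range-based file pruning, as named in the issue title, can be sketched as an interval-overlap check: a file must be read only if its [min, max] range for the filtered column can intersect the query range [lo, hi]. This is a minimal illustration assuming per-file min/max values for a single long column have already been loaded (for example from the metadata table); `RangePruning`, `ColumnRange`, and `candidateFiles` are hypothetical names, not Hudi APIs.

```java
import java.util.*;

class RangePruning {
    // Hypothetical per-file min/max for one column.
    record ColumnRange(String fileName, long min, long max) {}

    // Keeps only files whose [min, max] range can contain a value in [lo, hi].
    // Files without stats for the column are kept, since skipping them could drop matching rows.
    static List<String> candidateFiles(List<String> allFiles,
                                       Map<String, ColumnRange> statsByFile,
                                       long lo, long hi) {
        List<String> kept = new ArrayList<>();
        for (String file : allFiles) {
            ColumnRange r = statsByFile.get(file);
            if (r == null || (r.max() >= lo && r.min() <= hi)) {
                kept.add(file);
            }
        }
        return kept;
    }
}
```

For example, with `f1` covering [1, 10], `f2` covering [20, 30], and `f3` lacking stats, a filter on [5, 15] keeps `f1` and `f3`. Keeping files with missing stats trades some wasted reads for correctness.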

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
