[ 
https://issues.apache.org/jira/browse/HUDI-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246169#comment-17246169
 ] 

Udit Mehrotra commented on HUDI-1401:
-------------------------------------

[~vinoth] I agree that this is not something that would be feasible to keep 
track of and keep updated, and we should explore the path of not setting block 
locations when using Hudi's metadata listing feature.

In case of S3, EmrFS does this:
{code:java}
BlockLocation[] locations = getFileBlockLocations(status, 0, status.getLen());
{code}
This would reuse the default implementation at 
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L866],
 which basically creates a single block covering the entire file length.
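
As a rough illustration of the "one block for the whole file" behaviour described above, here is a minimal sketch. It assumes the default logic is approximately: reject negative arguments, return no blocks when the file ends at or before the start offset, otherwise return one block at offset 0 with the full file length. {{Block}} and {{SingleBlockSketch}} are hypothetical stand-ins so the example compiles without Hadoop; they are not Hadoop, EmrFS, or Presto classes, and the "localhost" host is a placeholder.

```java
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation, so this
// sketch compiles without Hadoop on the classpath.
final class Block {
    final String[] hosts;
    final long offset;
    final long length;

    Block(String[] hosts, long offset, long length) {
        this.hosts = hosts;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public String toString() {
        return "Block(offset=" + offset + ", length=" + length
                + ", hosts=" + Arrays.toString(hosts) + ")";
    }
}

// Hypothetical helper, not part of any real library.
final class SingleBlockSketch {
    // Assumed shape of the default behaviour: one synthetic block that
    // covers the whole file, regardless of the requested range.
    static Block[] singleBlockLocations(long fileLen, long start, long len) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("Invalid start or len parameter");
        }
        if (fileLen <= start) {
            // Nothing to read at or beyond the end of the file.
            return new Block[0];
        }
        // Placeholder host: an object store has no real data locality.
        String[] hosts = {"localhost"};
        return new Block[] {new Block(hosts, 0L, fileLen)};
    }
}
```

Calling {{SingleBlockSketch.singleBlockLocations(len, 0, len)}} mirrors the EmrFS call above. Note that in this sketch a zero-length file yields an empty array, so a caller that requires at least one block would need to handle that case separately.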

I did an initial version of the Presto implementation that uses the metadata 
file listing. Not setting any blocks throws an error at 
[https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/util/InternalHiveSplitFactory.java#L218].
 Setting that single block works for S3. I need to do more scale testing for 
S3, and will also test HDFS with the same approach.

 

> Presto use of Metadata Table for file listings
> ----------------------------------------------
>
>                 Key: HUDI-1401
>                 URL: https://issues.apache.org/jira/browse/HUDI-1401
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Presto Integration
>            Reporter: Vinoth Chandar
>            Assignee: Udit Mehrotra
>            Priority: Blocker
>             Fix For: 0.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
