[ 
https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377431#comment-17377431
 ] 

Saravanan commented on HUDI-55:
-------------------------------

Bucketing is crucial for reducing the time spent locating the requested data, 
whether for retrieval or for writing an update back. Without bucketing, I/O is 
directly proportional to the number of files in a partition, and for an 
unpartitioned table it is like doing a full table scan. Adoption of Hudi would 
increase further if this feature were enabled, since data access time is the 
deciding factor when choosing a stack. We understand that only Hudi and Delta 
Lake provide update capability without the overhead of involving a NoSQL 
database and merging data on a day-to-day basis. But for a batch workload, that 
merge operation would run only once per day, followed by any number of queries 
that need to be performant. Adoption of a new stack is influenced more by end 
users than by the operations team, since end users expect a short turnaround 
time on each query. 

> Investigate support for bucketed tables ala Hive #74
> ----------------------------------------------------
>
>                 Key: HUDI-55
>                 URL: https://issues.apache.org/jira/browse/HUDI-55
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Hive Integration
>            Reporter: Vinoth Chandar
>            Priority: Major
>
> https://github.com/uber/hudi/issues/74



