[
https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377431#comment-17377431
]
Saravanan commented on HUDI-55:
-------------------------------
Bucketing is crucial for reducing the time it takes to locate the requested
data, whether for retrieval or for writing updates back. Without bucketing, I/O
is directly proportional to the number of files in a partition, and for an
unpartitioned table it is like doing a full table scan. Adoption of Hudi would
increase further if this feature were enabled, since data access time is a
deciding factor when choosing a stack. We
understand that only Hudi and Delta Lake provide update capability without the
overhead of involving a NoSQL database and merging data on a day-to-day
basis. For a batch workload, though, that merge operation would run once per
day and be followed by any number of queries, which need to be performant.
Adoption of a new stack is influenced more by the end users' perspective than
by the operations team's, because end users need a short turnaround time on
each query.
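
To make the I/O argument concrete, here is a minimal sketch of hash bucketing
in plain Python. It is not Hudi's or Hive's implementation: the CRC32 hash, the
bucket count of 8, the file naming, and all function names are assumptions
chosen for illustration. It only shows why a lookup in a bucketed layout needs
to open one file, while a lookup with no bucketing (and no other index) has to
treat every file in the partition as a candidate.

```python
import zlib
from typing import Dict, Iterable, List, Set

NUM_BUCKETS = 8  # hypothetical bucket count, fixed when the table is created


def bucket_id(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a record key to a bucket using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def bucket_file(record_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Name of the only file that can hold this key in a bucketed layout."""
    return f"bucket-{bucket_id(record_key, num_buckets):04d}"


def write_bucketed(keys: Iterable[str],
                   num_buckets: int = NUM_BUCKETS) -> Dict[str, Set[str]]:
    """Simulate a partition: one file per bucket, each key stored by its hash."""
    files: Dict[str, Set[str]] = {
        f"bucket-{b:04d}": set() for b in range(num_buckets)
    }
    for key in keys:
        files[bucket_file(key, num_buckets)].add(key)
    return files


def candidate_files(files: Dict[str, Set[str]], record_key: str,
                    bucketed: bool,
                    num_buckets: int = NUM_BUCKETS) -> List[str]:
    """Files a reader or updater must open to find record_key."""
    if bucketed:
        # The hash pins the key to exactly one file.
        return [bucket_file(record_key, num_buckets)]
    # With no bucketing and no other index, any file may hold the key.
    return sorted(files)


partition = write_bucketed(f"key-{i}" for i in range(1000))
print(len(candidate_files(partition, "key-42", bucketed=False)))  # every file
print(len(candidate_files(partition, "key-42", bucketed=True)))   # a single file
```

The same pruning applies to updates: the writer hashes the incoming key and
only has to rewrite the one file that owns that bucket.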
> Investigate support for bucketed tables ala Hive #74
> ----------------------------------------------------
>
> Key: HUDI-55
> URL: https://issues.apache.org/jira/browse/HUDI-55
> Project: Apache Hudi
> Issue Type: New Feature
> Components: Hive Integration
> Reporter: Vinoth Chandar
> Priority: Major
>
> https://github.com/uber/hudi/issues/74