[ 
https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237775#comment-17237775
 ] 

Nishith Agarwal commented on HUDI-55:
-------------------------------------

Blurb from the Slack channel:

```
I have a requirement to compact a datalake but need bucketing on top of 
compaction so that during query time, only the files relevant to the "id" in 
the query would be scanned. Is that supported in Hudi? If not, is it possible 
to extend Hudi to support it?

Hello Team - we have a need for bucketing our datasets (primarily to keep the 
parquet file size optimized for faster reads). We see that Hudi doesn't 
support bucketing now. Are there any plans to support bucketing in the future?

Following up on the email "Bucketing in Hudi", we would like to schedule a 
meeting to understand and estimate the code changes needed to achieve 
bucketing in Hudi. The high-level requirements are as detailed in the email, 
but we could chat further in the meeting to get into specifics. When would be 
the earliest we could have this discussion?
```

> Investigate support for bucketed tables ala Hive #74
> ----------------------------------------------------
>
>                 Key: HUDI-55
>                 URL: https://issues.apache.org/jira/browse/HUDI-55
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Hive Integration
>            Reporter: Vinoth Chandar
>            Priority: Major
>
> https://github.com/uber/hudi/issues/74



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
