[ https://issues.apache.org/jira/browse/HUDI-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Udit Mehrotra updated HUDI-3625:
--------------------------------
Fix Version/s: 0.12.0
> [Umbrella] Optimized storage layout for cloud object stores
> -----------------------------------------------------------
>
> Key: HUDI-3625
> URL: https://issues.apache.org/jira/browse/HUDI-3625
> Project: Apache Hudi
> Issue Type: Epic
> Components: core
> Reporter: Udit Mehrotra
> Assignee: Udit Mehrotra
> Priority: Major
> Labels: hudi-umbrellas
> Fix For: 0.12.0
>
>
> Amazon S3, among other cloud object stores, throttles requests based on
> object prefix; see
> [https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/].
> Hudi follows the traditional Hive storage layout, with files stored under
> separate partition paths beneath a common table path/prefix. This
> introduces the potential for throttling, because the request limits for
> that common table path/prefix can be reached when a significant number of
> files are written concurrently.
> We propose implementing an alternate storage layout that is better suited
> to cloud object stores like S3, to avoid running into throttling issues as
> the data scales. At a high level, we need to be able to distribute data
> files evenly across randomly generated prefixes, so that requests are
> spread across those prefixes, each with its own request limit, instead of
> all counting against a single table prefix.
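To make the contrast concrete, here is a minimal sketch, assuming a hash-based prefix scheme, of how file keys could be laid out under each approach. It is not Hudi's implementation: the functions `hive_style_path`, `hashed_prefix_path`, and `top_level_prefix`, the fan-out of 16 prefixes, and the use of an MD5 hash of the file's logical location are all hypothetical choices for illustration. (For context, AWS documents its S3 limits per prefix, roughly 3,500 write and 5,500 read requests per second, so spreading keys over N prefixes raises the aggregate ceiling accordingly.)

```python
import hashlib
from collections import Counter

NUM_PREFIXES = 16  # hypothetical fan-out; a real system would make this configurable


def hive_style_path(bucket: str, table: str, partition: str, file_name: str) -> str:
    """Traditional Hive layout: every file sits under the same table prefix."""
    return f"{bucket}/{table}/{partition}/{file_name}"


def hashed_prefix_path(bucket: str, table: str, partition: str, file_name: str,
                       num_prefixes: int = NUM_PREFIXES) -> str:
    """Hypothetical object-store layout: prepend a hash-derived prefix.

    The prefix is computed from the file's logical location, so the same file
    always maps to the same physical key, while different files are spread
    across num_prefixes distinct top-level prefixes.
    """
    logical = f"{table}/{partition}/{file_name}"
    # MD5 is used only for a stable, well-mixed hash, not for security.
    digest = hashlib.md5(logical.encode("utf-8")).hexdigest()
    prefix = int(digest, 16) % num_prefixes
    return f"{bucket}/{prefix:08x}/{logical}"


def top_level_prefix(path: str, bucket: str) -> str:
    """Return the first key component after the bucket."""
    return path[len(bucket) + 1:].split("/", 1)[0]


if __name__ == "__main__":
    bucket, table, partition = "s3://my-bucket", "my_table", "2022/03/14"
    files = [f"file-{i}.parquet" for i in range(1000)]
    hive = Counter(top_level_prefix(hive_style_path(bucket, table, partition, f), bucket)
                   for f in files)
    hashed = Counter(top_level_prefix(hashed_prefix_path(bucket, table, partition, f), bucket)
                     for f in files)
    # Hive-style keys all share one top-level prefix; hashed keys fan out.
    print("hive-style prefixes:", len(hive))
    print("hashed prefixes:", len(hashed), "max files per prefix:", max(hashed.values()))
```

One consequence of a layout like this is that the files of a partition no longer share a listable prefix, so readers would presumably need a file index (in Hudi's case, plausibly the metadata table) rather than directory listing to locate them.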
--
This message was sent by Atlassian Jira
(v8.20.1#820001)