[ https://issues.apache.org/jira/browse/HUDI-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar resolved HUDI-1449.
----------------------------------
> Support for _hoodie_record_key as a virtual column
> ---------------------------------------------------
>
> Key: HUDI-1449
> URL: https://issues.apache.org/jira/browse/HUDI-1449
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core
> Reporter: Nishith Agarwal
> Assignee: Abhishek Modi
> Priority: Major
>
> Context:
> Currently, _hoodie_record_key is written to DFS as a column in the Parquet
> file. In our production systems at Uber, however, _hoodie_record_key
> contains data that can also be found in a different column (or set of
> columns), so we are storing duplicated data.
> Proposal:
> In the interest of improving storage efficiency, we could add configs or
> abstract classes that construct the _hoodie_record_key from other
> columns. That way we do not have to store duplicated data on DFS.
>
> RFC ->
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual
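
To illustrate the proposal quoted above, here is a minimal sketch of a key that is derived from other columns at read time instead of being stored. It is not Hudi's implementation: the class VirtualRecordKey, the helper strip_stored_key, the "field:value" key layout, and the example column names are all assumptions made for illustration.

```python
from typing import Dict, Mapping, Sequence

RECORD_KEY_COLUMN = "_hoodie_record_key"


class VirtualRecordKey:
    """Hypothetical helper: rebuilds the record key from columns already in the row."""

    def __init__(self, key_fields: Sequence[str], separator: str = ","):
        if not key_fields:
            raise ValueError("at least one key field is required")
        self.key_fields = list(key_fields)
        self.separator = separator

    def compute(self, row: Mapping[str, object]) -> str:
        # A key can only be derived if every source column is present and non-null.
        missing = [f for f in self.key_fields if row.get(f) is None]
        if missing:
            raise KeyError(f"key fields missing or null: {missing}")
        # Single-field keys use the raw value; multi-field keys use an
        # assumed "field:value" layout joined by the separator.
        if len(self.key_fields) == 1:
            return str(row[self.key_fields[0]])
        return self.separator.join(f"{f}:{row[f]}" for f in self.key_fields)


def strip_stored_key(row: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of the row without the materialized key column."""
    return {k: v for k, v in row.items() if k != RECORD_KEY_COLUMN}


if __name__ == "__main__":
    stored = {
        RECORD_KEY_COLUMN: "trip_id:42,city:sf",
        "trip_id": 42,
        "city": "sf",
        "fare": 17.5,
    }
    on_disk = strip_stored_key(stored)  # what would be written without the key
    keygen = VirtualRecordKey(["trip_id", "city"])
    # The key is recomputed on demand and matches the one that was dropped.
    print(keygen.compute(on_disk) == stored[RECORD_KEY_COLUMN])
```

The sketch only works when the key is a pure function of columns that are always present, which is the situation the issue describes. Any reader that needs the key pays the cost of recomputing it.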