Nishith Agarwal created HUDI-1449:
-------------------------------------
Summary: Support for _hoodie_record_key as a virtual column
Key: HUDI-1449
URL: https://issues.apache.org/jira/browse/HUDI-1449
Project: Apache Hudi
Issue Type: Improvement
Components: Common Core
Reporter: Nishith Agarwal
Assignee: Abhishek Modi
Context:
Currently, _hoodie_record_key is written to DFS as a column in the Parquet
file. In our production systems at Uber, however, _hoodie_record_key
contains data that can already be found in a different column (or set of
columns). This means that we are storing duplicated data.
Proposal:
To improve storage efficiency, we could add configuration options and
abstract classes that construct _hoodie_record_key from other columns. That
way, we do not have to store duplicated data on DFS.
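A minimal sketch of the proposal, assuming the key can be rebuilt purely from a configured list of columns in each row. The interface VirtualRecordKeyBuilder, the class ColumnConcatKeyBuilder, the "col:value" joining format, and the column names in the example are all hypothetical illustrations, not part of the Hudi API or the RFC.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical abstraction (NOT a Hudi API): derives the record key from other
// columns of the row, so the key itself never has to be stored on DFS.
interface VirtualRecordKeyBuilder {
    String buildKey(Map<String, Object> row);
}

// Hypothetical implementation: one key column yields its value as-is;
// several key columns are joined as "col1:val1,col2:val2" (assumed format).
final class ColumnConcatKeyBuilder implements VirtualRecordKeyBuilder {
    private final List<String> keyColumns;

    ColumnConcatKeyBuilder(List<String> keyColumns) {
        if (keyColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        this.keyColumns = List.copyOf(keyColumns);
    }

    @Override
    public String buildKey(Map<String, Object> row) {
        if (keyColumns.size() == 1) {
            return String.valueOf(requireColumn(row, keyColumns.get(0)));
        }
        StringJoiner joiner = new StringJoiner(",");
        for (String col : keyColumns) {
            joiner.add(col + ":" + requireColumn(row, col));
        }
        return joiner.toString();
    }

    // A virtual key is only recoverable if every key column is present and non-null.
    private static Object requireColumn(Map<String, Object> row, String col) {
        Object value = row.get(col);
        if (value == null) {
            throw new IllegalArgumentException("missing key column: " + col);
        }
        return value;
    }
}
```

For example, with key columns [trip_id, city] and a row where trip_id=t-42 and city=sf, buildKey returns "trip_id:t-42,city:sf". Because the key is derived rather than stored, the sketch throws on a missing key column instead of silently producing an empty key.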
RFC:
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual