Nishith Agarwal created HUDI-1449:
-------------------------------------

             Summary: Support for _hoodie_record_key as a virtual column 
                 Key: HUDI-1449
                 URL: https://issues.apache.org/jira/browse/HUDI-1449
             Project: Apache Hudi
          Issue Type: Improvement
          Components: Common Core
            Reporter: Nishith Agarwal
            Assignee: Abhishek Modi


Context:
Currently, _hoodie_record_key is written to DFS as a column in the Parquet
file. In our production systems at Uber, however, _hoodie_record_key
contains data that can also be found in another column (or set of columns).
This means that we are storing the same data twice.

Proposal:
In the interest of improving storage efficiency, we could add configs and/or
abstract classes that construct the _hoodie_record_key from other columns.
That way we do not have to store duplicated data on DFS.
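
As a rough illustration of the idea only, here is a minimal Java sketch of an
abstract class that rebuilds the key from other columns. The names
VirtualRecordKeyGenerator and ColumnBasedKeyGenerator, the Map-based row, and
the ":" separator are all assumptions made for this example; they are not
existing Hudi classes and not the design from the RFC.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: these types are illustrative and are not part of Hudi.
abstract class VirtualRecordKeyGenerator {
    // Rebuilds the record key from the data columns of a single row,
    // so the key does not need to be stored as its own column.
    abstract String recordKey(Map<String, Object> row);
}

// Builds the key by joining the values of the configured source columns.
class ColumnBasedKeyGenerator extends VirtualRecordKeyGenerator {
    private static final String SEPARATOR = ":"; // assumed separator

    private final List<String> keyColumns;

    ColumnBasedKeyGenerator(List<String> keyColumns) {
        if (keyColumns.isEmpty()) {
            throw new IllegalArgumentException("At least one key column is required");
        }
        this.keyColumns = List.copyOf(keyColumns);
    }

    @Override
    String recordKey(Map<String, Object> row) {
        return keyColumns.stream()
            .map(column -> {
                Object value = row.get(column);
                if (value == null) {
                    // A key cannot be derived if one of its source columns is absent.
                    throw new IllegalArgumentException("Missing key column: " + column);
                }
                return value.toString();
            })
            .collect(Collectors.joining(SEPARATOR));
    }
}
```

With such a class, a reader could compute the key on the fly, for example
new ColumnBasedKeyGenerator(List.of("uuid", "region")).recordKey(row), instead
of reading a stored _hoodie_record_key column. The list of key columns would
presumably come from a table-level config.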

RFC:
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual



--
This message was sent by Atlassian Jira
(v8.3.4#803005)