Vinoth Govindarajan created HUDI-2681:
-----------------------------------------
Summary: Make hoodie record_key and preCombine_key optional
Key: HUDI-2681
URL: https://issues.apache.org/jira/browse/HUDI-2681
Project: Apache Hudi
Issue Type: New Feature
Components: Common Core
Reporter: Vinoth Govindarajan
At present, Hudi requires a record key and a preCombine key to create a Hudi
dataset, which restricts the kinds of datasets we can create
using Hudi.
To increase adoption of the Hudi file format across all kinds of
derived datasets, similar to Parquet/ORC, we need to offer flexibility to
users. I understand that the record key is used for the upsert primitive and
that the preCombine key is needed to break ties and deduplicate, but event data
and other datasets without a primary key (append-only datasets) can also
benefit from Hudi, since the Hudi ecosystem offers other features such as
snapshot isolation, indexes, clustering, and delta streamer, which could be
applied to any dataset without a record key.
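
To make the roles of the two keys concrete, here is a minimal plain-Python sketch of the semantics described above. It is not Hudi's implementation: the function name `precombine`, the field names `id` and `ts`, and the rule that a tie keeps the later-arriving record are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def precombine(records: Iterable[Record], record_key: str, precombine_key: str) -> List[Record]:
    """Keep one record per record_key, preferring the larger precombine_key.

    Illustrative only (hypothetical helper, not a Hudi API). On a tie, the
    record seen later in the input wins; that choice is an assumption.
    """
    latest: Dict[Any, Record] = {}
    for rec in records:
        key = rec[record_key]
        current = latest.get(key)
        if current is None or rec[precombine_key] >= current[precombine_key]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical sample batch: two versions of "a", one version of "b".
events = [
    {"id": "a", "ts": 1, "value": "old"},
    {"id": "b", "ts": 5, "value": "only"},
    {"id": "a", "ts": 3, "value": "new"},
]

deduped = precombine(events, record_key="id", precombine_key="ts")
```

An append-only event stream has no field that plays the role of `id` here, so there is nothing to group or deduplicate on, and every incoming record would simply be kept.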
The idea of this proposal is to make both the record key and the preCombine key
optional to allow a variety of new use cases on top of Hudi.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)