Thank you all for chiming in! Will raise an RFC.

On Fri, Aug 1, 2025 at 5:01 AM Yue Zhang <zhangyue921...@163.com> wrote:

> +1000. It is also noteworthy that while correcting historical parameters,
> we must establish a mechanism (likely Checkstyle?) to constrain inevitable
> future modifications to parameters. Looking forward to a more detailed
> discussion in the RFC.
>
> Best,
>
> zhangyue19921010
>
> At 2025-08-01 08:36:52, "Danny Chan" <danny0...@apache.org> wrote:
> >+1, this is historical technical debt; we should fix it now that our
> >notions/terminology have largely stabilized.
> >
> >Best,
> >Danny
> >
> >On Thu, Jul 31, 2025 at 18:13, Bhavani Sudha <bhavanisud...@gmail.com> wrote:
> >>
> >> +1 on the idea, Shiyan. Would love to see an RFC as a next step.
> >>
> >> Thanks,
> >> Sudha
> >>
> >> On Thu, Jul 31, 2025 at 1:37 AM Geser Dugarov <geserduga...@gmail.com>
> >> wrote:
> >>
> >> > Hi Shiyan!
> >> >
> >> > I totally support this proposal and I'm happy to help if needed.
> >> >
> >> > I just want to highlight the scope of this work - currently, we have 989
> >> > configuration parameters. I had analyzed this earlier and have updated
> >> > the list after receiving your message. You can check it here:
> >> >
> >> >
> >> > https://docs.google.com/spreadsheets/d/1a6BZbL5EmuTbftA2dShvSa0WeSOVNV2u/edit?usp=sharing&ouid=117459384969247807552&rtpof=true&sd=true
> >> >
> >> > It might be a good idea to prepare a corresponding RFC to define naming
> >> > standards for configuration parameters. It’s also crucial that the RFC
> >> > includes a clear plan for the steps of renaming, deprecation, and dropping
> >> > aliases across different groups of parameters; there are simply too many
> >> > to manage without a structured approach.
> >> >
> >> >   Best regards,
> >> >   Geser
> >> >
> >> >
> >> > On Thu, Jul 31, 2025 at 8:14 AM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > Since config names are the first thing users see when working with Hudi
> >> > > and directly impact user and dev experience, we should pay careful
> >> > > attention to keeping them standardized and easy to remember and use. I
> >> > > wanted to start this thread to raise some points so we can establish a
> >> > > set of standards and create a migration path.
> >> > >
> >> > > 1. Plural vs Singular
> >> > >
> >> > > If a config supports taking multiple values, its name should be plural
> >> > > where applicable. For example, since Hudi 1.1 supports multiple ordering
> >> > > fields, we should make `hoodie.datasource.write.precombine.field` plural.
> >> > > To signal how seriously we take this, treat this kind of misleading
> >> > > config name (singular but supporting multiple values) as a bug.
> >> > >
> >> > > 2. Namespaces
> >> > >
> >> > > Always start with `hoodie.<function area>.` as the namespace to denote
> >> > > the area the config serves. For example, `hoodie.table.*` is always a
> >> > > table config, `hoodie.write.*` is meant for writers to set,
> >> > > `hoodie.read.*` is meant for query engines to use,
> >> > > `hoodie.<compaction|clustering|cleaning|indexing>.*` always denotes
> >> > > table-service-specific configs, `hoodie.<storage>.*` indicates configs
> >> > > that control storage layer settings, and `hoodie.table.metadata.*` is
> >> > > specific to the metadata table.
> >> > >
> >> > > Keep these namespaces a fixed set of constants (a mandatory enum for
> >> > > composing config names), and do not casually change the words, like
> >> > > `compaction` vs `compact`, `cleaning` vs `clean`.
> >> > >
> >> > > 3. snake_case
> >> > >
> >> > > Use `.` to delimit functionally distinct words and `_` (snake_case) to
> >> > > connect the words of a meaningful phrase. For example:
> >> > >
> >> > > - `hoodie.table.recordkey.fields` should be
> >> > > `hoodie.table.record_key.fields`, as `recordkey` is not one word and
> >> > > should follow snake_case
> >> > > - `hoodie.table.keygenerator.class` should be
> >> > > `hoodie.table.key_generator.class`, for a similar reason
> >> > > - `hoodie.table.index.defs.path` should be
> >> > > `hoodie.table.index_defs.path`: "index defs" together names one thing,
> >> > > but reading "index" and "defs" separately does not convey meaningful
> >> > > info about this config
> >> > > - `hoodie.file.group.reader.enabled` should be
> >> > > `hoodie.file_group.reader.enabled`, for a similar reason
> >> > >
> >> > > 4. `hoodie.properties` only for catalog/table configs
> >> > >
> >> > > Only keep catalog/table configs in `hoodie.properties`; keep configs like
> >> > > `hoodie.datasource.write.*` out of it, and add new table configs for
> >> > > those that do not have a table config alias. For example, remove
> >> > > `hoodie.datasource.write.hive_style_partitioning` and put
> >> > > `hoodie.table.hive_style_partitioning` instead.
> >> > >
> >> > > 5. Improve naming case by case
> >> > >
> >> > > Some examples to consider:
> >> > > - All `hoodie.datasource.write.*` configs move to `hoodie.write.*`, to
> >> > > keep things shorter
> >> > > - All feature-switching configs end with `enabled`, never mixed with
> >> > > `enable`
> >> > > - All meta/hive-sync related configs move to `hoodie.catalog.sync.*`,
> >> > > clearly stating that they work with catalogs and that the function is
> >> > > "sync"
> >> > >
> >> > > 6. Standardize shorthand property names in SQL TBLPROPERTIES
> >> > >
> >> > > Everyone's first example of running Hudi has contained something like
> >> > > this:
> >> > >
> >> > > TBLPROPERTIES (
> >> > >   primaryKey = 'id',
> >> > >   preCombineField = 'ts'
> >> > > );
> >> > >
> >> > > Let's fix it:
> >> > >
> >> > > - "record key" is the term in Hudi, so we don't want people to have to
> >> > > remember that "primary key" means record key; make sure the plural rule
> >> > > applies
> >> > > - "ordering field" is the newer term, so let's deprecate the term
> >> > > "pre-combine field", and make sure the plural rule applies too
> >> > > - again, snake_case all the way, so it should look like below (omitting
> >> > > the `hoodie.table.` namespace) so people can easily associate them with
> >> > > the full names:
> >> > >
> >> > > TBLPROPERTIES (
> >> > >   record_key.fields = 'id',
> >> > >   ordering.fields = 'ts'
> >> > > );
> >> > >
> >> > > - in cases where non-table configs need to be put in TBLPROPERTIES(), we
> >> > > can just omit `hoodie.` since we have `USING HUDI` in the SQL, so it
> >> > > should support `read.*`, `write.*`, `storage.*` sort of shorthand keys
> >> > >
> >> > > 7. Address discrepancies between Flink options and Spark options
> >> > >
> >> > > Do a one-time sweep of Flink configs that diverge from Spark configs, and
> >> > > align them according to the standards we're making. The goals are:
> >> > >
> >> > > - All `hoodie.*` configs should be engine-agnostic and universally
> >> > > accepted by all engines when applicable
> >> > > - Any engine-specific config should be owned by the engine and start
> >> > > with `hudi.` (like how the Trino Hudi connector does now)
> >> > >
> >> > >
> >> > > About migration: we should start adding new config names while keeping
> >> > > the old ones compatible as aliases. That means, throughout the codebase,
> >> > > config variables will contain the standard strings as the names, and any
> >> > > user-provided config will be translated to its new name if applicable.
> >> > >
> >> > > We don't really want to fail writers/readers just because of old config
> >> > > names, so we can keep the aliases for quite some time, but there have to
> >> > > be deprecation warnings starting now, and we should drop the aliases at
> >> > > some major release (like 2.0 or 3.0). But before that, any table version
> >> > > upgrade should strive to rename the configs in `hoodie.properties` as per
> >> > > the standards to evangelize the new names.
> >> > >
> >> > > Best,
> >> > > Shiyan
> >> > >
> >> >
>
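To make the alias idea in the migration notes quoted above a bit more
concrete, here is a minimal Python sketch of translating user-provided
config names to their new names with a deprecation warning. This is not Hudi
code: `ALIASES` and `normalize_configs` are hypothetical names, the new keys
are only the examples proposed in this thread (nothing is final), and letting
an explicitly set new name win over its old alias is my assumption.

```python
import warnings

# Hypothetical alias table: deprecated name -> standardized name.
# The new names are the examples from this thread, not agreed-upon configs.
ALIASES = {
    "hoodie.datasource.write.hive_style_partitioning": "hoodie.table.hive_style_partitioning",
    "hoodie.table.recordkey.fields": "hoodie.table.record_key.fields",
    "hoodie.table.keygenerator.class": "hoodie.table.key_generator.class",
}


def normalize_configs(user_configs):
    """Return a copy of user_configs with deprecated keys renamed."""
    # First pass: keep every key that is not a known deprecated alias.
    normalized = {k: v for k, v in user_configs.items() if k not in ALIASES}
    # Second pass: translate old names and warn about each one.
    for old_key, value in user_configs.items():
        if old_key in ALIASES:
            new_key = ALIASES[old_key]
            warnings.warn(
                f"Config '{old_key}' is deprecated, use '{new_key}' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: a value set under the new name takes precedence.
            normalized.setdefault(new_key, value)
    return normalized
```

With this, the rest of the code only ever sees the standard names, and
dropping the aliases at a major release means emptying the alias table.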


-- 
Best,
Shiyan
