+1 on the idea, Shiyan. I'd love to see an RFC as a next step.

Thanks,
Sudha

On Thu, Jul 31, 2025 at 1:37 AM Geser Dugarov <geserduga...@gmail.com>
wrote:

> Hi Shiyan!
>
> I totally support this proposal and I'm happy to help if needed.
>
> I just want to highlight the scope of this work - currently, we have 989
> configuration parameters. I had analyzed this earlier and have updated the
> list after receiving your message. You can check it here:
>
> https://docs.google.com/spreadsheets/d/1a6BZbL5EmuTbftA2dShvSa0WeSOVNV2u/edit?usp=sharing&ouid=117459384969247807552&rtpof=true&sd=true
>
> It might be a good idea to prepare a corresponding RFC to define naming
> standards for configuration parameters. It’s also crucial that the RFC
> includes a clear plan for the steps of renaming, deprecation, and dropping
> aliases across different groups of parameters — there are simply too many
> to manage without a structured approach.
>
>   Best regards,
>   Geser
>
>
> On Thu, Jul 31, 2025 at 8:14 AM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Since config names are the first thing users see when working with Hudi
> > and directly impact user and dev experience, we should pay careful
> > attention to keeping them standardized and easy to remember and use. I
> > wanted to start this thread to raise some points so we can establish a
> > set of standards and create a migration path.
> >
> > 1. Plural vs Singular
> >
> > If a config accepts multiple values, its name should be plural where
> > applicable. For example, since Hudi 1.1 supports multiple ordering
> > fields, we should make `hoodie.datasource.write.precombine.field`
> > plural. To signal that we take this seriously, we should treat this kind
> > of misleading config name (singular but accepting multiple values) as a
> > bug.
> >
> > 2. Namespaces
> >
> > Always start with `hoodie.<function area>.` as the namespace to denote
> > the area the config serves. For example, `hoodie.table.*` is always a
> > table config, `hoodie.write.*` is meant for writers to set,
> > `hoodie.read.*` is meant for query engines to use,
> > `hoodie.<compaction|clustering|cleaning|indexing>.*` always denotes
> > table-service-specific configs, `hoodie.<storage>.*` indicates configs
> > that control storage layer settings, and `hoodie.table.metadata.*` is
> > specific to the metadata table.
> >
> > Keep these namespaces a fixed set of constants (a mandatory enum for
> > composing config names), and do not casually change the words, like
> > `compaction` vs `compact` or `cleaning` vs `clean`.
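
A minimal sketch of the "mandatory enum" idea, assuming the namespace list is exactly the one named above. The `Namespace` enum, `config_name`, and `has_known_namespace` are hypothetical helpers for illustration, not existing Hudi APIs, and the config names in the usage note are made up.

```python
from enum import Enum


class Namespace(Enum):
    """Fixed set of function areas (assumed from the list in this thread)."""
    TABLE = "table"
    WRITE = "write"
    READ = "read"
    COMPACTION = "compaction"
    CLUSTERING = "clustering"
    CLEANING = "cleaning"
    INDEXING = "indexing"
    STORAGE = "storage"


def config_name(namespace: Namespace, *parts: str) -> str:
    """Compose `hoodie.<namespace>.<parts...>` from a fixed namespace."""
    if not parts:
        raise ValueError("at least one name part is required")
    return ".".join(["hoodie", namespace.value, *parts])


def has_known_namespace(name: str) -> bool:
    """Return True if `name` starts with `hoodie.<known namespace>.`."""
    segments = name.split(".")
    known = {ns.value for ns in Namespace}
    return len(segments) >= 3 and segments[0] == "hoodie" and segments[1] in known
```

With this, `config_name(Namespace.COMPACTION, "inline", "enabled")` composes a name under the fixed `compaction` namespace, while a check like `has_known_namespace("hoodie.compact.inline")` rejects the `compact` spelling.
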
> >
> > 3. snake_case
> >
> > Use `.` to delimit functionally distinct words and `_` (snake_case) to
> > connect a meaningful phrase. For example:
> >
> > - `hoodie.table.recordkey.fields` should be
> > `hoodie.table.record_key.fields`, as `recordkey` is not one word and
> > should follow snake_case.
> > - `hoodie.table.keygenerator.class` should be
> > `hoodie.table.key_generator.class`, for a similar reason.
> > - `hoodie.table.index.defs.path` should be
> > `hoodie.table.index_defs.path`: "index defs" together names one thing,
> > but reading "index" and "defs" separately does not convey meaningful
> > info about this config.
> > - `hoodie.file.group.reader.enabled` should be
> > `hoodie.file_group.reader.enabled`, for a similar reason.
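
The renames above could be captured as a lookup table plus a shape check, sketched below. The rename pairs come from the examples in this thread; the regular expression is an assumed reading of the rule (dot-separated segments, each lowercase snake_case), and `is_well_formed` is a hypothetical helper. Note that a shape check cannot tell that `recordkey` is really two words, so the explicit table is still needed.

```python
import re

# Old name -> proposed name, taken from the examples in this thread.
PROPOSED_RENAMES = {
    "hoodie.table.recordkey.fields": "hoodie.table.record_key.fields",
    "hoodie.table.keygenerator.class": "hoodie.table.key_generator.class",
    "hoodie.table.index.defs.path": "hoodie.table.index_defs.path",
    "hoodie.file.group.reader.enabled": "hoodie.file_group.reader.enabled",
}

# Assumed shape: lowercase snake_case segments joined by dots.
SEGMENT = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"
NAME_PATTERN = re.compile(rf"{SEGMENT}(?:\.{SEGMENT})+")


def is_well_formed(name: str) -> bool:
    """Check only the shape of a name, not whether its words are split correctly."""
    return NAME_PATTERN.fullmatch(name) is not None
```
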
> >
> > 4. `hoodie.properties` only for catalog/table configs
> >
> > Only keep catalog/table configs in `hoodie.properties`; keep configs like
> > `hoodie.datasource.write.*` out of it, and add new table configs for
> > those that do not have a table config alias. For example, remove
> > `hoodie.datasource.write.hive_style_partitioning` and put
> > `hoodie.table.hive_style_partitioning` instead.
> >
> > 5. Improve naming case by case
> >
> > Some examples to consider:
> > - All `hoodie.datasource.write.*` move to `hoodie.write.*`, keep things
> > shorter
> > - All feature-switching configs end with `enabled`, not to mix with
> > `enable`
> > - All meta/hive-sync related configs move to `hoodie.catalog.sync.*`,
> > clearly stating it's working with catalogs, and the function is about
> > "sync"
> >
> > 6. Standardize shorthand property names in SQL TBLPROPERTIES
> >
> > Everyone's first example of running Hudi has contained something like
> > this:
> >
> > TBLPROPERTIES (
> >   primaryKey = 'id',
> >   preCombineField = 'ts'
> > );
> >
> > Let's fix it:
> >
> > - "record key" is the term in Hudi, so we don't want people to have to
> > remember that "primary key" means record key; make sure the plural rule
> > applies
> > - "ordering field" is the newer term, so let's deprecate the term
> > "pre-combine field", and make sure the plural rule applies too
> > - again, snake_case all the way, so it should look like below (omitting
> > the `hoodie.table.` namespace) so people can easily associate them with
> > the full name:
> >
> > TBLPROPERTIES (
> >   record_key.fields = 'id',
> >   ordering.fields = 'ts'
> > );
> >
> > - in cases where non-table configs need to be put in TBLPROPERTIES(), we
> > can just omit `hoodie.` since we have `USING HUDI` in the SQL, so it
> > should support `read.*`, `write.*`, `storage.*` sort of shorthand keys
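
One way the shorthand keys could map back to full names, as a rough sketch. It assumes that every key in TBLPROPERTIES belongs to Hudi, that only `read.`, `write.`, and `storage.` are recognized as non-table shorthand, and that anything else is a table config. `expand_tblproperty` is a hypothetical helper, and `write.operation` in the usage note is an illustrative key.

```python
# Non-table namespaces that may appear without the `hoodie.` prefix (assumed set).
SHORTHAND_NAMESPACES = ("read.", "write.", "storage.")


def expand_tblproperty(key: str) -> str:
    """Expand a TBLPROPERTIES shorthand key to its full `hoodie.*` name."""
    if key.startswith("hoodie."):
        # Already a full name; leave it untouched.
        return key
    if key.startswith(SHORTHAND_NAMESPACES):
        # Non-table config: only the `hoodie.` prefix was omitted.
        return "hoodie." + key
    # Everything else is treated as a table config.
    return "hoodie.table." + key
```

Under these assumptions, `record_key.fields` expands to `hoodie.table.record_key.fields`, and `write.operation` expands to `hoodie.write.operation`.
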
> >
> > 7. Address discrepancies between Flink options and Spark options
> >
> > A one-time sweep of Flink configs that diverge from Spark configs, to
> > align them according to the standards we're making. The goals are:
> >
> > - All `hoodie.*` configs should be engine-agnostic and universally
> > accepted by all engines when applicable
> > - Any engine-specific config should be owned by the engine and start
> > with `hudi.` (like how the Trino Hudi connector does now)
> >
> >
> > About migration: we should start adding new config names while keeping
> > the old ones compatible as aliases. That means, throughout the codebase,
> > config variables will contain the standard strings as the names, and any
> > user-provided config will be translated to its new name if applicable.
> >
> > We don't really want to fail writers/readers just because of old config
> > names, so we can keep the aliases for quite some time, but there have to
> > be deprecation warnings starting now, and we should drop the aliases at
> > some major release (like 2.0 or 3.0). But before that, any table version
> > upgrade should strive to rename the configs in `hoodie.properties` as
> > per the standards to evangelize the new names.
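
The alias translation with deprecation warnings might look roughly like this. The two alias pairs are taken from examples earlier in the thread; `normalize_configs` is a hypothetical helper, and the rule that a value set under the standard name wins over its alias is an assumption, not something decided here.

```python
import warnings

# Deprecated alias -> standard name (pairs taken from examples in this thread).
ALIASES = {
    "hoodie.datasource.write.hive_style_partitioning":
        "hoodie.table.hive_style_partitioning",
    "hoodie.table.recordkey.fields": "hoodie.table.record_key.fields",
}


def normalize_configs(user_configs: dict) -> dict:
    """Translate deprecated config names to standard ones, warning on each alias."""
    normalized = {}
    for key, value in user_configs.items():
        new_key = ALIASES.get(key)
        if new_key is None:
            # Not an alias: keep the key as provided.
            normalized[key] = value
            continue
        warnings.warn(
            f"Config '{key}' is deprecated; use '{new_key}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed policy: an explicitly set standard name wins over its alias.
        if new_key not in user_configs:
            normalized[new_key] = value
    return normalized
```

Old names keep working, each use emits a `DeprecationWarning`, and the rest of the code only ever sees the standard names.
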
> >
> > Best,
> > Shiyan
> >
>
