[
https://issues.apache.org/jira/browse/HUDI-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vamshi Gudavarthi updated HUDI-5001:
------------------------------------
Description: This issue is limited to row sources. If column names in a row
source contain characters that are invalid in Avro names (see
[here|https://avro.apache.org/docs/1.10.2/spec.html#names]), a configuration
setting can be used to sanitize the column names in both the schema and the
actual data, so that data ingestion into Hudi does not fail. Schema provider
support is limited to filebasedschemaregistry, as other schema registries
might not allow an invalid schema to be registered in the first place.
> Sanitize avro column names for RowSource
> ----------------------------------------
>
> Key: HUDI-5001
> URL: https://issues.apache.org/jira/browse/HUDI-5001
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Vamshi Gudavarthi
> Assignee: Vamshi Gudavarthi
> Priority: Major
>
> This issue is limited to row sources. If column names in a row source
> contain characters that are invalid in Avro names (see
> [here|https://avro.apache.org/docs/1.10.2/spec.html#names]), a configuration
> setting can be used to sanitize the column names in both the schema and the
> actual data, so that data ingestion into Hudi does not fail. Schema provider
> support is limited to filebasedschemaregistry, as other schema registries
> might not allow an invalid schema to be registered in the first place.
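
To illustrate the kind of sanitization described above, here is a minimal sketch in Python. It is not the Hudi implementation. The function names and the "__" replacement mask are hypothetical choices made for this example. The only rule taken from the linked Avro spec is that a name must start with [A-Za-z_] and contain only [A-Za-z0-9_] after that.

```python
import re

# Avro name rule (from the spec): first character [A-Za-z_],
# every following character [A-Za-z0-9_].
_INVALID_FIRST = re.compile(r"[^A-Za-z_]")
_INVALID_REST = re.compile(r"[^A-Za-z0-9_]")


def sanitize_avro_name(name: str, mask: str = "__") -> str:
    """Replace every character that is invalid in an Avro name with `mask`.

    `mask` is a hypothetical parameter for this sketch; it should itself
    consist only of valid Avro name characters.
    """
    if not name:
        raise ValueError("column name must not be empty")
    first = _INVALID_FIRST.sub(mask, name[0])
    rest = _INVALID_REST.sub(mask, name[1:])
    return first + rest


def sanitize_columns(columns, mask: str = "__"):
    """Sanitize a list of column names (the schema side)."""
    return [sanitize_avro_name(c, mask) for c in columns]


def sanitize_row(row: dict, mask: str = "__") -> dict:
    """Sanitize the keys of one top-level record (the data side)."""
    return {sanitize_avro_name(k, mask): v for k, v in row.items()}


if __name__ == "__main__":
    print(sanitize_columns(["9lives", "my-col.name", "valid_name1"]))
    print(sanitize_row({"a b": 1, "ok": 2}))
```

Applying the same function to both the schema and the record keys keeps the two in agreement, which is the point made in the description. This sketch does not handle nested fields, and it does not detect two distinct names that sanitize to the same result (for example "a-b" and "a.b"); a real implementation would need to decide how to treat both cases.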