[
https://issues.apache.org/jira/browse/HUDI-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pratyaksh Sharma updated HUDI-3264:
-----------------------------------
Status: Patch Available (was: In Progress)
> Make schema registry configs more flexible with MultiTableDeltaStreamer
> -----------------------------------------------------------------------
>
> Key: HUDI-3264
> URL: https://issues.apache.org/jira/browse/HUDI-3264
> Project: Apache Hudi
> Issue Type: Task
> Components: deltastreamer
> Reporter: sivabalan narayanan
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available, sev:normal
> Fix For: 0.11.0
>
>
> Ref issue: [https://github.com/apache/hudi/issues/4585]
> Hi guys,
> we ran into a problem setting the target schema of our Hudi table using the
> MultiTableDeltaStreamer.
> Using a normal DeltaStreamer, we are able to set our source and target
> schemas using the properties:
> * hoodie.deltastreamer.schemaprovider.registry.url
> * hoodie.deltastreamer.schemaprovider.registry.targetUrl
> We found that we are not able to set these properties on a per-table basis
> using the MultiTableDeltaStreamer (MTDS), since the MTDS builds the
> SchemaRegistry URLs for the source and target schemas from the properties:
> * hoodie.deltastreamer.schemaprovider.registry.baseUrl
> * hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix
> * hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix
> The MultiTableDeltaStreamer then also uses the source Kafka topic name as
> the name of the target schema:
>
> [hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java|https://github.com/apache/hudi/blob/9fe28e56b49c7bf68ae2d83bfe89755314aa793b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L167]
> Line 167 in
> [9fe28e5|https://github.com/apache/hudi/commit/9fe28e56b49c7bf68ae2d83bfe89755314aa793b]
> {code:java}
> typedProperties.setProperty(Constants.TARGET_SCHEMA_REGISTRY_URL_PROP,
>     schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP)
>     + targetSchemaRegistrySuffix);
> {code}
>
> We think that schema names should be more configurable, as they are in the
> original DeltaStreamer. Currently, the names of the schemas you want to use
> for reading or writing the data are very tightly coupled to the name of the
> Kafka topic the data is loaded from.
>
>
>
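To make the coupling described above concrete, here is a minimal Java sketch. It is not the actual HoodieMultiTableDeltaStreamer code: the class name, method names, registry host, and suffix values are assumptions chosen for illustration, and only the property keys are taken from the description. `derivedTargetUrl` mirrors the quoted concatenation (base URL + Kafka topic + suffix), while `resolveTargetUrl` shows one possible way a per-table `targetUrl` could take precedence if the behaviour were made configurable.

```java
import java.util.Properties;

// Illustrative sketch only; class and method names are hypothetical.
class SchemaRegistryUrlSketch {

    // Property keys as quoted in the issue description.
    static final String BASE_URL =
        "hoodie.deltastreamer.schemaprovider.registry.baseUrl";
    static final String TARGET_SUFFIX =
        "hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix";
    static final String TARGET_URL =
        "hoodie.deltastreamer.schemaprovider.registry.targetUrl";

    // Mirrors the quoted line: the target schema URL is always
    // base URL + Kafka topic name + target suffix.
    static String derivedTargetUrl(Properties props, String kafkaTopic) {
        return props.getProperty(BASE_URL) + kafkaTopic
            + props.getProperty(TARGET_SUFFIX);
    }

    // Hypothetical alternative: an explicit per-table targetUrl wins,
    // otherwise fall back to the topic-derived URL.
    static String resolveTargetUrl(Properties tableProps, String kafkaTopic) {
        String explicit = tableProps.getProperty(TARGET_URL);
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        return derivedTargetUrl(tableProps, kafkaTopic);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Example values only; host and suffix are assumptions.
        props.setProperty(BASE_URL, "http://localhost:8081/subjects/");
        props.setProperty(TARGET_SUFFIX, "-value/versions/latest");

        // Without an override, the schema name is tied to the topic name.
        System.out.println(resolveTargetUrl(props, "orders"));

        // With a per-table override, the target schema name is independent.
        props.setProperty(TARGET_URL,
            "http://localhost:8081/subjects/orders_out/versions/latest");
        System.out.println(resolveTargetUrl(props, "orders"));
    }
}
```

With this fallback order, existing multi-table configurations that rely on `baseUrl` and the suffixes would resolve to the same URLs as before, and only tables that set `targetUrl` explicitly would be decoupled from the topic name.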