Daniel Ford created HUDI-5973:
---------------------------------

             Summary: Add cachedSchema per write batch to fix idempotency with 
getSourceSchema calls
                 Key: HUDI-5973
                 URL: https://issues.apache.org/jira/browse/HUDI-5973
             Project: Apache Hudi
          Issue Type: Task
          Components: deltastreamer
            Reporter: Daniel Ford


The issue: getSourceSchema is not idempotent when the schema provider is a 
SchemaRegistryProvider. Even within a single write batch, calling 
getSourceSchema multiple times can return different results, because any call 
may pick up the latest schema from the schema registry. Ideally it should 
return one schema per write batch.

The fix is to add a new API to the Source abstract class, called "clearCaches" 
or "cleanupResources", and a similar API to SchemaProvider. 
source.clearCaches will then call schemaProvider.clearCaches.

In the case of SchemaRegistryProvider, for every batch we will fetch the 
schema from the remote schema registry and cache it locally. Subsequent calls 
to getSourceSchema will return the same cached value. Before moving on to the 
next batch, we must call clearCaches, which invalidates the local cache of 
the source schema.
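
The proposal above could be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the actual Hudi code: the class names 
SketchSchemaProvider, CachingRegistrySchemaProvider and SketchSource are 
hypothetical stand-ins, the schema is modelled as a plain String, and the 
remote schema registry is replaced by a Supplier passed to the constructor.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for the SchemaProvider abstraction.
abstract class SketchSchemaProvider {
    abstract String getSourceSchema();

    // Providers that do not cache anything can keep this no-op default.
    void clearCaches() {
    }
}

// Hypothetical registry-backed provider that caches one schema per batch.
class CachingRegistrySchemaProvider extends SketchSchemaProvider {
    private final Supplier<String> remoteFetch; // assumed stand-in for the registry call
    private String cachedSchema;                // null means "not fetched for this batch"

    CachingRegistrySchemaProvider(Supplier<String> remoteFetch) {
        this.remoteFetch = remoteFetch;
    }

    @Override
    synchronized String getSourceSchema() {
        if (cachedSchema == null) {
            // First call in the batch: go to the remote registry once.
            cachedSchema = remoteFetch.get();
        }
        return cachedSchema;
    }

    @Override
    synchronized void clearCaches() {
        // Invalidate so the next batch fetches a fresh schema.
        cachedSchema = null;
    }
}

// Hypothetical stand-in for the Source abstract class.
class SketchSource {
    private final SketchSchemaProvider schemaProvider;

    SketchSource(SketchSchemaProvider schemaProvider) {
        this.schemaProvider = schemaProvider;
    }

    // Called once a batch is done, before the next batch is consumed.
    void clearCaches() {
        schemaProvider.clearCaches();
    }
}

class CachedSchemaDemo {
    public static void main(String[] args) {
        // Fake registry whose schema changes on every remote fetch.
        AtomicInteger version = new AtomicInteger(0);
        CachingRegistrySchemaProvider provider =
            new CachingRegistrySchemaProvider(() -> "schema-v" + version.incrementAndGet());
        SketchSource source = new SketchSource(provider);

        System.out.println(provider.getSourceSchema()); // first fetch of batch 1
        System.out.println(provider.getSourceSchema()); // served from the cache
        source.clearCaches();                           // end of batch 1
        System.out.println(provider.getSourceSchema()); // first fetch of batch 2
    }
}
```

In this sketch the fake registry returns a new version on every remote fetch, 
so the two calls within the first batch see the same schema and only the call 
after clearCaches sees a newer one. Where exactly the write loop should 
trigger clearCaches is left open here, as in the description above.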



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
