[ https://issues.apache.org/jira/browse/HUDI-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101034#comment-17101034 ]

Yixue (Andrew) Zhu commented on HUDI-603:
-----------------------------------------

I just started working on this and came up with what seems like a reasonable 
approach:

If a DeltaStreamer configuration option is enabled, restart the program to pick 
up the new schema whenever the schema provider's schema changes.
This option is reasonable for a couple of reasons:
 # Spark serialization of Avro records and schemas is optimized when the schemas 
are registered before the program runs, i.e. before the driver spawns executors.
Refreshing the schema without recreating the SparkConf, which Spark does not 
support without restarting the program, would defeat this optimization.
 # Table schema updates are infrequent.
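A minimal Java sketch of the schema-change check, assuming it runs at the start of each sync round and fails fast so the whole application can be restarted. `SchemaChangeGuard`, `SchemaChangedException`, and `checkBeforeSync()` are hypothetical names, not Hudi APIs, and a JSON string stands in for the Avro `Schema` that Hudi's `SchemaProvider` actually returns.

```java
import java.util.Objects;
import java.util.function.Supplier;

public class SchemaChangeGuard {

    /** Hypothetical exception used to fail the current run so it can be relaunched. */
    public static class SchemaChangedException extends RuntimeException {
        public SchemaChangedException(String message) {
            super(message);
        }
    }

    // Schema captured once, when the write client would be created.
    // Assumption: a JSON string stands in for an Avro Schema.
    private final String snapshotSchema;
    // Hypothetical hook that asks the schema provider for its current target schema.
    private final Supplier<String> currentSchema;

    public SchemaChangeGuard(Supplier<String> currentSchema) {
        this.currentSchema = currentSchema;
        this.snapshotSchema = currentSchema.get();
    }

    /** Call before each sync round; throws if the provider's schema has changed. */
    public void checkBeforeSync() {
        String latest = currentSchema.get();
        if (!Objects.equals(snapshotSchema, latest)) {
            throw new SchemaChangedException(
                "Target schema changed since startup; failing so the application restarts with it");
        }
    }

    public static void main(String[] args) {
        // Mutable holder simulating a schema registry the provider reads from.
        String[] registry = {"{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}"};
        SchemaChangeGuard guard = new SchemaChangeGuard(() -> registry[0]);
        guard.checkBeforeSync(); // schema unchanged: no exception

        // Simulate schema evolution: a new field is added.
        registry[0] = "{\"type\":\"record\",\"name\":\"r\","
            + "\"fields\":[{\"name\":\"a\",\"type\":\"int\"}]}";
        try {
            guard.checkBeforeSync();
            System.out.println("no change detected");
        } catch (SchemaChangedException e) {
            System.out.println("schema change detected");
        }
    }
}
```

Comparing the full schema text is the simplest possible change test; a real implementation might instead compare parsed schemas or a schema version number.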

By throwing an exception from DeltaSync::syncOnce() when the schema changes, the 
run fails, and the following Spark configuration then makes YARN restart the 
program:
  --conf spark.yarn.maxAppAttempts
  --conf spark.yarn.am.attemptFailuresValidityInterval
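To show why both settings are needed, here is a simplified Java model of the retry decision. It is an assumption for illustration, not YARN's code: only application-master failures inside the validity interval count toward the attempt limit. Without the interval, every schema change permanently uses up an attempt, so the application eventually stops being relaunched. `AmRetryModel` and `onFailure` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sliding-window model of YARN's AM retry decision, mirroring the
 * intent of spark.yarn.maxAppAttempts + spark.yarn.am.attemptFailuresValidityInterval.
 */
public class AmRetryModel {
    private final int maxAppAttempts;
    private final long validityIntervalMs; // <= 0 means failures never expire
    private final List<Long> failureTimesMs = new ArrayList<>();

    public AmRetryModel(int maxAppAttempts, long validityIntervalMs) {
        this.maxAppAttempts = maxAppAttempts;
        this.validityIntervalMs = validityIntervalMs;
    }

    /** Records a failure at nowMs; returns true if another attempt would be started. */
    public boolean onFailure(long nowMs) {
        failureTimesMs.add(nowMs);
        // Count only failures that are still inside the validity window.
        long recent = failureTimesMs.stream()
            .filter(t -> validityIntervalMs <= 0 || nowMs - t < validityIntervalMs)
            .count();
        return recent < maxAppAttempts;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        AmRetryModel noWindow = new AmRetryModel(2, 0);
        AmRetryModel oneHourWindow = new AmRetryModel(2, hour);
        // Two schema changes (i.e. two deliberate failures), one day apart.
        long[] failures = {0L, 24 * hour};
        for (long t : failures) {
            System.out.println("no window: relaunch=" + noWindow.onFailure(t)
                + ", 1h window: relaunch=" + oneHourWindow.onFailure(t));
        }
    }
}
```

With the attempt limit set to 2, the model relaunches after the first schema change in both cases, but after the second change only the 1-hour-window configuration still relaunches, because the earlier failure has aged out of the window.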

> HoodieDeltaStreamer should periodically fetch table schema update
> -----------------------------------------------------------------
>
>                 Key: HUDI-603
>                 URL: https://issues.apache.org/jira/browse/HUDI-603
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Yixue Zhu
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>              Labels: evolution, pull-request-available, schema
>
> HoodieDeltaStreamer creates a SchemaProvider instance and delegates to DeltaSync 
> for periodic sync. However, the default implementation of SchemaProvider does 
> not refresh the schema, which can change due to schema evolution. DeltaSync 
> snapshots the schema when it creates the writeClient, taking it either from the 
> SchemaProvider instance or from the source, and the writeClient's schema is not 
> refreshed during the sync loop.
> I think this needs to be addressed to fully support schema evolution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
