bhasudha commented on issue #1813: URL: https://github.com/apache/hudi/issues/1813#issuecomment-657295902
@jcunhafonte Could you try running the DeltaStreamer in continuous mode rather than as a scheduled job? I think what's happening is that the schema provider is [initialized for the very first time](https://github.com/apache/hudi/blob/20ac7c3337a14dd777f6ebe21b13dab2786f2479/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L229) if it is null (that's why your first run works fine). After that, the initialized state is lost, because each scheduled run re-executes the command and recreates the DeltaSync object. In continuous mode, by contrast, the process keeps the same DeltaSync object along with its initialized schema provider. If you would rather keep the scheduled job, the tool expects you to provide the schema provider explicitly via config.
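
To make the difference concrete, here is a toy model of the behaviour I am describing. It is only a sketch: `DeltaSyncSketch`, `SchemaProvider`, `sync_once` and `init_count` are hypothetical names invented for illustration and are not Hudi's classes or API. It assumes nothing beyond "the provider is filled in lazily when it is null".

```python
class SchemaProvider:
    """Hypothetical stand-in for a schema provider (not Hudi's class)."""

    def __init__(self, schema):
        self.schema = schema


class DeltaSyncSketch:
    """Toy object that lazily fills in a missing schema provider."""

    def __init__(self, schema_provider=None):
        self.schema_provider = schema_provider
        self.init_count = 0  # how many times the lazy path ran on this object

    def sync_once(self, incoming_schema):
        # Lazy initialization: only happens when no provider is set yet.
        if self.schema_provider is None:
            self.schema_provider = SchemaProvider(incoming_schema)
            self.init_count += 1
        return self.schema_provider.schema


# Continuous mode: one long-lived object serves every sync round,
# so the lazily created provider survives between rounds.
continuous = DeltaSyncSketch()
continuous.sync_once("schema-v1")
continuous.sync_once("schema-v1")
continuous_inits = continuous.init_count

# Scheduled job: every run builds a fresh object, so the provider
# created by the previous run is gone and the lazy path runs again.
scheduled_inits = 0
for _ in range(2):
    job = DeltaSyncSketch()
    job.sync_once("schema-v1")
    scheduled_inits += job.init_count

# Scheduled job with an explicit provider: nothing depends on state
# carried over from an earlier run.
explicit = DeltaSyncSketch(SchemaProvider("schema-v1"))
explicit.sync_once("schema-v1")
explicit_inits = explicit.init_count
```

For the real tool, the last case should correspond to passing a schema provider class on the command line (I believe via `--schemaprovider-class`, but please check the DeltaStreamer options for your Hudi version).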
