[
https://issues.apache.org/jira/browse/HUDI-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Scheller updated HUDI-1146:
-----------------------------------
Description:
DeltaStreamer issue — happens with both COW or MOR - Restarting the
DeltaStreamer Process crashes, that is, 2nd Run does nothing.
Steps:
Run Hudi DeltaStreamer job in --continuous mode
Run the same job again without deleting the output parquet files generated due
to step above
2nd run crashes with the below error ( it does not crash if we delete the
output parquet file)
{{Caused by: org.apache.hudi.exception.HoodieException: Please provide a valid
schema provider class!}}
{{ at
org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:53)}}
{{ at
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:312)}}
{{ at
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)}}
{{ at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:392)}}
{{This looks to be because of this line:}}
{{[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L315]
}}
The "orElse" block here doesn't seem to make sense as if "transformed" is empty
then it is likely "dataAndCheckpoint" will have a null schema provider
was:
DeltaStreamer issue — happens with both COW or MOR - Restarting the
DeltaStreamer Process crashes, that is, 2nd Run does nothing.
Steps:
Run Hudi DeltaStreamer job in --continuous mode
Run the same job again without deleting the output parquet files generated due
to step above
2nd run crashes with the below error ( it does not crash if we delete the
output parquet file)
{{Caused by: org.apache.hudi.exception.HoodieException: Please provide a valid
schema provider class!
at
org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:53)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:312)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:392)}}
> DeltaStreamer fails to start when No updated records + schemaProvider not
> supplied
> ----------------------------------------------------------------------------------
>
> Key: HUDI-1146
> URL: https://issues.apache.org/jira/browse/HUDI-1146
> Project: Apache Hudi
> Issue Type: Bug
> Components: Hive Integration
> Reporter: Brandon Scheller
> Priority: Major
>
> DeltaStreamer issue — happens with both COW or MOR - Restarting the
> DeltaStreamer Process crashes, that is, 2nd Run does nothing.
> Steps:
> Run Hudi DeltaStreamer job in --continuous mode
> Run the same job again without deleting the output parquet files generated
> due to step above
> 2nd run crashes with the below error ( it does not crash if we delete the
> output parquet file)
> {{Caused by: org.apache.hudi.exception.HoodieException: Please provide a
> valid schema provider class!}}
> {{ at
> org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:53)}}
> {{ at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:312)}}
> {{ at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)}}
> {{ at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:392)}}
>
> {{This looks to be because of this line:}}
> {{[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L315]
> }}
> The "orElse" block here doesn't seem to make sense as if "transformed" is
> empty then it is likely "dataAndCheckpoint" will have a null schema provider
--
This message was sent by Atlassian Jira
(v8.3.4#803005)