[ https://issues.apache.org/jira/browse/HUDI-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinish Reddy updated HUDI-6627:
-------------------------------
Description:
When the source returns an empty Option in DeltaStreamer, the writer schema is
null. The table schema validation in the Spark write client then throws an NPE,
producing the exception below. We should skip this validation when the writer
schema is null.
{code:java}
org.apache.hudi.exception.HoodieInsertException: Failed insert schema compability check.
    at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:851)
    at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:185)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:690)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:396)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.ingestOnce(HoodieDeltaStreamer.java:876)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at com.onehouse.hudi.OnehouseDeltaStreamer$MultiTableSyncService.lambda$null$1(OnehouseDeltaStreamer.java:319)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path s3a://onehouse-customer-bucket-2451e78f/data-lake/chandra_data_lake_default/xml_flatten_struct_test
    at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:830)
    at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:849)
    ... 10 more
Caused by: java.lang.NullPointerException
    at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1158)
    at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
    at org.apache.hudi.avro.HoodieAvroUtils.createHoodieWriteSchema(HoodieAvroUtils.java:302)
    at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:826)
    ... 11 more
{code}
> Spark write client fails when write schema is null
> --------------------------------------------------
>
> Key: HUDI-6627
> URL: https://issues.apache.org/jira/browse/HUDI-6627
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Vinish Reddy
> Priority: Minor
>
> When the source returns an empty Option in DeltaStreamer, the writer schema
> is null. The table schema validation in the Spark write client then throws
> an NPE, producing the exception below. We should skip this validation when
> the writer schema is null.
> {code:java}
> org.apache.hudi.exception.HoodieInsertException: Failed insert schema compability check.
>     at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:851)
>     at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:185)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:690)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:396)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.ingestOnce(HoodieDeltaStreamer.java:876)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>     at com.onehouse.hudi.OnehouseDeltaStreamer$MultiTableSyncService.lambda$null$1(OnehouseDeltaStreamer.java:319)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path s3a://onehouse-customer-bucket-2451e78f/data-lake/chandra_data_lake_default/xml_flatten_struct_test
>     at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:830)
>     at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:849)
>     ... 10 more
> Caused by: java.lang.NullPointerException
>     at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1158)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
>     at org.apache.hudi.avro.HoodieAvroUtils.createHoodieWriteSchema(HoodieAvroUtils.java:302)
>     at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:826)
>     ... 11 more
> {code}
>
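The proposed behavior (skip schema validation when the writer schema is null)
could look roughly like the sketch below. This is not the actual Hudi patch:
the class WriterSchemaGuard and both of its methods are hypothetical stand-ins,
the schemas are plain strings rather than Avro schemas, and "compatibility" is
reduced to string equality purely for illustration. The only point shown is
the null guard placed before any schema parsing happens.

```java
import java.util.Objects;

/** Hypothetical stand-in for the validation path; not Hudi's real code. */
final class WriterSchemaGuard {

  /** Mimics a parser that fails on null input, as in the NPE in the trace. */
  static String parseSchema(String schemaJson) {
    Objects.requireNonNull(schemaJson, "schema JSON must not be null");
    return schemaJson.trim();
  }

  /**
   * Returns true if validation ran and passed, false if it was skipped
   * because no writer schema was available (e.g. the source was empty).
   */
  static boolean validateInsertSchema(String writerSchemaJson, String tableSchemaJson) {
    // Guard: with no writer schema there is nothing to validate, so skip
    // instead of handing null to the parser.
    if (writerSchemaJson == null || writerSchemaJson.isEmpty()) {
      return false;
    }
    String writer = parseSchema(writerSchemaJson);
    String table = parseSchema(tableSchemaJson);
    // Assumption: string equality stands in for a real compatibility check.
    if (!writer.equals(table)) {
      throw new IllegalStateException("Failed insert schema compatibility check");
    }
    return true;
  }

  public static void main(String[] args) {
    String tableSchema = "{\"type\":\"string\"}";
    // Null writer schema: validation is skipped rather than throwing.
    System.out.println(validateInsertSchema(null, tableSchema));
    // Matching schemas: validation runs and passes.
    System.out.println(validateInsertSchema(tableSchema, tableSchema));
  }
}
```

Returning a boolean here only makes the skipped case observable in the sketch;
a real fix might simply return early from the validation method.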