clp007 opened a new issue, #7960:
URL: https://github.com/apache/hudi/issues/7960
**Describe the problem you faced**
Syncing a Hudi table to BigQuery with HoodieDeltaStreamer fails. I'm not
sure what the cause is or how to fix it. This is the command I ran:
spark-submit --master yarn \
--packages com.google.cloud:google-cloud-bigquery:2.10.4 \
--jars /opt/hudi-gcp-bundle-0.12.1.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
/opt/hudi-utilities-bundle_2.12-0.12.1.jar \
--target-base-path gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
--target-table bubble-pop-b01a0 \
--table-type COPY_ON_WRITE \
--base-file-format PARQUET \
--enable-sync \
--sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
--hoodie-conf hoodie.deltastreamer.source.dfs.root=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
--hoodie-conf hoodie.gcp.bigquery.sync.project_id=transferred \
--hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=temp_data \
--hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=us-central1 \
--hoodie-conf hoodie.gcp.bigquery.sync.table_name=temp_bubble-pop \
--hoodie-conf hoodie.gcp.bigquery.sync.base_path=gs://transfer-table-data/tmp/temp_bubble-pop/${NOW} \
--hoodie-conf hoodie.gcp.bigquery.sync.partition_fields=event_date \
--hoodie-conf hoodie.gcp.bigquery.sync.source_uri=gs://transfer-table-data/incremental/test/bubble-pop-b01a0/event_date=* \
--hoodie-conf hoodie.gcp.bigquery.sync.source_uri_prefix=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
--hoodie-conf hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=true \
--hoodie-conf hoodie.gcp.bigquery.sync.assume_date_partitioning=false \
--hoodie-conf hoodie.datasource.write.recordkey.field=event_timestamp,event_name,user_pseudo_id,user_first_touch_timestamp,advertising_id \
--hoodie-conf hoodie.datasource.write.partitionpath.field=event_date \
--hoodie-conf hoodie.datasource.write.precombine.field=event_timestamp \
--hoodie-conf hoodie.datasource.write.keygenerator.type=COMPLEX \
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
--hoodie-conf hoodie.datasource.write.drop.partition.columns=true \
--hoodie-conf hoodie.partition.metafile.use.base.format=true \
--hoodie-conf hoodie.metadata.enable=true
**To Reproduce**
Steps to reproduce the behavior:
1. Run the spark-submit command above. It fails with the error shown in the stacktrace.
**Environment Description**
* Hudi version: hudi-spark3.2-bundle_2.12:0.12.1
* Spark version: 3.1
* Storage (HDFS/S3/GCS..): GCS
* Running on Docker? (yes/no): no
**Additional context**
Running Spark on Dataproc.
**Stacktrace**
ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
    at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
    at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/02/15 08:15:04 INFO org.apache.hudi.utilities.deltastreamer.DeltaSync: Shutting down embedded timeline server
23/02/15 08:15:04 INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
23/02/15 08:15:04 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@2b10ace9{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
    at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
    at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)