BartPeetersEnjoyer opened a new issue #3906:
URL: https://github.com/apache/hudi/issues/3906
We have a working HoodieDeltaStreamer job that ingests data from a Kafka
topic into a Hive table. This table needs to be refactored, so the currently
working files have been copied and edited to do the same job but land the data
in a different table (different path). We have been running into the same error
for a few days and have run out of ideas for how to solve it.
Our spark-submit command:

```
spark-submit --master yarn --deploy-mode cluster \
  --driver-java-options "-Djava.security.auth.login.config=./client.jaas" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./client.jaas" \
  --conf spark.app.name=somename \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --conf spark.yarn.executor.failuresValidityInterval=1h \
  --conf spark.task.maxFailures=8 \
  --principal someuser@somedomain --keytab somekeytab.keytab \
  --files sometruststore.jks \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars /opt/cloudera/parcels/CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hive/lib/hive-service-3.1.3000.7.2.2.3-1.jar,/opt/cloudera/parcels/CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hive/lib/hive-jdbc-3.1.3000.7.2.2.3-1.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar \
  abfs://somepath/hudi-utilities-bundle_2.11-0.6.0.jar \
  --props somefile.properties \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --target-table sometable \
  --target-base-path abfs://somepath/somedb/unmanaged/sometable \
  --table-type COPY_ON_WRITE \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-limit 40000
```
The error we're encountering:

```
21/11/02 16:07:50 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path abfs://somepath/somedb/unmanaged/sometable/.hoodie
	at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:49)
	at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:127)
	at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:113)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:524)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:129)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:99)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:464)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:665)
```
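For context on where this exception comes from: `HoodieTableMetaClient` refuses to open a base path that exists but contains no `.hoodie` metadata directory, and (if we read the 0.6.0 code right) DeltaStreamer only bootstraps a brand-new table when the target base path does not exist yet. So a pre-created or partially copied target folder could trigger exactly this error. Below is a minimal local sketch of that check, using a throwaway temp directory as a stand-in; on the real cluster the `abfs://` path from the command above would be inspected with `hadoop fs -ls` instead:

```shell
# Local stand-in for the validity check the meta client performs.
# On the cluster, the equivalent inspection would be something like:
#   hadoop fs -ls abfs://somepath/somedb/unmanaged/sometable/.hoodie
BASE_PATH=$(mktemp -d)   # simulates a target base path that exists
                         # but was never initialized by Hudi
if [ -d "$BASE_PATH/.hoodie" ]; then
  echo "valid Hoodie table at $BASE_PATH"
else
  # a fresh, uninitialized path takes this branch
  echo "Hoodie table not found in path $BASE_PATH/.hoodie"
fi
```

If `hadoop fs -ls` on the real target path shows data or subfolders but no `.hoodie` directory, that would match the exception message exactly.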
We only changed these two parameters in our properties file, so they shouldn't
be the issue:

```
hoodie.datasource.hive_sync.database=somedb
hoodie.datasource.hive_sync.table=sometable
```
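One thing worth noting while debugging: as far as we can tell, these two keys only affect the Hive-sync step (how the table is registered in the metastore), not where the data is written. The physical location that the meta client validates comes solely from the `--target-base-path` argument, so a mismatch between the old and new job would show up there rather than in these properties. A sketch of that split (key names as in Hudi 0.6.0; values are the placeholders from above):

```properties
# Controls Hive metastore registration only (used when hive sync is enabled)
hoodie.datasource.hive_sync.database=somedb
hoodie.datasource.hive_sync.table=sometable

# The path the meta client validates is NOT set here; it comes from the
# spark-submit argument:
#   --target-base-path abfs://somepath/somedb/unmanaged/sometable
```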
The permissions on the target folder in Azure should be correct. Many thanks
for reading this post.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]