BartPeetersEnjoyer opened a new issue #3906:
URL: https://github.com/apache/hudi/issues/3906


   We have a working HoodieDeltaStreamer job that ingests data from a Kafka 
topic into a Hive table. That table needs to be refactored, so we copied the 
working job files and edited them to do the same job but land the data in a 
different table (a different base path). We have been hitting the same error 
for a few days and have run out of ideas for how to solve it.
   
   Our spark-submit command:
   ```
   spark-submit --master yarn --deploy-mode cluster \
     --driver-java-options "-Djava.security.auth.login.config=./client.jaas" \
     --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./client.jaas" \
     --conf spark.app.name=somename \
     --conf spark.yarn.maxAppAttempts=4 \
     --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
     --conf spark.yarn.executor.failuresValidityInterval=1h \
     --conf spark.task.maxFailures=8 \
     --principal someuser@somedomain --keytab somekeytab.keytab \
     --files sometruststore.jks \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars /opt/cloudera/parcels/CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hive/lib/hive-service-3.1.3000.7.2.2.3-1.jar,/opt/cloudera/parcels/CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hive/lib/hive-jdbc-3.1.3000.7.2.2.3-1.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar \
     abfs://somepath/hudi-utilities-bundle_2.11-0.6.0.jar \
     --props somefile.properties \
     --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
     --target-table sometable \
     --target-base-path abfs://somepath/somedb/unmanaged/sometable \
     --table-type COPY_ON_WRITE \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
     --source-limit 40000
   ```
   
   The error we're encountering:
   ```
   21/11/02 16:07:50 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path abfs://somepath/somedb/unmanaged/sometable/.hoodie
           at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:49)
           at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:127)
           at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:113)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:524)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:129)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:99)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:464)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:665)
   ```
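   To make the failure mode concrete: the top of the stack trace (`TableNotFoundException.checkTableValidity`) fires when no `.hoodie` metadata directory exists under the target base path. Below is a minimal local Python sketch of that check, purely for illustration; it is not Hudi's actual code, and a local temp directory stands in for the `abfs://` base path.

   ```python
   import os
   import tempfile

   def looks_like_hudi_table(base_path: str) -> bool:
       """Illustrative stand-in for Hudi's table validity check:
       a Hudi table must have a .hoodie metadata directory under
       its base path, otherwise TableNotFoundException is raised."""
       return os.path.isdir(os.path.join(base_path, ".hoodie"))

   with tempfile.TemporaryDirectory() as base:
       # A freshly created, empty base path fails the check,
       # which is what the new target path in our job looks like.
       print(looks_like_hudi_table(base))   # False

       # Once the metadata directory exists, the check passes.
       os.makedirs(os.path.join(base, ".hoodie"))
       print(looks_like_hudi_table(base))   # True
   ```

   In other words, the new target path behaves like an uninitialized table, even though the same configuration works for the original path.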
   
   Compared to the working job, we changed only these two parameters in the 
properties file, so they shouldn't be the cause:
   `hoodie.datasource.hive_sync.database=somedb`
   `hoodie.datasource.hive_sync.table=sometable`
   
   The permissions on the target folder in Azure should be correct. Many thanks 
for reading this post.

