JB-data opened a new issue #3905: URL: https://github.com/apache/hudi/issues/3905
Dear experts,

We are trying to read data from a Kafka topic and write it with Hudi while applying a transformation. Our Hudi command has worked before without the transformation, and the cluster we are working on is well configured and otherwise works fine.

**Command used**

The relevant part of the spark-submit command is:

```
spark-submit --jars abfs://somelocation/hudi-utilities-bundle_2.11-0.6.0.jar \
--props /somepath/mypropfile.properties \
--target-base-path abfs://somepath/somedb/unmanaged/sometable \
--table-type COPY_ON_WRITE \
--transformer-class org.apache.hudyi.utilities.transform.SqlQueryBasedTransformer \
--hoodie-conf hoodie.deltastreamer.transformer.sql='select `fielda,fieldb from <SRC>';
```

**Stacktrace**

We keep getting the following error, which complains about the table provided in target-base-path:

```
ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path abfs://somepath/somedb/unmanaged/sometable/.hoodie'
```

We have defined an external table "sometable" whose location is the path above (abfs://somepath/somedb/unmanaged/sometable/), and it is empty so that Hudi can write the data there.

**Questions I have**

1) In the "hoodie.deltastreamer.transformer.sql" statement (the SELECT ... FROM ... statement), which table do we reference? Do we literally write <SRC> in the command? Does Hudi then know from the props file we provided that it needs to fetch the data from a Kafka topic?
2) Is our approach correct? We have a working Kafka topic and an empty external table. Given the external table's location as target-base-path, should Hudi read the data from the Kafka topic and then write it to that location?

**Environment Description**

* Hudi version : 0.6.0
* Spark version : 2.4.5
* Hive version : Hive 3.1 on Tez
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : abfs (Azure blob storage)
* Running on Docker?
(yes/no) : no
