JB-data opened a new issue #3905:
URL: https://github.com/apache/hudi/issues/3905


   Dear experts,
   we are trying to read data from a Kafka topic and write it with Hudi while applying a transformation. Note that our Hudi command worked before without the transformation, and we are working on a cluster that is configured correctly and overall working fine.
   
   **Command used**
   The relevant part of the spark-submit command is:
   ```
   spark-submit \
   --jars abfs://somelocation/hudi-utilities-bundle_2.11-0.6.0.jar \
   --props /somepath/mypropfile.properties \
   --target-base-path abfs://somepath/somedb/unmanaged/sometable \
   --table-type COPY_ON_WRITE \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
   --hoodie-conf hoodie.deltastreamer.transformer.sql='select fielda, fieldb from <SRC>'
   ```
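
   For context, the properties file we pass via `--props` follows the usual DeltaStreamer shape; the sketch below is illustrative only, with placeholder topic, broker, and field names rather than our actual settings:

   ```
   # Kafka source (placeholder values)
   hoodie.deltastreamer.source.kafka.topic=some_topic
   bootstrap.servers=broker1:9092
   auto.offset.reset=earliest

   # Record key / partitioning for the Hudi table (placeholder fields)
   hoodie.datasource.write.recordkey.field=fielda
   hoodie.datasource.write.partitionpath.field=fieldb
   ```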
   
   **Stacktrace**
   We keep getting the following error, complaining about the table provided in target-base-path:
   
   ```
   ERROR yarn.Client: Application diagnostics message: User class threw 
exception: org.apache.hudi.exception.TableNotFoundException: Hoodie table not 
found in path abfs://somepath/somedb/unmanaged/sometable/.hoodie'
   ```
   We have defined an external table "sometable" for the data to land in at the location above (abfs://somepath/somedb/unmanaged/sometable/), and it is empty so that Hudi can write the data there.
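
   Our working hypothesis is that this exception means the `.hoodie` metadata folder is missing under the base path. Roughly, the condition seems to be the following (our own illustration against a local scratch directory, not Hudi's actual code):

   ```python
   import os
   import tempfile

   def hoodie_table_exists(base_path: str) -> bool:
       # Hudi keeps table metadata under <base_path>/.hoodie;
       # when that folder is absent, reads fail with TableNotFoundException.
       return os.path.isdir(os.path.join(base_path, ".hoodie"))

   # Scratch directory standing in for the abfs base path.
   base = tempfile.mkdtemp()
   print(hoodie_table_exists(base))             # no .hoodie folder yet
   os.makedirs(os.path.join(base, ".hoodie"))
   print(hoodie_table_exists(base))             # metadata folder now present
   ```

   If this hypothesis is right, the question becomes why DeltaStreamer is looking for existing table metadata instead of bootstrapping it on the first write.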
   
   **Questions I have**
   1) In the "hoodie.deltastreamer.transformer.sql" statement, which table do we reference in the SELECT ... FROM clause? Do we literally write <SRC> in the query? Does DeltaStreamer then know, from the props file we provide, that it needs to fetch the data from a Kafka topic?
   2) Is our approach correct? We have a working Kafka topic and an empty external table whose location we provide in target-base-path; Hudi should read the data from the Kafka topic and then write it to the location of the external table.
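
   To make question 1 concrete: our current understanding is that `<SRC>` is written literally in the query and is substituted at runtime with a temporary view over the incoming batch, so a query that actually transforms the data would look something like the following (field names are our placeholders):

   ```
   SELECT fielda, upper(fieldb) AS fieldb_uc FROM <SRC>
   ```

   Please correct us if `<SRC>` should instead be a real table name.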
   
   
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version :2.4.5
   
   * Hive version :Hive 3.1 on Tez
   
   * Hadoop version :3.1.1
   
   * Storage (HDFS/S3/GCS..) : abfs (azure blob storage)
   
   * Running on Docker? (yes/no) :no
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.