TamilselvanBalaiah opened a new issue #3423:
URL: https://github.com/apache/hudi/issues/3423


   Hi,
   
   I am getting a "Table or view not found" error when I use the below 
transformer SQL in the Hudi properties file.
   " hoodie.deltastreamer.transformer.sql=SELECT 
a.CLM_HDR_ADMISSION_DETAIL_SID, a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, 
a.ADMISSION_SOURCE_LKPCD, a.PATIENT_STATUS_LKPCD, a.CREATED_BY, 
to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE, 
a.MODIFIED_BY, to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as 
MODIFIED_DATE, b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, 
string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, 
default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid"
   
   Here, the "default.ad_claim_header_ro" table was successfully loaded into S3 
partitioned buckets (Hudi datasets). Since that table is already loaded into 
the target, I am trying to pull a column from it using an inner join.
   
   While reading the data from the Ad_Claim_Header table, I get the "Table or 
view not found" error, even though the table already exists in that 
database (default) and at the S3 path.
   
   I can query the Ad_Claim_Header table in Hive and Spark SQL without any 
issues; the problem occurs only in Apache Hudi. Is there any configuration 
needed for reading existing tables while processing another dataset?
   
   Can anyone please help me with this?
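   For context on why only the <SRC> placeholder resolves automatically: as far 
as I understand it, SqlQueryBasedTransformer registers the incoming batch as a 
temporary table under a generated name and substitutes that name for <SRC>; any 
other table in the query (such as default.ad_claim_header_ro) must already be 
resolvable through the Spark session's catalog. A minimal Python sketch of that 
substitution (the helper name is hypothetical; the real transformer is Java):

```python
import uuid

SRC_PATTERN = "<SRC>"

def prepare_transformer_sql(transformer_sql: str):
    """Hypothetical sketch of what SqlQueryBasedTransformer does:
    pick a unique temp-table name for the incoming batch and substitute
    it for the <SRC> placeholder. Note that NO other table in the query
    is registered here -- anything else must already exist in the
    session catalog (e.g. via the Hive metastore)."""
    tmp_table = "HOODIE_SRC_TMP_TABLE_" + uuid.uuid4().hex
    # In the real transformer the batch DataFrame is registered as a
    # temp table under tmp_table before the SQL runs.
    sql = transformer_sql.replace(SRC_PATTERN, tmp_table)
    return tmp_table, sql
```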
   
   spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  \
     --packages 
org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.5
 \
     --master yarn --deploy-mode cluster \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.sql.hive.convertMetastoreParquet=false \
     
/usr/lib/hudi/org.apache.hudi_hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type MERGE_ON_READ \
     --op BULK_INSERT \
     --source-ordering-field CLM_HDR_ADMISSION_DETAIL_SID \
     --props 
s3://aws-glue-udp-e2e-bkt-raw/properties/ad-clm-hdr-admission-detail.properties 
\
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --target-base-path 
s3://aws-glue-udp-e2e-bkt-dtlke-raw/adj-claim/PRDMMIS/ad_clm_hdr_admission_detail
 --target-table default.ad_clm_hdr_admission_detail \
     --transformer-class 
org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --enable-hive-sync
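
   One thing worth checking (an assumption on my part, not a confirmed fix): 
for the transformer SQL to resolve metastore tables, the Spark session that 
DeltaStreamer builds needs Hive catalog support. That can be requested at 
submit time, for example:

```shell
# Assumption: enabling the Hive catalog so the transformer SQL can see
# default.ad_claim_header_ro; all other options stay as in the command above.
spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --conf spark.sql.catalogImplementation=hive \
  ...
```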
     
   Properties File
   
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.partitionpath.field=year,month
   hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID, 
a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD, 
a.PATIENT_STATUS_LKPCD, a.CREATED_BY, to_date(a.CREATED_DATE,'DD-MON-YYYY 
HH24:MI:SS AM') as CREATED_DATE, a.MODIFIED_BY, 
to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE, 
b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, 
string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, 
default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid
   hoodie.datasource.write.recordkey.field=CLM_HDR_ADMISSION_DETAIL_SID
   hoodie.datasource.write.hive_style_partitioning=true
   #hive sync settings, uncomment if using flag --enable-hive-sync
   hoodie.datasource.hive_sync.partition_fields=year,month
   
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
   hoodie.datasource.hive_sync.table=ad_clm_hdr_admission_detail
   # DFS Source
   
hoodie.deltastreamer.source.dfs.root=s3://aws-glue-udp-e2e-bkt-raw/adj-claim/PRDMMIS/AD_CLM_HDR_ADMISSION_DETAIL
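   A separate observation, hedged since I have not run this exact query: the 
mask 'DD-MON-YYYY HH24:MI:SS AM' is Oracle-style, while Spark 2.4's to_date() 
expects a Java SimpleDateFormat pattern, roughly 'dd-MMM-yyyy hh:mm:ss a' 
(HH24 combined with AM is also contradictory, and to_date() drops the time 
part, so to_timestamp() may be what is intended). Python's strptime can 
illustrate the equivalent pattern; the sample value below is made up:

```python
from datetime import datetime

# Hypothetical sample value; the real column layout may differ.
sample = "05-AUG-2021 03:15:42 PM"

# 'DD-MON-YYYY HH24:MI:SS AM' is an Oracle-style mask; the
# SimpleDateFormat-style equivalent would be 'dd-MMM-yyyy hh:mm:ss a'.
# The strptime analogue of that pattern:
parsed = datetime.strptime(sample, "%d-%b-%Y %I:%M:%S %p")
print(parsed)  # 2021-08-05 15:15:42
```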
   **Environment Description**
   * EMR version : 5.33
   
   * Hudi version : hudi-utilities-bundle_2.11-0.5.2-incubating.jar
   
   * Spark version : 2.4.7
   
   * Hive version : 2.3.7
   
   * Hadoop version : 2.10.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
