TamilselvanBalaiah opened a new issue #3423:
URL: https://github.com/apache/hudi/issues/3423
Hi,
I am getting a "Table or view not found" error when I use the below transformer SQL in the hoodie properties file:
```sql
hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID,
  a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD,
  a.PATIENT_STATUS_LKPCD, a.CREATED_BY,
  to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE,
  a.MODIFIED_BY,
  to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE,
  b.clm_enc_flag,
  string(year(to_date(a.CREATED_DATE))) as year,
  string(month(to_date(a.CREATED_DATE))) as month
  FROM <SRC> a, default.ad_claim_header_ro b
  WHERE a.claim_header_sid = b.claim_header_sid
```
Here, the `default.ad_claim_header_ro` table was successfully loaded into S3 partitioned buckets (Hudi datasets). Since that table is already loaded into the target, I am trying to pull a column from it using an inner join.
While reading the data from the Ad_Claim_Header table, I get the "Table or view not found" error, even though the table already exists in the `default` database and at the S3 path.
I can query the Ad_Claim_Header table in Hive and Spark SQL without any issues; the problem occurs only in Apache Hudi. Is there any configuration needed to read existing tables while processing another dataset?
Can anyone please help me with this?
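For reference, my understanding (an assumption, not confirmed against the Hudi source) is that `SqlQueryBasedTransformer` registers each incoming batch as a Spark temporary view and substitutes that view's name for the `<SRC>` placeholder; every other table in the query, such as `default.ad_claim_header_ro`, must then be resolvable through the Spark session's catalog. A minimal sketch of the placeholder substitution (the view and function names here are illustrative, not Hudi's internal identifiers):

```python
# Illustrative sketch of how a SQL-template transformer could resolve <SRC>.
# SRC_PLACEHOLDER and the temp-view name are assumptions for demonstration.
SRC_PLACEHOLDER = "<SRC>"

def resolve_transformer_sql(template: str, tmp_view: str) -> str:
    """Replace the <SRC> placeholder with the registered temp-view name.

    Only the placeholder is substituted; any other table referenced in the
    query must already exist in the Spark session's catalog (e.g. via a
    Hive-enabled session), otherwise Spark raises "Table or view not found".
    """
    return template.replace(SRC_PLACEHOLDER, tmp_view)

template = ("SELECT a.CLM_HDR_ADMISSION_DETAIL_SID FROM <SRC> a, "
            "default.ad_claim_header_ro b "
            "WHERE a.claim_header_sid = b.claim_header_sid")
print(resolve_transformer_sql(template, "HOODIE_SRC_TMP_VIEW"))
```

If the session executing the transformer cannot see the Hive metastore, the joined table (rather than `<SRC>`) would be exactly where this error surfaces.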
```shell
spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.5 \
  --master yarn --deploy-mode cluster \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  /usr/lib/hudi/org.apache.hudi_hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
  --table-type MERGE_ON_READ \
  --op BULK_INSERT \
  --source-ordering-field CLM_HDR_ADMISSION_DETAIL_SID \
  --props s3://aws-glue-udp-e2e-bkt-raw/properties/ad-clm-hdr-admission-detail.properties \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path s3://aws-glue-udp-e2e-bkt-dtlke-raw/adj-claim/PRDMMIS/ad_clm_hdr_admission_detail \
  --target-table default.ad_clm_hdr_admission_detail \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --enable-hive-sync
```
**Properties File**
```properties
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.partitionpath.field=year,month
hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID, a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD, a.PATIENT_STATUS_LKPCD, a.CREATED_BY, to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE, a.MODIFIED_BY, to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE, b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid
hoodie.datasource.write.recordkey.field=CLM_HDR_ADMISSION_DETAIL_SID
hoodie.datasource.write.hive_style_partitioning=true
# hive sync settings, uncomment if using flag --enable-hive-sync
hoodie.datasource.hive_sync.partition_fields=year,month
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.table=ad_clm_hdr_admission_detail
# DFS Source
hoodie.deltastreamer.source.dfs.root=s3://aws-glue-udp-e2e-bkt-raw/adj-claim/PRDMMIS/AD_CLM_HDR_ADMISSION_DETAIL
```
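With `hoodie.datasource.write.hive_style_partitioning=true` and partition fields `year,month`, I expect the partition folders under the base path to use `field=value` segments, e.g. `year=2021/month=8`. A hypothetical sketch of that layout (illustrative only, not Hudi's actual key-generator code):

```python
# Sketch of hive-style partition path construction: each configured
# partition field becomes a "field=value" segment, joined with "/".
def hive_style_partition_path(record: dict, fields: list) -> str:
    """Build a hive-style partition path from a record's field values."""
    return "/".join("{}={}".format(f, record[f]) for f in fields)

rec = {"year": "2021", "month": "8"}
print(hive_style_partition_path(rec, ["year", "month"]))  # year=2021/month=8
```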
**Environment Description**
* EMR version : 5.33
* Hudi version : hudi-utilities-bundle_2.11-0.5.2-incubating.jar
* Spark version : 2.4.7
* Hive version : 2.3.7
* Hadoop version : 2.10.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No