ad1happy2go commented on issue #8672:
URL: https://github.com/apache/hudi/issues/8672#issuecomment-1540379918
@ankitchandnani I was able to reproduce this issue: after the second run with `--op INSERT_OVERWRITE_TABLE`, the original two rows are still present alongside the three new ones. Will look into why this is happening.
```
# Put full.parquet into the input dir
~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.12.2.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field seq_no \
  --hoodie-conf hoodie.datasource.write.recordkey.field=driver_id \
  --hoodie-conf hoodie.datasource.write.partitionpath.field= \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.cleaner.commits.retained=10 \
  --hoodie-conf "hoodie.deltastreamer.transformer.sql=select *, 1==2 AS _hoodie_is_deleted from <SRC> a" \
  --hoodie-conf hoodie.datasource.hive_sync.support_timestamp=false \
  --target-base-path file:///tmp/issue_8672_2 \
  --target-table insert_overwrite_test \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=file:///tmp/issue_8672_input \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --op INSERT
scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").count()
23/05/09 20:44:28 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/05/09 20:44:28 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
res0: Long = 2
scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").show()
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+------+-----+------+------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| op|driver_id|driver_name|state|salary|  car|seq_no|_hoodie_is_deleted|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+------+-----+------+------------------+
|  20230509203417073|20230509203417073...|     driver_id:101|                      |ddef0460-f824-43b...|  I|      101|       John|   NY|8000.0|Honda|      |             false|
|  20230509203417073|20230509203417073...|     driver_id:102|                      |ddef0460-f824-43b...|  I|      102|       Mike|   CA|9000.0|  KIA|      |             false|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+------+-----+------+------------------+
# Put cdc.parquet into the input dir
~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.12.2.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field seq_no \
  --hoodie-conf hoodie.datasource.write.recordkey.field=driver_id \
  --hoodie-conf hoodie.datasource.write.partitionpath.field= \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.cleaner.commits.retained=10 \
  --hoodie-conf "hoodie.deltastreamer.transformer.sql=select *, 1==2 AS _hoodie_is_deleted from <SRC> a" \
  --hoodie-conf hoodie.datasource.hive_sync.support_timestamp=false \
  --target-base-path file:///tmp/issue_8672_2 \
  --target-table insert_overwrite_test \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=file:///tmp/issue_8672_input \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --op INSERT_OVERWRITE_TABLE
scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").count()
23/05/09 20:49:05 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/05/09 20:49:05 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
res0: Long = 5
scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").show()
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+-------+------+------+------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| op|driver_id|driver_name|state| salary|   car|seq_no|_hoodie_is_deleted|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+-------+------+------+------------------+
|  20230509204818837|20230509204818837...|     driver_id:101|                      |13fd4e9c-53a2-4e7...|  U|      101|       null|   NJ|15000.0|  null|  0001|             false|
|  20230509204818837|20230509204818837...|     driver_id:101|                      |13fd4e9c-53a2-4e7...|  U|      101|       null|   PA|   null|  null|  0002|             false|
|  20230509204818837|20230509204818837...|     driver_id:102|                      |13fd4e9c-53a2-4e7...|  U|      102|       null| null|   null|Toyota|  0003|             false|
|  20230509203417073|20230509203417073...|     driver_id:101|                      |ddef0460-f824-43b...|  I|      101|       John|   NY| 8000.0| Honda|      |             false|
|  20230509203417073|20230509203417073...|     driver_id:102|                      |ddef0460-f824-43b...|  I|      102|       Mike|   CA| 9000.0|   KIA|      |             false|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+-------+------+------+------------------+
```
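For reference, here is the expected behavior sketched in plain Python (names and data are illustrative, not Hudi internals): an `INSERT_OVERWRITE_TABLE` write should replace the whole table with the incoming batch, so the second read should return only the 3 cdc rows, not 5.

```python
# Hypothetical sketch (NOT Hudi code): contrast the expected semantics of
# --op INSERT_OVERWRITE_TABLE with what the repro above actually reads back.

def insert_overwrite_table(existing_rows, incoming_rows):
    """Expected semantics: the incoming batch replaces the entire table."""
    return list(incoming_rows)

# Rows as (driver_id, seq_no) pairs, mirroring the repro data.
first_batch = [("101", None), ("102", None)]                      # full.parquet, written with --op INSERT
cdc_batch = [("101", "0001"), ("101", "0002"), ("102", "0003")]   # cdc.parquet

after_overwrite = insert_overwrite_table(first_batch, cdc_batch)
print(len(after_overwrite))  # expected count: 3
# The repro instead reads back 5 rows: the two original "I" rows survive
# alongside the three new "U" rows, i.e. the old file group was not replaced.
```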