ad1happy2go commented on issue #8672:
URL: https://github.com/apache/hudi/issues/8672#issuecomment-1540379918

   @ankitchandnani Able to reproduce this issue. Will look into why this is happening.
   
   ```
   # Put full.parquet into the input dir

   ~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.12.2.jar \
   --table-type COPY_ON_WRITE \
   --source-ordering-field seq_no \
   --hoodie-conf hoodie.datasource.write.recordkey.field=driver_id \
   --hoodie-conf hoodie.datasource.write.partitionpath.field= \
   --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
   --hoodie-conf hoodie.cleaner.commits.retained=10 \
   --hoodie-conf "hoodie.deltastreamer.transformer.sql=select *, 1==2 AS _hoodie_is_deleted from <SRC> a" \
   --hoodie-conf hoodie.datasource.hive_sync.support_timestamp=false \
   --target-base-path file:///tmp/issue_8672_2 \
   --target-table insert_overwrite_test \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
   --hoodie-conf hoodie.deltastreamer.source.dfs.root=file:///tmp/issue_8672_input \
   --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
   --op INSERT
   
   scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").count()
   23/05/09 20:44:28 WARN DFSPropertiesConfiguration: Cannot find 
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   23/05/09 20:44:28 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   res0: Long = 2
   
   
   scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").show()
   
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+------+-----+------+------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| op|driver_id|driver_name|state|salary|  car|seq_no|_hoodie_is_deleted|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+------+-----+------+------------------+
   |  20230509203417073|20230509203417073...|     driver_id:101|                      |ddef0460-f824-43b...|  I|      101|       John|   NY|8000.0|Honda|      |             false|
   |  20230509203417073|20230509203417073...|     driver_id:102|                      |ddef0460-f824-43b...|  I|      102|       Mike|   CA|9000.0|  KIA|      |             false|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+------+-----+------+------------------+
   
   # Put cdc.parquet into the input dir
   
   ~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.12.2.jar \
   --table-type COPY_ON_WRITE \
   --source-ordering-field seq_no \
   --hoodie-conf hoodie.datasource.write.recordkey.field=driver_id \
   --hoodie-conf hoodie.datasource.write.partitionpath.field= \
   --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
   --hoodie-conf hoodie.cleaner.commits.retained=10 \
   --hoodie-conf "hoodie.deltastreamer.transformer.sql=select *, 1==2 AS _hoodie_is_deleted from <SRC> a" \
   --hoodie-conf hoodie.datasource.hive_sync.support_timestamp=false \
   --target-base-path file:///tmp/issue_8672_2 \
   --target-table insert_overwrite_test \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
   --hoodie-conf hoodie.deltastreamer.source.dfs.root=file:///tmp/issue_8672_input \
   --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
   --op INSERT_OVERWRITE_TABLE
   
   scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").count()
   23/05/09 20:49:05 WARN DFSPropertiesConfiguration: Cannot find 
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   23/05/09 20:49:05 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   res0: Long = 5
   
   scala> spark.read.format("hudi").load("file:///tmp/issue_8672_2").show()
   
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+-------+------+------+------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| op|driver_id|driver_name|state| salary|   car|seq_no|_hoodie_is_deleted|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+-------+------+------+------------------+
   |  20230509204818837|20230509204818837...|     driver_id:101|                      |13fd4e9c-53a2-4e7...|  U|      101|       null|   NJ|15000.0|  null|  0001|             false|
   |  20230509204818837|20230509204818837...|     driver_id:101|                      |13fd4e9c-53a2-4e7...|  U|      101|       null|   PA|   null|  null|  0002|             false|
   |  20230509204818837|20230509204818837...|     driver_id:102|                      |13fd4e9c-53a2-4e7...|  U|      102|       null| null|   null|Toyota|  0003|             false|
   |  20230509203417073|20230509203417073...|     driver_id:101|                      |ddef0460-f824-43b...|  I|      101|       John|   NY| 8000.0| Honda|      |             false|
   |  20230509203417073|20230509203417073...|     driver_id:102|                      |ddef0460-f824-43b...|  I|      102|       Mike|   CA| 9000.0|   KIA|      |             false|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-----+-------+------+------+------------------+
   ```
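
   For context, here is a plain-Python sketch (an illustration under assumed semantics, not Hudi code; record shapes are hypothetical) of the two behaviors the repro exercises: the transformer's literal `1==2` flag, which marks every record as not deleted, and the expected `INSERT_OVERWRITE_TABLE` replacement, which should leave 3 records rather than the 5 returned above.

   ```python
   # Illustration only: a plain-Python model of the semantics exercised
   # in the repro above. Nothing here is Hudi code.

   # 1) The SQL transformer adds a literal boolean column:
   #      select *, 1==2 AS _hoodie_is_deleted from <SRC> a
   #    In Spark SQL, 1==2 evaluates to false, so every record is flagged
   #    "not deleted" -- matching the false values in both tables above.
   def transform(batch):
       return [dict(rec, _hoodie_is_deleted=(1 == 2)) for rec in batch]

   # 2) Write-operation semantics:
   def insert(table, batch):
       """--op INSERT: append the incoming batch to the table."""
       return table + batch

   def insert_overwrite_table(table, batch):
       """--op INSERT_OVERWRITE_TABLE: the incoming batch should
       replace the entire table contents."""
       return list(batch)

   # First run: full.parquet has 2 records.
   table = insert([], transform([{"driver_id": 101}, {"driver_id": 102}]))
   assert len(table) == 2          # matches the first count() above

   # Second run: cdc.parquet has 3 change records.
   cdc = transform([{"driver_id": 101, "seq_no": "0001"},
                    {"driver_id": 101, "seq_no": "0002"},
                    {"driver_id": 102, "seq_no": "0003"}])
   table = insert_overwrite_table(table, cdc)

   print(len(table))  # expected 3; the repro above returns 5 instead
   ```

   In other words, the second `count()` returning 5 suggests the old file group from the first commit survived the overwrite, which is the behavior being reported.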

