soumilshah1995 commented on issue #10644:
URL: https://github.com/apache/hudi/issues/10644#issuecomment-2251065010

   Hi there, I saw that this ticket was completed, and I was trying out this functionality.
   
   Docs 
   ```
   
   Export to json or parquet dataset with transformation/filtering
   
   The Exporter supports custom transformation/filtering on records before writing to json or parquet dataset. This is done by supplying implementation of org.apache.hudi.utilities.transform.Transformer via --transformer-class option.
   
   spark-submit \
     --jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.15.0.jar" \
     --deploy-mode "client" \
     --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
     packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.15.0.jar \
     --source-base-path "/tmp/" \
     --target-output-path "/tmp/exported/json/" \
     --transformer-class "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer" \
     --transformer-sql "SELECT substr(rider,1,10) as rider, trip_type as tripType FROM <SRC> WHERE trip_type = 'BLACK' LIMIT 10" \
     --output-format "json"  # or "parquet"
   ```
   
   https://hudi.apache.org/docs/next/snapshot_exporter/
   
   The export works without a transformer but fails once I add one.
   
   # Test 1: No transformer (PASS)
   ```
   spark-submit \
       --class org.apache.hudi.utilities.HoodieSnapshotExporter \
       --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.15.0 \
       --master 'local[*]' \
       --executor-memory 1g \
       /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.15.0.jar \
       --source-base-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/' \
       --target-output-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/dump/json/' \
       --output-format 'parquet'
   ```
   
   
   # Test 2: With transformer (FAIL)
   ```
   spark-submit \
       --class org.apache.hudi.utilities.HoodieSnapshotExporter \
       --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.15.0 \
       --master 'local[*]' \
       --executor-memory 1g \
       /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.15.0.jar \
       --source-base-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/' \
       --target-output-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/dump/json/' \
       --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
       --transformer-sql "SELECT * FROM <SRC> WHERE destinationstate='NY'" \
       --output-format 'parquet'
   ```
   # Logs
   ```
   
   Ivy Default Cache set to: /Users/soumilshah/.ivy2/cache
   The jars for the packages stored in: /Users/soumilshah/.ivy2/jars
   org.apache.hudi#hudi-spark3.4-bundle_2.12 added as a dependency
   :: resolving dependencies :: org.apache.spark#spark-submit-parent-f0fce1d3-e446-495c-a37a-e2dd7e335611;1.0
        confs: [default]
        found org.apache.hudi#hudi-spark3.4-bundle_2.12;0.15.0 in central
   :: resolution report :: resolve 56ms :: artifacts dl 1ms
        :: modules in use:
        org.apache.hudi#hudi-spark3.4-bundle_2.12;0.15.0 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
        ---------------------------------------------------------------------
   :: retrieving :: org.apache.spark#spark-submit-parent-f0fce1d3-e446-495c-a37a-e2dd7e335611
        confs: [default]
        0 artifacts copied, 1 already retrieved (0kB/2ms)
   24/07/25 13:43:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Exception in thread "main" org.apache.hudi.com.beust.jcommander.ParameterException: Was passed main parameter '--transformer-class' but no main parameter was defined in your arg class
        at org.apache.hudi.com.beust.jcommander.JCommander.initMainParameterValue(JCommander.java:954)
        at org.apache.hudi.com.beust.jcommander.JCommander.parseValues(JCommander.java:755)
        at org.apache.hudi.com.beust.jcommander.JCommander.parse(JCommander.java:356)
        at org.apache.hudi.com.beust.jcommander.JCommander.parse(JCommander.java:335)
        at org.apache.hudi.com.beust.jcommander.JCommander.<init>(JCommander.java:251)
        at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:292)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   24/07/25 13:43:59 INFO ShutdownHookManager: Shutdown hook called
   24/07/25 13:43:59 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-4431a82f-15b8-4ac6-948a-db853cbf9fe3
   (base) soumilshah@ip-192-168-1-31 E1 % 
   
   ```
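
The exception reads as if JCommander does not recognize `--transformer-class` at all: a token that matches no declared option is treated as a main (positional) parameter, and the exporter's argument class apparently defines none. A rough way to check whether a given bundle declares the option is to search the compiled config class for the option string. This is only a sketch under assumptions: the class entry name `org/apache/hudi/utilities/HoodieSnapshotExporter$Config.class`, the helper `declares_option`, and the `JAR` variable are mine and not taken from the docs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the bytes on stdin contain the literal
# option string. Annotation values such as JCommander @Parameter names are
# stored as plain strings in a .class file, so a byte search is enough for a
# rough check (it also matches longer options that share the same prefix).
declares_option() {
    LC_ALL=C grep -a -q -F -e "$1"
}

# Assumed location of the exporter's argument class inside the bundle.
CONFIG_CLASS='org/apache/hudi/utilities/HoodieSnapshotExporter$Config.class'

# Only runs when JAR points at an existing file, for example:
#   JAR=/path/to/hudi-utilities-slim-bundle_2.12-0.15.0.jar sh check.sh
if [ -f "${JAR:-}" ]; then
    if unzip -p "$JAR" "$CONFIG_CLASS" | declares_option '--transformer-class'; then
        echo "option declared in $JAR"
    else
        # Also reached when the assumed class entry is missing from the jar.
        echo "option NOT declared in $JAR"
    fi
fi
```

If the check reports the option as not declared for the 0.15.0 bundle, that would suggest the released jar predates the change described in the docs at `/docs/next/`, but I have not confirmed that.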

