soumilshah1995 commented on issue #10644:
URL: https://github.com/apache/hudi/issues/10644#issuecomment-2251065010
Hi there, I saw that this ticket was completed, so I tried out this
functionality.
Docs:
```
Export to json or parquet dataset with transformation/filtering
The Exporter supports custom transformation/filtering on records before
writing to json or parquet dataset. This is done by supplying implementation of
org.apache.hudi.utilities.transform.Transformer via --transformer-class option.
spark-submit \
  --jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.15.0.jar" \
  --deploy-mode "client" \
  --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.15.0.jar \
  --source-base-path "/tmp/" \
  --target-output-path "/tmp/exported/json/" \
  --transformer-class "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer" \
  --transformer-sql "SELECT substr(rider,1,10) as rider, trip_type as tripType FROM <SRC> WHERE trip_type = 'BLACK' LIMIT 10" \
  --output-format "json" # or "parquet"
```
https://hudi.apache.org/docs/next/snapshot_exporter/
The export works without a transformer but fails as soon as I add one.
# Test 1: No transformer (PASS)
```
spark-submit \
  --class org.apache.hudi.utilities.HoodieSnapshotExporter \
  --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.15.0 \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.15.0.jar \
  --source-base-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/' \
  --target-output-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/dump/json/' \
  --output-format 'parquet'
```
# Test 2: With transformer (FAIL)
```
spark-submit \
  --class org.apache.hudi.utilities.HoodieSnapshotExporter \
  --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.15.0 \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.15.0.jar \
  --source-base-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/' \
  --target-output-path '/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/dump/json/' \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --transformer-sql "SELECT * FROM <SRC> WHERE destinationstate='NY'" \
  --output-format 'parquet'
```
# logs
```
Ivy Default Cache set to: /Users/soumilshah/.ivy2/cache
The jars for the packages stored in: /Users/soumilshah/.ivy2/jars
org.apache.hudi#hudi-spark3.4-bundle_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f0fce1d3-e446-495c-a37a-e2dd7e335611;1.0
    confs: [default]
    found org.apache.hudi#hudi-spark3.4-bundle_2.12;0.15.0 in central
:: resolution report :: resolve 56ms :: artifacts dl 1ms
    :: modules in use:
    org.apache.hudi#hudi-spark3.4-bundle_2.12;0.15.0 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-f0fce1d3-e446-495c-a37a-e2dd7e335611
    confs: [default]
    0 artifacts copied, 1 already retrieved (0kB/2ms)
24/07/25 13:43:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hudi.com.beust.jcommander.ParameterException: Was passed main parameter '--transformer-class' but no main parameter was defined in your arg class
    at org.apache.hudi.com.beust.jcommander.JCommander.initMainParameterValue(JCommander.java:954)
    at org.apache.hudi.com.beust.jcommander.JCommander.parseValues(JCommander.java:755)
    at org.apache.hudi.com.beust.jcommander.JCommander.parse(JCommander.java:356)
    at org.apache.hudi.com.beust.jcommander.JCommander.parse(JCommander.java:335)
    at org.apache.hudi.com.beust.jcommander.JCommander.<init>(JCommander.java:251)
    at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:292)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/07/25 13:43:59 INFO ShutdownHookManager: Shutdown hook called
24/07/25 13:43:59 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-4431a82f-15b8-4ac6-948a-db853cbf9fe3
(base) soumilshah@ip-192-168-1-31 E1 %
```
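The exception suggests the argument parser in this jar does not recognise `--transformer-class` as an option at all and falls back to treating it as a positional ("main") parameter. A rough way to check that is to look for the option name inside the class that holds the exporter's CLI options. This is only a sketch under assumptions: the class name `HoodieSnapshotExporter$Config` is my guess at where the options are declared, the helper names `mentions_option` and `check_jar` are made up for illustration, and it needs `unzip` on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: check whether the exporter's option class in a utilities jar
# mentions a given CLI option name.

# Succeeds (exit 0) if the bytes on stdin contain the literal string "$1".
mentions_option() {
  grep -a -q -F -e "$1"
}

# Prints whether the assumed options class inside jar "$1" mentions option "$2".
# Note: a missing jar or a wrong class name is also reported as NOT declared.
check_jar() {
  local jar="$1" opt="$2"
  # Assumed class name; adjust if the options are declared elsewhere.
  local cls='org/apache/hudi/utilities/HoodieSnapshotExporter$Config.class'
  if unzip -p "$jar" "$cls" 2>/dev/null | mentions_option "$opt"; then
    printf '%s\n' "$opt: declared"
  else
    printf '%s\n' "$opt: NOT declared"
  fi
}

# Example call (placeholder path, substitute the jar actually passed to spark-submit):
# check_jar path/to/hudi-utilities-slim-bundle_2.12-0.15.0.jar --transformer-class
```

If an option that is known to work, such as `--output-format`, is reported as declared while `--transformer-class` is not, the jar most likely predates the change from this ticket.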