[ https://issues.apache.org/jira/browse/HUDI-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007908#comment-17007908 ]
lamber-ken commented on HUDI-486:
---------------------------------

Let me summarize the reproduce steps.

1. Check out the commit:
{code:java}
git checkout e1e5fe33249bf511486073dd9cf48e5b7ea14816
{code}
2. Build the source:
{code:java}
mvn clean package -DskipTests -DskipITs -Dcheckstyle.skip=true -Drat.skip=true
{code}
3. Set up docker:
{code:java}
cd docker && ./setup_demo.sh
{code}
4. Generate data:
{code:java}
cat demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks -P
{code}
5. Go into the container:
{code:java}
docker exec -it adhoc-2 /bin/bash
{code}
6. Consume the Kafka data and sync it to Hive:
{code:java}
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
  --storage-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path /user/hive/warehouse/stock_ticks_cow \
  --target-table stock_ticks_cow \
  --props /var/demo/config/kafka-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider

/var/hoodie/ws/hudi-hive/run_sync_tool.sh \
  --jdbc-url jdbc:hive2://hiveserver:10000 \
  --user hive \
  --pass hive \
  --partitioned-by dt \
  --base-path /user/hive/warehouse/stock_ticks_cow \
  --database default \
  --table stock_ticks_cow
{code}
7. Create incr_pull.txt:
{code:java}
select `_hoodie_commit_time`, symbol, ts from default.stock_ticks_cow where symbol = 'GOOG' and `_hoodie_commit_time` > '20180924064621'
{code}
8. Execute org.apache.hudi.utilities.HiveIncrementalPuller:
{code:java}
java -cp \
./spark/jars/commons-cli-1.2.jar:./spark/jars/htrace-core-3.1.0-incubating.jar:./spark/jars/hadoop-hdfs-2.7.3.jar:./hive/lib/hive-exec-2.3.3.jar:./hive/lib/hive-common-2.3.3.jar:./hive/lib/hive-jdbc-2.3.3.jar:./hive/lib/hive-service-2.3.3.jar:./hive/lib/hive-service-rpc-2.3.3.jar:./spark/jars/httpcore-4.4.10.jar:./spark/jars/slf4j-api-1.7.16.jar:./spark/jars/hadoop-auth-2.7.3.jar:./hive/lib/commons-lang-2.6.jar:./spark/jars/commons-configuration-1.6.jar:./spark/jars/commons-collections-3.2.2.jar:./spark/jars/hadoop-common-2.7.3.jar:./hive/lib/antlr-runtime-3.5.2.jar:./spark/jars/log4j-1.2.17.jar:./hive/lib/commons-logging-1.2.jar:./hive/lib/commons-io-2.4.jar:$HUDI_UTILITIES_BUNDLE \
  org.apache.hudi.utilities.HiveIncrementalPuller \
  --hiveUrl jdbc:hive2://hiveserver:10000 \
  --hiveUser hive \
  --hivePass hive \
  --extractSQLFile /var/hoodie/ws/docker/demo/config/incr_pull.txt \
  --sourceDb default \
  --sourceTable stock_ticks_cow \
  --targetDb default \
  --tmpdb default \
  --targetTable tempTable \
  --fromCommitTime 0 \
  --maxCommits 1
{code}

> Improve documentation for using HiveIncrementalPuller
> -----------------------------------------------------
>
>                 Key: HUDI-486
>                 URL: https://issues.apache.org/jira/browse/HUDI-486
>             Project: Apache Hudi (incubating)
>          Issue Type: Sub-task
>          Components: Incremental Pull
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>
> To use HiveIncrementalPuller, one needs many jars on the classpath. These jars are not listed anywhere, so one has to keep adding jars to the classpath for every NoClassDefFoundError that comes up during execution.
> We should list the required jars so that a first-time user can run the tool easily.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
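Until the docs list the jars, the long `-cp` string in step 8 can be assembled from a plain list, which is easier to maintain and extend when the next NoClassDefFoundError appears. This is only a sketch: the jar paths and versions below are the ones from the docker demo containers above and will differ in other environments.

```shell
#!/bin/sh
# Sketch: build the HiveIncrementalPuller classpath from a jar list.
# Jar paths/versions are those of the demo containers (an assumption
# outside that environment); extend the list as needed.
JARS="
./spark/jars/commons-cli-1.2.jar
./spark/jars/htrace-core-3.1.0-incubating.jar
./hive/lib/hive-jdbc-2.3.3.jar
./hive/lib/hive-service-2.3.3.jar
./spark/jars/hadoop-common-2.7.3.jar
"
CP=""
for j in $JARS; do
  # Append with a ':' separator, skipping the separator for the first entry.
  CP="${CP:+$CP:}$j"
done
# The Hudi utilities bundle goes last, as in step 8.
CP="$CP:$HUDI_UTILITIES_BUNDLE"
echo "$CP"
```

The same `$CP` can then be passed as `java -cp "$CP" org.apache.hudi.utilities.HiveIncrementalPuller ...` with the options from step 8.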