bvaradar commented on issue #2005: URL: https://github.com/apache/hudi/issues/2005#issuecomment-680174568
@cdmikechen: The integration test brings up a dockerized environment and runs the spark-submit command inside it, so the dependencies specified in hudi-integ-test/pom.xml should not be part of the deltastreamer run. As you can see from the spark-submit command in my previous comment, only HUDI_UTILITIES_BUNDLE is passed; everything else comes from the Spark runtime environment.

The ParquetInputFormat class is part of parquet-hadoop-bundle:

root@adhoc-2:/opt# grep -r 'parquet.hadoop.ParquetInputFormat' $SPARK_HOME/jars/*
Binary file /opt/spark/jars/parquet-hadoop-1.10.1.jar matches
Binary file /opt/spark/jars/parquet-hadoop-bundle-1.6.0.jar matches
root@adhoc-2:/opt#
root@adhoc-2:/opt# jar tf /opt/spark/jars/parquet-hadoop-bundle-1.6.0.jar | grep ParquetInputFormat.class
parquet/hadoop/mapred/DeprecatedParquetInputFormat.class
parquet/hadoop/ParquetInputFormat.class

Are you using the Spark distribution prebuilt with Hadoop?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
