bvaradar commented on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-680174568


   @cdmikechen : The integration test actually brings up a dockerized 
environment and runs spark-submit command. So, the dependencies specified in 
hudi-integ-test/pom.xml should not be part of the deltastreamer run. As you can 
look from my spark-submit command in my previous comment, only 
HUDI_UTILITIES_BUNDLE is passed. otherwise, it is only spark runtime 
environment. 
   
   The ParquetInputFormatClass is part of parquer-hadoop-bundle.
   root@adhoc-2:/opt# grep -r 'parquet.hadoop.ParquetInputFormat' 
$SPARK_HOME/jars/*
   Binary file /opt/spark/jars/parquet-hadoop-1.10.1.jar matches
   Binary file /opt/spark/jars/parquet-hadoop-bundle-1.6.0.jar matches
   root@adhoc-2:/opt# 
   
   root@adhoc-2:/opt# jar tf /opt/spark/jars/parquet-hadoop-bundle-1.6.0.jar | 
grep ParquetInputFormat.class
   parquet/hadoop/mapred/DeprecatedParquetInputFormat.class
   parquet/hadoop/ParquetInputFormat.class
   
   Are you using the spark distribution prebuilt with hadoop ? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to