zhedoubushishi commented on pull request #1760:
URL: https://github.com/apache/hudi/pull/1760#issuecomment-713294221
> @bschell : For running integration tests with hudi packages built with
scala 2.12, we just need to change scripts/run_travis_tests.sh. The docker
container should automatically load those jars for running integration tests.
>
> ```diff
> index 63fb959c..b77b4f64 100755
> --- a/scripts/run_travis_tests.sh
> +++ b/scripts/run_travis_tests.sh
> @@ -35,7 +35,7 @@ elif [ "$mode" = "integration" ]; then
> export SPARK_HOME=$PWD/spark-${sparkVersion}-bin-hadoop${hadoopVersion}
> mkdir /tmp/spark-events/
> echo "Running Integration Tests"
> - mvn verify -Pintegration-tests -B
> + mvn verify -Pintegration-tests -Dscala-2.12 -B
> else
> echo "Unknown mode $mode"
> exit 1
> ```
Is this a permanent change, or are we just trying to run the tests here? I ran into a
Scala class-not-found error when running the docker integration tests for Hudi:
```
20/10/21 00:04:17 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.JavaConversions$.deprecated$u0020asScalaIterator(Ljava/util/Iterator;)Lscala/collection/Iterator;
	at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:78)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:95)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
	at org.apache.hudi.utilities.sources.HoodieIncrSource.fetchNextBatch(HoodieIncrSource.java:122)
	at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
	at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:75)
	at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:68)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:364)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:253)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:163)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:161)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:466)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
I suspect that although Hudi picks up the ```hudi-spark_2.12-bundle.jar```, the
docker environment still uses spark_2.11, so there is a conflict between Scala
2.11 and 2.12 on the classpath.
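A `NoSuchMethodError` on `scala.collection.JavaConversions` is a typical symptom of mixing jars compiled against different Scala versions. One quick sanity check for this suspicion is to compare the Scala suffixes of the jars on the classpath. Here is a small, hypothetical shell helper (the function name and sample jar names are illustrative, not part of the Hudi build) that infers the Scala version from an artifact name:

```shell
#!/bin/sh
# Hypothetical helper: infer the Scala version a jar was built for from its
# artifact-name suffix (_2.11 / _2.12). Jars with no suffix are usually
# Scala-version-independent (or Java-only).
scala_version_of() {
  case "$1" in
    *_2.12*) echo "2.12" ;;
    *_2.11*) echo "2.11" ;;
    *)       echo "unknown" ;;
  esac
}

scala_version_of "hudi-spark_2.12-bundle.jar"   # prints 2.12
scala_version_of "spark-core_2.11-2.4.4.jar"    # prints 2.11
```

Running this over `$SPARK_HOME/jars/*.jar` plus the Hudi bundle inside the docker container would show at a glance whether 2.11 and 2.12 artifacts are being mixed, which would confirm the conflict.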
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]