Looks like a class mismatch error on the Hadoop jars. The easiest way to do this is to pull the code into IntelliJ, add the Spark jars folder to the module's classpath, and then run the test by right-clicking > Run.
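If you want to confirm the mismatch before rewiring the classpath, a quick
check is to print which jar the runtime actually loads the Hadoop
Configuration class from, and to probe for the overload the trace says is
missing. A minimal, untested sketch (the class name here is made up):

import org.apache.hadoop.conf.Configuration;

public class HadoopClasspathCheck {
  public static void main(String[] args) throws Exception {
    // Which jar did the JVM actually load Configuration from?
    System.out.println(Configuration.class
        .getProtectionDomain().getCodeSource().getLocation());
    // Probe for the overload that ParquetUtils calls; an old Hadoop jar
    // that lacks it makes this throw NoSuchMethodException, the
    // reflective twin of the NoSuchMethodError in the trace below.
    Configuration.class.getMethod("addResource", Configuration.class);
    System.out.println("addResource(Configuration) is present");
  }
}

If the printed location points at an older Hadoop jar pulled in
transitively, that is the jar to exclude or replace.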
I can prep a patch for you if you'd like, lmk.

Thanks
Vinoth

On Thu, Apr 18, 2019 at 8:46 AM Umesh Kacha <[email protected]> wrote:

> Hi Vinoth, I managed to run HoodieJavaApp in my local Maven project;
> there I had to copy the following classes, which are used by
> HoodieJavaApp. Inside HoodieJavaTest's main I am creating an object of
> HoodieJavaApp, which just runs with all default options.
>
> [image: image.png]
>
> However, I get the following error, which seems like one of the runtime
> dependencies is missing. Please guide.
>
> Exception in thread "main"
> com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for
> commit time 20190418210326
> at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
> at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
> at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
> at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
> at HoodieJavaApp.run(HoodieJavaApp.java:143)
> at HoodieJavaApp.main(HoodieJavaApp.java:67)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost
> task 0.0 in stage 27.0 (TID 49, localhost, executor driver):
> java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException:
> Error checking bloom filter index.
> at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
> at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking
> bloom filter index.
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
> at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
> ... 13 more
> Caused by: java.lang.NoSuchMethodError:
> org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
> ... 15 more
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> at scala.Option.foreach(Option.scala:257)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
> at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
> at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
> at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
> at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
> at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
> ... 8 more
> Caused by: java.lang.RuntimeException:
> com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter
> index.
> at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
> at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking
> bloom filter index.
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
> at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
> ... 13 more
> Caused by: java.lang.NoSuchMethodError:
> org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
> at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
> ... 15 more
>
> On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <[email protected]> wrote:
>
>> Hi Umesh,
>>
>> IIUC, your suggestion is that, without the need to check out/build the
>> source code, one should be able to run the sample app? That does seem
>> fair to me. We would have to move the test data generator out of tests
>> to place this under source code.
>>
>> I am hoping something like hoodie-bench could be a more comprehensive
>> replacement for this mid-term.
>> https://github.com/apache/incubator-hudi/pull/623/files Thoughts?
>>
>> But, in the short term, let us know if it becomes too cumbersome for you
>> to try out HoodieJavaApp.
>>
>> Thanks
>> Vinoth
>>
>> On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <[email protected]>
>> wrote:
>>
>> > I can see there is a TODO to do what I suggested:
>> >
>> > #TODO - Need to move TestDataGenerator and HoodieJavaApp out of tests
>> >
>> > On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <[email protected]>
>> > wrote:
>> >
>> > > OK, this useful class should have been part of a utility and should
>> > > be able to run out of the box, as IMHO a developer need not
>> > > necessarily build the project. I tried to create a Maven project
>> > > where I kept hoodie-spark-bundle as a dependency and copied the
>> > > HoodieJavaApp and DataSourceTestUtils classes into it, but it does
>> > > not compile.
>> > > I have been told here that hoodie-spark-bundle is an uber jar, but
>> > > I doubt it is.
>> > >
>> > > On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <[email protected]>
>> > > wrote:
>> > >
>> > >> Hi Umesh,
>> > >> I believe HoodieJavaApp is a test class under hoodie-spark.
>> > >> AFAIK, test classes are not supposed to be included in the artifact.
>> > >> However, if you want to build an artifact where you have access to
>> > >> test classes, you would build from source code.
>> > >> Once you build the hoodie project, you are able to find a test jar
>> > >> that includes HoodieJavaApp under
>> > >> hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar.
>> > >>
>> > >> Thanks
>> > >> Jing
>> > >>
>> > >> On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <[email protected]>
>> > >> wrote:
>> > >>
>> > >> > Hi, I am not able to import the class HoodieJavaApp using any of
>> > >> > the Maven jars. I tried hoodie-spark-bundle and hoodie-spark,
>> > >> > both. It simply does not find this class. I am using 0.4.5.
>> > >> > Please guide.
>> > >> >
>> > >> > Regards,
>> > >> > Umesh
>> > >> >
>> > >>
>> >
>>
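For reference, here is a minimal sketch of the kind of wrapper Umesh
describes above: a small main class that just runs HoodieJavaApp with all
its default options. It assumes the hoodie-spark test classes are on the
classpath (via the tests jar Jing mentions, or copied sources); the wrapper
class name is Umesh's, but the delegation style is an assumption, so check
HoodieJavaApp's actual entry points (the stack trace above shows both a
main() and a run() method) before relying on it.

// Hypothetical wrapper: runs HoodieJavaApp with its built-in defaults.
public class HoodieJavaTest {
  public static void main(String[] args) throws Exception {
    // Delegate to HoodieJavaApp's own main with no arguments, so it
    // runs with all default options. Assumes HoodieJavaApp is on the
    // classpath, e.g. from hoodie-spark-0.4.5-SNAPSHOT-tests.jar.
    HoodieJavaApp.main(new String[0]);
  }
}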
