Cool.

On Fri, May 3, 2019 at 12:27 PM Umesh Kacha <[email protected]> wrote:
Hi Vinoth, thanks. I will come back to this after some time; right now the priority has changed.

Regards,
Umesh

On Fri, May 3, 2019, 8:49 PM Vinoth Chandar <[email protected]> wrote:

Hi Umesh,

Did it work?

Thanks
Vinoth

On Tue, Apr 23, 2019 at 9:33 AM Vinoth Chandar <[email protected]> wrote:

Hi Umesh,

I took a pass. Moving HoodieTestDataGenerator into src/java is not a good idea. However, I have written up a simple demo app using the stock data that we already use in our dockerized demo:
https://github.com/vinothchandar/incubator-hudi/tree/quickstart

Once you grab the code, build it using

    mvn clean install -DskipTests -DskipITs

and you should be able to run

    spark-submit --class HoodieDemoApp --master local[2] hoodie-utilities/target/hoodie-utilities-0.4.6-SNAPSHOT.jar

and get a dataset written. You can make changes and iterate as you wish.

I really recommend using the dockerized setup described here. It does the same thing, but lets you play with the entire ecosystem:
https://hudi.apache.org/docker_demo.html

Thanks
Vinoth

On Mon, Apr 22, 2019 at 9:14 AM Umesh Kacha <[email protected]> wrote:

Hi Vinoth, thanks much. Our eventual deployment will be in AWS, and for now we will be using the Hoodie Spark datasource to upsert and delete.

Regards,
Umesh

On Mon, Apr 22, 2019 at 8:24 PM Vinoth Chandar <[email protected]> wrote:

Hi Umesh,

This is on top of my list for the week. But if you already have input data somewhere on S3/HDFS, nothing stops you from trying the DeltaStreamer tool or writing a simple Spark job that depends on hoodie-spark. What's your eventual deployment strategy?

Thanks
Vinoth

On Mon, Apr 22, 2019 at 6:09 AM Umesh Kacha <[email protected]> wrote:

Hi Vinoth, can you please help with this? I want to quickly try HoodieJavaApp; it seems to be partially working in my local setup, with some runtime dependency failures as mentioned in the previous email.

On Sat, Apr 20, 2019, 10:18 AM Umesh Kacha <[email protected]> wrote:

Thanks Vinoth, yes please, that would be great: HoodieJavaApp moved out of tests and working.

On Sat, Apr 20, 2019, 6:09 AM Vinoth Chandar <[email protected]> wrote:

Sorry, not following. If you are building your own Spark job using Hudi, then you just pull in the hoodie-spark module:
http://hudi.apache.org/writing_data.html#datasource-writer

The Spark bundle can be used with the --jars option on spark-shell etc. to query the datasets.

Does that help? Can you describe what you are trying to accomplish?

Checking again, do you need a patch with HoodieJavaApp moved out of tests and working?
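(For reference, a minimal sketch of the datasource-writer path referenced above, written against the 0.4.x "com.uber.hoodie" format. The table name, field names, paths and exact option keys shown here are illustrative assumptions rather than details from this thread, and hoodie-spark plus matching Spark/Hadoop jars are assumed to be on the classpath.)

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HoodieUpsertSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hoodie-upsert-sketch")
        .master("local[2]")
        .getOrCreate();

    // Batch of records to upsert; replace with your real source (files on S3/HDFS, Kafka, etc.).
    Dataset<Row> input = spark.read().json("file:///tmp/input-batch.json");

    input.write()
        .format("com.uber.hoodie")                                          // Hudi 0.4.x datasource
        .option("hoodie.datasource.write.operation", "upsert")              // upsert existing keys, insert new ones
        .option("hoodie.datasource.write.recordkey.field", "rowKey")        // record key column (assumed name)
        .option("hoodie.datasource.write.partitionpath.field", "partition") // partition path column (assumed name)
        .option("hoodie.datasource.write.precombine.field", "ts")           // column used to pick the latest record per key
        .option("hoodie.table.name", "demo_table")                          // hypothetical table name
        .mode(SaveMode.Append)                                              // Append = incremental write into an existing dataset
        .save("file:///tmp/hoodie/demo_table");                             // base path of the dataset

    spark.stop();
  }
}

As noted above, this write path only needs the hoodie-spark dependency; the hoodie-spark-bundle jar with --jars comes in on the query side (spark-shell etc.).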
On Fri, Apr 19, 2019 at 12:01 PM Umesh Kacha <[email protected]> wrote:

Thanks Vinoth. How do I know which Spark jars and versions are needed? I was expecting hoodie-spark-bundle-0.4.5.jar to take care of that since it's an uber jar, but it doesn't; recently I found I had to add the Spark Maven coordinates separately in the pom file. Anyway, if you can give me a list of jars, I can put them on a classpath and run.

On Fri, Apr 19, 2019, 11:40 PM Vinoth Chandar <[email protected]> wrote:

Looks like a class mismatch error on the Hadoop jars. The easiest way to do this is to pull the code into IntelliJ, add the Spark jars folder to the module's classpath, and then run the test by right-clicking > Run.

I can prep a patch for you if you'd like. lmk

Thanks
Vinoth

On Thu, Apr 18, 2019 at 8:46 AM Umesh Kacha <[email protected]> wrote:

Hi Vinoth, I managed to get HoodieJavaApp running in my local Maven project; I had to copy the following classes, which are used by HoodieJavaApp. Inside the main method of HoodieJavaTest I create an object of HoodieJavaApp, which just runs with all default options.

[image: image.png]

However, I get the following error, which looks like a missing runtime dependency. Please guide.
Exception in thread "main" com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for commit time 20190418210326
  at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
  at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
  at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
  at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
  at HoodieJavaApp.run(HoodieJavaApp.java:143)
  at HoodieJavaApp.main(HoodieJavaApp.java:67)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 49, localhost, executor driver): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
  at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
  at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
  ... 13 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
  at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
  ... 15 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
  at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
  at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
  at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
  at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
  at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
  ... 8 more
Caused by: java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
  at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
  at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
  ... 13 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
  at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
  at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
  ... 15 more

On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <[email protected]> wrote:

Hi Umesh,

IIUC, your suggestion is that one should be able to run the sample app without needing to check out and build the source code? That does seem fair to me. We had to move the test data generator out of tests to place this under source code.

I am hoping something like hoodie-bench could be a more comprehensive replacement for this mid term: https://github.com/apache/incubator-hudi/pull/623/files Thoughts?

But, in the short term, let us know if it becomes too cumbersome for you to try out HoodieJavaApp.

Thanks
Vinoth

On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <[email protected]> wrote:

I can see there is a TODO to do what I suggested:

#TODO - Need to move TestDataGenerator and HoodieJavaApp out of tests

On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <[email protected]> wrote:

OK, this useful class should have been part of a utility module and should run out of the box, as IMHO a developer should not necessarily have to build the project. I tried to create a Maven project where I kept hoodie-spark-bundle as a dependency and copied the HoodieJavaApp and DataSourceTestUtils classes into it, but it does not compile. I have been told here that hoodie-spark-bundle is an uber jar, but I doubt it is.
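(The root cause in the trace above, a NoSuchMethodError on Configuration.addResource(Configuration), typically means a Hadoop Configuration class without that overload is being loaded at runtime instead of the one the code was compiled against; this is the "class mismatch error on Hadoop jars" mentioned earlier. Below is a small, hypothetical diagnostic, not from the thread, that prints which jar actually provides the class.)

import org.apache.hadoop.conf.Configuration;

public class HadoopClasspathCheck {
  public static void main(String[] args) {
    // Print the jar that actually provides org.apache.hadoop.conf.Configuration at runtime.
    System.out.println(
        Configuration.class.getProtectionDomain().getCodeSource().getLocation());

    // Check for the overload the stack trace complains about; NoSuchMethodException here
    // means an older Configuration class (without this overload) is on the classpath.
    try {
      Configuration.class.getMethod("addResource", Configuration.class);
      System.out.println("Configuration.addResource(Configuration) is available");
    } catch (NoSuchMethodException e) {
      System.out.println("Configuration.addResource(Configuration) is missing: " + e);
    }
  }
}

If the printed location is not the Hadoop your Spark installation ships with, aligning the Hadoop/Spark versions in the pom, or simply using the jars under Spark's own jars folder as suggested above, should resolve the mismatch.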
On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <[email protected]> wrote:

Hi Umesh,

I believe *HoodieJavaApp* is a test class under *hoodie-spark*. AFAIK, test classes are not supposed to be included in the artifact. However, if you want to build an artifact where you have access to the test classes, you would build from source code. Once you build the hoodie project, you are able to find a test jar that includes *HoodieJavaApp* under *hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar*.

Thanks
Jing

On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <[email protected]> wrote:

Hi, I am not able to import the class HoodieJavaApp using any of the Maven jars. I tried both hoodie-spark-bundle and hoodie-spark. It simply does not find this class. I am using 0.4.5. Please guide.

Regards,
Umesh
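(Putting Jing's suggestion together with Umesh's later description: once the project is built from source and hoodie-spark-0.4.5-SNAPSHOT-tests.jar, hoodie-spark and matching Spark/Hadoop jars are all on the classpath, HoodieJavaApp can be kicked off from a tiny launcher of your own. The launcher class below is hypothetical, and it assumes HoodieJavaApp runs with default options when given no arguments, as described earlier in the thread.)

// Hypothetical launcher; HoodieJavaApp lives in the default package, per the stack trace above.
public class HoodieJavaAppLauncher {
  public static void main(String[] args) throws Exception {
    // Run the sample app with its default options (assumption: no arguments are required).
    HoodieJavaApp.main(new String[0]);
  }
}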
