[GitHub] [iceberg] mayursrivastava edited a comment on pull request #2286: Add Arrow vectorized reader

GitBox Wed, 21 Apr 2021 09:53:42 -0700


mayursrivastava edited a comment on pull request #2286:
URL: https://github.com/apache/iceberg/pull/2286#issuecomment-824210180



   `
   ./gradlew :iceberg-spark:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/results.txt
   `
   
   @rymurr, should this be :iceberg-spark2:jmh? It looks like it requires a
   Java 8 runtime, is that right? I ran it on the master branch (without my
   changes), but it fails with the following error. Does it need a more
   powerful machine?
   
   `
   JMH version: 1.21
   VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
   VM invoker: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
   VM options: <none>
   Warmup: 3 iterations, single-shot each
   Measurement: 5 iterations, single-shot each
   Timeout: 10 min per iteration
   Threads: 1 thread
   Benchmark mode: Single shot invocation time
   Benchmark: org.apache.iceberg.spark.source.parquet.vectorized.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesIcebergVectorized5k
   
   # Run progress: 0.00% complete, ETA 00:00:00
   # Fork: 1 of 1
   # Warmup Iteration   1: (*interrupt*) <failure>
   
   java.lang.InterruptedException
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
           at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:206)
           at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222)
           at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157)
           at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:243)
           at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:750)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
           at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:64)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
           at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
           at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
           at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
           at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:280)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
           at org.apache.iceberg.spark.source.IcebergSourceBenchmark.appendAsFile(IcebergSourceBenchmark.java:130)
           at org.apache.iceberg.spark.source.parquet.vectorized.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.appendData(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.java:82)
           at org.apache.iceberg.spark.source.parquet.vectorized.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.setupBenchmark(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.java:56)
           at org.apache.iceberg.spark.source.parquet.vectorized.generated.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest._jmh_tryInit_f_vectorizedreaddictionaryencodedflatparquetdatabenchmark0_G(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest.java:438)
           at org.apache.iceberg.spark.source.parquet.vectorized.generated.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest.readDatesIcebergVectorized5k_SingleShotTime(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest.java:363)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
           at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   
   
   Benchmark had encountered error, and fail on error was requested
   `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
