[GitHub] [iceberg] nastra opened a new issue #3157: VectorizedReadDictionaryEncodedFlatParquetDataBenchmark not working

GitBox Mon, 20 Sep 2021 08:02:07 -0700


nastra opened a new issue #3157:
URL: https://github.com/apache/iceberg/issues/3157



Running `./gradlew :iceberg-spark3:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt` fails after about **10 minutes** with the following exception:
   
```
# Benchmark mode: Single shot invocation time
# Benchmark: org.apache.iceberg.spark.source.parquet.vectorized.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.readDatesIcebergVectorized5k

# Run progress: 0.00% complete, ETA 00:00:00
# Fork: 1 of 1
# Warmup Iteration   1: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/nastra/Development/workspace/iceberg/spark3/build/libs/iceberg-spark3-154fe7e-jmh.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
(*interrupt*) <failure>

java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1040)
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187)
    at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:335)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:766)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:382)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:361)
    at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:253)
    at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:259)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:54)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:962)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:962)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:353)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:287)
    at org.apache.iceberg.spark.source.IcebergSourceBenchmark.appendAsFile(IcebergSourceBenchmark.java:129)
    at org.apache.iceberg.spark.source.parquet.vectorized.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.appendData(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.java:83)
    at org.apache.iceberg.spark.source.parquet.vectorized.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.setupBenchmark(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark.java:57)
    at org.apache.iceberg.spark.source.parquet.vectorized.jmh_generated.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest._jmh_tryInit_f_vectorizedreaddictionaryencodedflatparquetdatabenchmark0_G(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest.java:438)
    at org.apache.iceberg.spark.source.parquet.vectorized.jmh_generated.VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest.readDatesIcebergVectorized5k_SingleShotTime(VectorizedReadDictionaryEncodedFlatParquetDataBenchmark_readDatesIcebergVectorized5k_jmhTest.java:363)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
    at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
```
   
   A thread dump indicates this:
```
java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000006c0190528> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at java.util.concurrent.ExecutorCompletionService.poll(ExecutorCompletionService.java:202)
    at org.openjdk.jmh.runner.BenchmarkHandler.runIteration(BenchmarkHandler.java:376)
    at org.openjdk.jmh.runner.BaseRunner.runBenchmark(BaseRunner.java:261)
```
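Read together, the two traces fit a familiar pattern: the JMH runner thread waits for the iteration with a bounded `ExecutorCompletionService.poll` (thread dump), while the worker thread is still inside `setupBenchmark` → `appendData` → `appendAsFile`, blocked interruptibly while Spark writes the benchmark data, and it receives an `InterruptedException` once that wait gives up. If JMH's default iteration timeout of 10 minutes applies here (an assumption; the build may override it), that would match the reported failure time. The minimal Java sketch below reproduces only that pattern with plain `java.util.concurrent`. It is not JMH's code: the class and method names are hypothetical, and delivering the interrupt via `Future.cancel(true)` is an assumed stand-in for however the runner actually interrupts its worker.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from JMH, Spark, or Iceberg.
public class TimeoutInterruptSketch {

    /**
     * Runs a task that blocks indefinitely (a stand-in for a slow setup step)
     * and reports how it ended: "completed" or "interrupted".
     */
    static String runWithTimeout(long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch neverReleased = new CountDownLatch(1);
        BlockingQueue<String> outcome = new LinkedBlockingQueue<>();

        // Worker side: block interruptibly, similar to the driver waiting in
        // DAGScheduler.runJob for the write job in the first trace.
        Future<String> task = completion.submit(() -> {
            started.countDown();
            try {
                neverReleased.await();
                outcome.add("completed");
            } catch (InterruptedException e) {
                outcome.add("interrupted");
            }
            return "done";
        });
        started.await(); // ensure the task is running before the timeout starts

        // Runner side: bounded wait, like the ExecutorCompletionService.poll
        // frame in the thread dump.
        Future<String> finished = completion.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (finished == null) {
            // Timeout expired: interrupt the worker (assumed mechanism).
            task.cancel(true);
        }

        String result = outcome.poll(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithTimeout(200)); // prints "interrupted"
    }
}
```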

