[GitHub] [iceberg] dramaticlly opened a new pull request, #5991: Spark: Fix DATE_ADD expression in IcebergSourceFlatParquetDataWriteBenchmark

GitBox Fri, 14 Oct 2022 11:39:11 -0700


dramaticlly opened a new pull request, #5991:
URL: https://github.com/apache/iceberg/pull/5991


   Fixes https://github.com/apache/iceberg/issues/5990
   
   
   ## Verification
   After my change, the benchmark now generates a correct report:
   ```
   # JMH version: 1.32
   # VM version: JDK 1.8.0_312, OpenJDK 64-Bit Server VM, 25.312-b07
   # VM invoker: /Users/stevezhang/workspace/jdk8/applejdk-8.0.312.7.1.jdk/Contents/Home/jre/bin/java
   # VM options: -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/stevezhang/workspace/iceberg/spark/v3.3/spark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
   # Blackhole mode: full + dont-inline hint
   # Warmup: 3 iterations, single-shot each
   # Measurement: 5 iterations, single-shot each
   # Timeout: 10 min per iteration
   # Threads: 1 thread
   # Benchmark mode: Single shot invocation time
   # Benchmark: org.apache.iceberg.spark.source.parquet.IcebergSourceFlatParquetDataWriteBenchmark.writeFileSource
   
   # Run progress: 0.00% complete, ETA 00:00:00
   # Fork: 1 of 1
   # Warmup Iteration   1: 25.867 s/op
   # Warmup Iteration   2: 19.778 s/op
   # Warmup Iteration   3: 18.966 s/op
   Iteration   1: 19.017 s/op
   Iteration   2: 18.209 s/op
   Iteration   3: 19.078 s/op
   Iteration   4: 22.087 s/op
   Iteration   5: 18.014 s/op
   
   
   Result "org.apache.iceberg.spark.source.parquet.IcebergSourceFlatParquetDataWriteBenchmark.writeFileSource":
     N = 5
     mean =     19.281 ±(99.9%) 6.310 s/op
   
     Histogram, s/op:
       [18.000, 18.500) = 2 
       [18.500, 19.000) = 0 
       [19.000, 19.500) = 2 
       [19.500, 20.000) = 0 
       [20.000, 20.500) = 0 
       [20.500, 21.000) = 0 
       [21.000, 21.500) = 0 
       [21.500, 22.000) = 0 
       [22.000, 22.500) = 1 
   
     Percentiles, s/op:
         p(0.0000) =     18.014 s/op
        p(50.0000) =     19.017 s/op
        p(90.0000) =     22.087 s/op
        p(95.0000) =     22.087 s/op
        p(99.0000) =     22.087 s/op
        p(99.9000) =     22.087 s/op
        p(99.9900) =     22.087 s/op
        p(99.9990) =     22.087 s/op
        p(99.9999) =     22.087 s/op
       p(100.0000) =     22.087 s/op
   
   
   # JMH version: 1.32
   # VM version: JDK 1.8.0_312, OpenJDK 64-Bit Server VM, 25.312-b07
   # VM invoker: /Users/stevezhang/workspace/jdk8/applejdk-8.0.312.7.1.jdk/Contents/Home/jre/bin/java
   # VM options: -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/stevezhang/workspace/iceberg/spark/v3.3/spark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
   # Blackhole mode: full + dont-inline hint
   # Warmup: 3 iterations, single-shot each
   # Measurement: 5 iterations, single-shot each
   # Timeout: 10 min per iteration
   # Threads: 1 thread
   # Benchmark mode: Single shot invocation time
   # Benchmark: org.apache.iceberg.spark.source.parquet.IcebergSourceFlatParquetDataWriteBenchmark.writeIceberg
   
   # Run progress: 50.00% complete, ETA 00:02:44
   # Fork: 1 of 1
   # Warmup Iteration   1: 23.999 s/op
   # Warmup Iteration   2: 19.151 s/op
   # Warmup Iteration   3: 19.056 s/op
   Iteration   1: 22.485 s/op
   Iteration   2: 19.256 s/op
   Iteration   3: 19.343 s/op
   Iteration   4: 21.488 s/op
   Iteration   5: 20.735 s/op
   
   
   Result "org.apache.iceberg.spark.source.parquet.IcebergSourceFlatParquetDataWriteBenchmark.writeIceberg":
     N = 5
     mean =     20.661 ±(99.9%) 5.352 s/op
   
     Histogram, s/op:
       [19.000, 19.250) = 0 
       [19.250, 19.500) = 2 
       [19.500, 19.750) = 0 
       [19.750, 20.000) = 0 
       [20.000, 20.250) = 0 
       [20.250, 20.500) = 0 
       [20.500, 20.750) = 1 
       [20.750, 21.000) = 0 
       [21.000, 21.250) = 0 
       [21.250, 21.500) = 1 
       [21.500, 21.750) = 0 
       [21.750, 22.000) = 0 
       [22.000, 22.250) = 0 
       [22.250, 22.500) = 1 
       [22.500, 22.750) = 0 
   
     Percentiles, s/op:
         p(0.0000) =     19.256 s/op
        p(50.0000) =     20.735 s/op
        p(90.0000) =     22.485 s/op
        p(95.0000) =     22.485 s/op
        p(99.0000) =     22.485 s/op
        p(99.9000) =     22.485 s/op
        p(99.9900) =     22.485 s/op
        p(99.9990) =     22.485 s/op
        p(99.9999) =     22.485 s/op
       p(100.0000) =     22.485 s/op
   
   
   # Run complete. Total time: 00:05:33
   
   REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
   why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
   experiments, perform baseline and negative tests that provide experimental control, make sure
   the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
   Do not assume the numbers tell you what you want them to tell.
   
   Benchmark                                                   Mode  Cnt   Score   Error  Units
   IcebergSourceFlatParquetDataWriteBenchmark.writeFileSource    ss    5  19.281 ± 6.310   s/op
   IcebergSourceFlatParquetDataWriteBenchmark.writeIceberg       ss    5  20.661 ± 5.352   s/op
   
   Benchmark result is saved to /Users/stevezhang/workspace/iceberg/spark/v3.3/spark/build/results/jmh/results.txt
   
   ```
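
   As a sanity check on the report, the sketch below recomputes the mean and the 99.9% error from the five measurement iterations of each benchmark. It assumes the error is a Student-t confidence half-width with N - 1 = 4 degrees of freedom. The class name `JmhSummaryCheck` and the hardcoded critical value 8.610 (from a standard t-table) are illustrative and not part of JMH or Iceberg.

   ```java
   public class JmhSummaryCheck {
       // Two-sided 99.9% Student-t critical value for 4 degrees of freedom (N = 5).
       // Assumption: taken from a standard t-table, rounded to three decimals.
       static final double T_CRIT_999_DF4 = 8.610;

       // Arithmetic mean of the samples.
       static double mean(double[] xs) {
           double sum = 0.0;
           for (double x : xs) {
               sum += x;
           }
           return sum / xs.length;
       }

       // Confidence half-width: tCrit * sampleStdDev / sqrt(N).
       static double halfWidth(double[] xs, double tCrit) {
           double m = mean(xs);
           double squares = 0.0;
           for (double x : xs) {
               squares += (x - m) * (x - m);
           }
           double sampleStdDev = Math.sqrt(squares / (xs.length - 1));
           return tCrit * sampleStdDev / Math.sqrt(xs.length);
       }

       public static void main(String[] args) {
           // Measurement iterations copied from the report, in s/op.
           double[] writeFileSource = {19.017, 18.209, 19.078, 22.087, 18.014};
           double[] writeIceberg = {22.485, 19.256, 19.343, 21.488, 20.735};

           System.out.printf("writeFileSource: %.3f +/- %.3f s/op%n",
               mean(writeFileSource), halfWidth(writeFileSource, T_CRIT_999_DF4));
           System.out.printf("writeIceberg:    %.3f +/- %.3f s/op%n",
               mean(writeIceberg), halfWidth(writeIceberg, T_CRIT_999_DF4));
       }
   }
   ```

   The recomputed values should agree with the reported scores to within rounding of the printed iteration times and of the critical value. With only five single-shot samples the interval is wide, so the two benchmarks overlap heavily.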



