[GitHub] [iceberg] RussellSpitzer commented on pull request #3983: Spark: Spark3 ZOrder Rewrite Strategy

GitBox Fri, 18 Mar 2022 08:21:35 -0700


RussellSpitzer commented on pull request #3983:
URL: https://github.com/apache/iceberg/pull/3983#issuecomment-1072508346



   After changing the default size of all types to 8 bytes and randomizing the 
shuffle input I get some different perf results. It seems like the pattern of 
the data is more important to the sort times than the amount of data in the 
sort field?
   
   Commit: 91a76010869e68633d598d718fb7929f6b531a71
   ```
   Benchmark                                        Mode  Cnt    Score    Error  Units
   IcebergSortCompactionBenchmark.sortFourColumns     ss   10   86.675 ±  8.177   s/op
   IcebergSortCompactionBenchmark.sortInt             ss   10   74.262 ±  6.732   s/op
   IcebergSortCompactionBenchmark.sortInt2            ss   10   72.515 ±  8.014   s/op
   IcebergSortCompactionBenchmark.sortInt3            ss   10   76.228 ±  4.590   s/op
   IcebergSortCompactionBenchmark.sortInt4            ss   10   74.730 ±  4.267   s/op
   IcebergSortCompactionBenchmark.sortSixColumns      ss   10   75.933 ±  5.042   s/op
   IcebergSortCompactionBenchmark.sortString          ss   10   77.488 ±  6.804   s/op

   IcebergSortCompactionBenchmark.zSortFourColumns    ss   10  277.954 ± 73.324   s/op
   IcebergSortCompactionBenchmark.zSortInt            ss   10  327.105 ± 17.098   s/op
   IcebergSortCompactionBenchmark.zSortInt2           ss   10  328.217 ± 13.099   s/op
   IcebergSortCompactionBenchmark.zSortInt3           ss   10  342.404 ± 15.660   s/op
   IcebergSortCompactionBenchmark.zSortInt4           ss   10  344.997 ± 16.342   s/op
   IcebergSortCompactionBenchmark.zSortSixColumns     ss   10  295.686 ± 62.866   s/op
   IcebergSortCompactionBenchmark.zSortString         ss   10  333.303 ± 15.966   s/op
   ```
   
   What is odd to me here is that the sort time for Strings is now basically 
the same as for integers: all of our zorderings take about the same amount of 
time, and so do all of our sorts without zorder. What is more interesting to 
me is that for ZOrdering this basically increases the ZORDER output byte size 
and has no effect on the comparison time. For Strings maybe this makes sense, 
but for zSortInt, zSortInt2, zSortInt3 and zSortInt4 I would have expected 
things to take different amounts of time. Perhaps with a totally random layout 
of data the significant bits to compare on average always appear in the same 
location for ZOrder, regardless of the number of interleaved columns?
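
A rough way to sanity-check that last guess outside of Spark: the sketch below interleaves the bits of uniformly random 64-bit values for 1 to 4 columns and measures at which byte two random keys first differ. It is only an illustration under stated assumptions. `interleave_bits`, `first_diff_index` and `mean_first_diff` are hypothetical helpers written for this comment, not the interleaving code in the PR, and the sketch models nothing about Spark's sort or shuffle.

```python
import random


def interleave_bits(values, bits=64):
    """Interleave the bits of unsigned ints, most significant bit first.

    Hypothetical stand-in for a Z-order key builder; `bits` is the assumed
    fixed width of every column, and len(values) * bits must be a multiple of 8.
    """
    out = 0
    for bit in range(bits - 1, -1, -1):
        for v in values:
            out = (out << 1) | ((v >> bit) & 1)
    return out.to_bytes(len(values) * bits // 8, "big")


def first_diff_index(a, b):
    """Index of the first byte where two equal-length keys differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return len(a)


def mean_first_diff(num_columns, pairs=2000, seed=0):
    """Average first-differing-byte index over random pairs of keys."""
    rng = random.Random(seed)
    total = 0
    for _ in range(pairs):
        a = interleave_bits([rng.getrandbits(64) for _ in range(num_columns)])
        b = interleave_bits([rng.getrandbits(64) for _ in range(num_columns)])
        total += first_diff_index(a, b)
    return total / pairs


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        key_len = len(interleave_bits([0] * k))
        print(f"columns={k} key_bytes={key_len} mean_first_diff={mean_first_diff(k):.4f}")
```

With uniformly random inputs the interleaved key is itself uniformly random, so two keys should differ in their first byte about 255 times out of 256 no matter how many columns are interleaved. The key length grows with the column count but a byte-wise comparison would almost never look past the first byte, which would be consistent with the flat zSortInt timings above. It says nothing about non-random data.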


