[GitHub] [arrow] wesm edited a comment on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

GitBox Wed, 03 Jun 2020 17:49:25 -0700


wesm edited a comment on pull request #7143:
URL: https://github.com/apache/arrow/pull/7143#issuecomment-638534173



   OK I adapted the benchmark here to use the `BitmapScanner` from #7346 
   
   https://github.com/wesm/arrow/tree/bit-runner
   
   ```
   ---------------------------------------------------------------------------------
   Benchmark                           Time           CPU Iterations
   ---------------------------------------------------------------------------------
   BitRunReader/-1                  9890 ns       9890 ns      70683   49.3705MB/s
   BitRunReader/0                    108 ns        108 ns    6250693     4.423GB/s
   BitRunReader/10                  2101 ns       2101 ns     334686    232.36MB/s
   BitRunReader/25                  4072 ns       4072 ns     173114   119.915MB/s
   BitRunReader/50                  5221 ns       5221 ns     133040   93.5178MB/s
   BitRunReader/60                  5042 ns       5042 ns     138099   96.8386MB/s
   BitRunReader/75                  3933 ns       3933 ns     179857   124.152MB/s
   BitRunReader/99                   291 ns        291 ns    2412105    1.6376GB/s
   BitRunReaderWithScanner/-1         47 ns         47 ns   15059881   10.2331GB/s
   BitRunReaderWithScanner/0          46 ns         46 ns   15078363   10.2704GB/s
   BitRunReaderWithScanner/10         47 ns         47 ns   15118172   10.2299GB/s
   BitRunReaderWithScanner/25         47 ns         47 ns   15033144   10.2528GB/s
   BitRunReaderWithScanner/50         47 ns         47 ns   14947443   10.1964GB/s
   BitRunReaderWithScanner/60         47 ns         47 ns   14668505   10.0837GB/s
   BitRunReaderWithScanner/75         46 ns         46 ns   15045334   10.2918GB/s
   BitRunReaderWithScanner/99         46 ns         46 ns   14961067   10.2813GB/s
   BitRunReaderScalar/-1           13089 ns      13088 ns      51449   37.3063MB/s
   BitRunReaderScalar/0             3844 ns       3844 ns     176221   127.024MB/s
   BitRunReaderScalar/10            6621 ns       6621 ns     104648   73.7517MB/s
   BitRunReaderScalar/25           12397 ns      12397 ns      55998    39.388MB/s
   BitRunReaderScalar/50           17099 ns      17099 ns      41378    28.556MB/s
   BitRunReaderScalar/60           16606 ns      16606 ns      42580   29.4046MB/s
   BitRunReaderScalar/75           11431 ns      11431 ns      61744   42.7165MB/s
   BitRunReaderScalar/99            4265 ns       4265 ns     167402   114.484MB/s
   ```
   
   This isn't an apples-to-apples comparison at all, because the scanner just 
popcounts; it doesn't segment null runs from non-null runs. If the goal is to 
accelerate the writing of mostly-non-null data, is it worth going to all this 
trouble of exactly delimiting the start and end point of each run?
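
   To make the distinction concrete, here is a minimal sketch of the two kinds 
of work being compared. It is not the code from either PR: `CountSetBits`, 
`SegmentRuns`, and `BitRun` are hypothetical names, the bitmap is assumed to be 
whole 64-bit words read LSB-first, and `__builtin_popcountll` assumes GCC or 
Clang.

   ```cpp
   #include <cstdint>
   #include <vector>

   // Hypothetical helper: popcount only. One instruction per 64-bit word,
   // but it says nothing about where the set bits are.
   inline int64_t CountSetBits(const std::vector<uint64_t>& words) {
     int64_t count = 0;
     for (uint64_t w : words) {
       count += __builtin_popcountll(w);  // GCC/Clang builtin
     }
     return count;
   }

   // Hypothetical run descriptor: `set` is the bit value, `length` the run size.
   struct BitRun {
     bool set;
     int64_t length;
   };

   // Hypothetical helper: naive bit-by-bit run segmentation. It has to find
   // the exact start and end of every run, so the cost grows with the number
   // of transitions in the bitmap.
   inline std::vector<BitRun> SegmentRuns(const std::vector<uint64_t>& words) {
     std::vector<BitRun> runs;
     for (uint64_t w : words) {
       for (int i = 0; i < 64; ++i) {
         const bool bit = ((w >> i) & 1) != 0;
         if (!runs.empty() && runs.back().set == bit) {
           ++runs.back().length;       // extend the current run
         } else {
           runs.push_back({bit, 1});   // a transition starts a new run
         }
       }
     }
     return runs;
   }
   ```

   The first function does a fixed amount of work per word regardless of the 
bit pattern, which is consistent with the flat `BitRunReaderWithScanner` 
timings; the second depends on how often the bits flip, which is why the run 
readers are slowest near 50%.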

