WillAyd opened a new issue, #41224:
URL: https://github.com/apache/arrow/issues/41224

   ### Describe the enhancement requested
   
   The performance profile for reading binary data types through parquet is 
much different from that of, say, integral types. While of course these aren't 
expected to be identical, I was surprised to see so many Append calls in a 
performance trace of the parquet reader with strings.
   
   To illustrate, I have created the following data:
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   tbl1 = pa.Table.from_pydict({"col": range(10_000_000)})
   pq.write_table(tbl1, "ints.parquet")
   
   tbl2 = pa.Table.from_pydict({"col": ["foo", "bar"] * 5_000_000})
   pq.write_table(tbl2, "strings.parquet")
   ```
   
   I have also written two simple benchmarks against these files. read_ints.py:
   
   ```python
   import pyarrow.parquet as pq
   
   for _ in range(10):
       pq.read_table("ints.parquet")
   ```
   
   and read_strings.py:
   
   ```python
   import pyarrow.parquet as pq
   
   for _ in range(10):
       pq.read_table("strings.parquet")
   ```
   
   When executing these under callgrind, here is what I see for the integer 
benchmark:
   
   ```
   10,978,640,541 (55.79%)  ???:std::pair<unsigned char const*, long> 
snappy::DecompressBranchless<char*>(unsigned char const*, unsigned char const*, 
long, char*, long) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libsnappy.so.1.2.0]
    2,594,207,330 (13.18%)  ???:snappy::MemCopy64(char*, void const*, unsigned 
long) [/home/willayd/mambaforge/envs/scratchpad/lib/libsnappy.so.1.2.0]
    2,395,482,666 (12.17%)  
./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 [/usr/lib/x86_64-linux-gnu/libc.so.6]
      810,937,500 ( 4.12%)  ???:parquet::internal::GreaterThanBitmapAvx2(short 
const*, long, short) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
      598,618,440 ( 3.04%)  ???:snappy::DeferMemCopy(void const**, unsigned 
long*, void const*, unsigned long) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libsnappy.so.1.2.0]
      198,977,226 ( 1.01%)  
./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:memcpy@GLIBC_2.2.5
 [/usr/lib/x86_64-linux-gnu/libc.so.6]
      177,894,713 ( 0.90%)  
/usr/local/src/conda/python-3.12.1/Modules/_sre/sre_lib.h:sre_ucs1_match 
[/home/willayd/mambaforge/envs/scratchpad/bin/python3.12]
      133,685,200 ( 0.68%)  ???:int 
arrow::util::RleDecoder::GetBatchWithDict<long>(long const*, int, long*, int) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
      117,187,500 ( 0.60%)  ???:long 
parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long, 
long, parquet::internal::LevelInfo, arrow::internal::FirstTimeBitmapWriter*) 
[clone .isra.0] 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
       77,081,177 ( 0.39%)  ./elf/./elf/dl-lookup.c:do_lookup_x 
[/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
   ```
   
   versus with strings:
   
   ```
   7,100,000,000 (33.57%)  
???:arrow::BaseBinaryBuilder<arrow::BinaryType>::Append(unsigned char const*, 
int) [/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
   3,300,018,590 (15.60%)  ???:arrow::BufferBuilder::Append(void const*, long) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
   3,300,004,000 (15.60%)  ???:arrow::ArrayBuilder::Reserve(long) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
   2,601,226,650 (12.30%)  ???:parquet::(anonymous 
namespace)::DictByteArrayDecoderImpl::DecodeArrowDenseNonNull(int, 
parquet::EncodingTraits<parquet::PhysicalType<(parquet::Type::type)6> 
>::Accumulator*, int*) [clone .constprop.0] 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
   1,671,172,008 ( 7.90%)  
./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:memcpy@GLIBC_2.2.5
 [/usr/lib/x86_64-linux-gnu/libc.so.6]
     810,937,500 ( 3.83%)  ???:parquet::internal::GreaterThanBitmapAvx2(short 
const*, long, short) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
     200,000,204 ( 0.95%)  ???:arrow::ArrayBuilder::length() const 
[/home/willayd/mambaforge/envs/scratchpad/lib/libarrow.so.1500.2.0]
     177,894,713 ( 0.84%)  
/usr/local/src/conda/python-3.12.1/Modules/_sre/sre_lib.h:sre_ucs1_match 
[/home/willayd/mambaforge/envs/scratchpad/bin/python3.12]
     140,038,870 ( 0.66%)  ???:int 
arrow::bit_util::BitReader::GetBatch<int>(int, int*, int) 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
     117,187,500 ( 0.55%)  ???:long 
parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long, 
long, parquet::internal::LevelInfo, arrow::internal::FirstTimeBitmapWriter*) 
[clone .isra.0] 
[/home/willayd/mambaforge/envs/scratchpad/lib/libparquet.so.1500.2.0]
   ```
   
   Is the string reader expected to be so heavy on Append? I am by no means 
an expert in the parquet format, but I believe that there is a 
`total_uncompressed_size` in the column metadata that might be usable to 
pre-allocate the buffer for binary data, so that we don't have to spend as much 
time in Append calls.
   
   The instruction counts (Ir) above are from running the benchmarks on pyarrow 15.0.2.
   
   ### Component(s)
   
   C++


