Hattonuri opened a new issue, #39398: URL: https://github.com/apache/arrow/issues/39398
### Describe the enhancement requested

I've found that the for-loop here https://github.com/apache/arrow/blob/7c3480e2f028f5881242f227f42155cf833efee7/cpp/src/parquet/column_reader.cc#L1055-L1073 compiles to:

```
0xc0c2f0 <ReadLevels()+96>   inc    %rdx
0xc0c2f3 <ReadLevels()+99>   cmp    %rax,%rdx
0xc0c2f6 <ReadLevels()+102>  jge    0xc0c30c <ReadLevels()+124>
0xc0c2f8 <ReadLevels()+104>  cmp    %cx,(%r14,%rdx,2)
0xc0c2fd <ReadLevels()+109>  jne    0xc0c2f0 <ReadLevels()+96>
0xc0c2ff <ReadLevels()+111>  incq   0x0(%rbp)
0xc0c303 <ReadLevels()+115>  mov    (%rbx),%rax
0xc0c306 <ReadLevels()+118>  jmp    0xc0c2f0 <ReadLevels()+96>
```

This means the loop iterates element by element: on every match it increments the counter in memory through its pointer (`incq`) and then reloads the loop bound from memory (`mov (%rbx),%rax`).

I think the reason is that `values_to_read` and `num_def_levels` are not marked as restrict, so the compiler has to assume they may alias and cannot optimize the loop into something more efficient (for example, using SIMD).

On my flamegraph, this part accounted for ~10% of the time spent.

### Component(s)

C++, Parquet
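
To illustrate the aliasing concern, here is a minimal sketch, not the actual Arrow code. The function names `CountThroughPointers` and `CountWithLocals` and their signatures are hypothetical, chosen only to mirror the pattern visible in the disassembly (a 16-bit compare, a counter incremented through a pointer, a bound reloaded from memory). The second variant copies the bound into a local and accumulates into a local, which removes the possible aliasing between the two pointers and may let the compiler keep both values in registers and vectorize the loop. Whether it actually does depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pattern in the issue: the loop bound and the
// counter are both accessed through pointers, so the compiler must assume a
// store to *values_to_read could change *num_def_levels.
void CountThroughPointers(const std::int16_t* def_levels,
                          std::int16_t max_def_level,
                          const std::int64_t* num_def_levels,
                          std::int64_t* values_to_read) {
  assert(def_levels != nullptr || *num_def_levels == 0);
  for (std::int64_t i = 0; i < *num_def_levels; ++i) {
    if (def_levels[i] == max_def_level) {
      ++(*values_to_read);  // store to memory on every match
    }
  }
}

// Same result when the two pointers do not alias: the bound is read once and
// the count is accumulated in a local, then written back with a single store.
void CountWithLocals(const std::int16_t* def_levels,
                     std::int16_t max_def_level,
                     const std::int64_t* num_def_levels,
                     std::int64_t* values_to_read) {
  assert(def_levels != nullptr || *num_def_levels == 0);
  const std::int64_t n = *num_def_levels;  // bound loaded once
  std::int64_t count = 0;                  // local accumulator
  for (std::int64_t i = 0; i < n; ++i) {
    count += (def_levels[i] == max_def_level) ? 1 : 0;
  }
  *values_to_read += count;
}
```

Both functions add the number of entries equal to `max_def_level` to `*values_to_read`. Using locals avoids relying on `__restrict`, which is a compiler extension in C++ rather than part of the standard; annotating the pointers that way would be an alternative on compilers that support it.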
