Hattonuri opened a new issue, #39398:
URL: https://github.com/apache/arrow/issues/39398

   ### Describe the enhancement requested
   
   I found that the for loop here
   
https://github.com/apache/arrow/blob/7c3480e2f028f5881242f227f42155cf833efee7/cpp/src/parquet/column_reader.cc#L1055-L1073
   compiles to the following assembly:
   
   0xc0c2f0 <ReadLevels()+96>      inc    %rdx
   0xc0c2f3 <ReadLevels()+99>      cmp    %rax,%rdx
   0xc0c2f6 <ReadLevels()+102>     jge    0xc0c30c <ReadLevels()+124>
   0xc0c2f8 <ReadLevels()+104>     cmp    %cx,(%r14,%rdx,2)
   0xc0c2fd <ReadLevels()+109>     jne    0xc0c2f0 <ReadLevels()+96>
   0xc0c2ff <ReadLevels()+111>     incq   0x0(%rbp)
   0xc0c303 <ReadLevels()+115>     mov    (%rbx),%rax
   0xc0c306 <ReadLevels()+118>     jmp    0xc0c2f0 <ReadLevels()+96>
   
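   This means the loop iterates element by element: on every match it
   increments the counter in memory through its pointer with incq, and then
   reloads the loop bound from memory (the mov from (%rbx)).
   I think the reason is that values_to_read and num_def_levels are not
   marked as restrict, so the compiler has to assume they may alias and cannot
   optimize the loop into something more efficient (for example, using SIMD).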
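   
   To illustrate the aliasing concern, here is a minimal sketch. It is not
   the Arrow code: the function names and signatures are hypothetical, and
   the loop shape is only inferred from the disassembly above. The first
   function updates the counter through a pointer while reading the bound
   through another pointer. The second reads the bound once and counts via
   std::count, which gives the compiler the freedom to vectorize. It assumes
   the two pointers never refer to the same object.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the loop shape seen in the disassembly: the loop
// bound and the counter are both int64_t accessed through pointers, so the
// compiler must assume a store to *values_to_read may change *num_def_levels
// and therefore reloads the bound after every increment.
void CountThroughPointers(const int16_t* def_levels, int16_t max_def_level,
                          const int64_t* num_def_levels, int64_t* values_to_read) {
  for (int64_t i = 0; i < *num_def_levels; ++i) {
    if (def_levels[i] == max_def_level) {
      ++(*values_to_read);
    }
  }
}

// Same result when the two pointers do not alias (an assumption of this
// sketch): the bound is read once into a local and the output is written
// a single time after counting.
void CountWithLocals(const int16_t* def_levels, int16_t max_def_level,
                     const int64_t* num_def_levels, int64_t* values_to_read) {
  const int64_t n = *num_def_levels;
  *values_to_read += std::count(def_levels, def_levels + n, max_def_level);
}
```

   Whether the second form is actually vectorized depends on the compiler
   and flags, so it would need to be checked in the generated assembly and
   in a benchmark.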
   
   In my flamegraph, this part accounted for ~10% of the time spent.
   
   ### Component(s)
   
   C++, Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.