[ 
https://issues.apache.org/jira/browse/ORC-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated ORC-1137:
--------------------------------
    Description: 
DoubleColumnReader::next() takes a noticeable portion of the time when reading 
doubles from a tpch-lineitem ORC file.

!DoubleColumnReader.png|width=783,height=203!

I can see that the loop in readDouble is unrolled, but it still has redundant 
checks. We can manually unroll it to save some instructions.

!DoubleColumnReader_readDouble.png|width=598,height=653!
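
As a rough illustration of the manual unrolling idea (a sketch only, not the 
actual ORC code): assuming the redundant checks are per-byte "is the buffer 
exhausted?" tests, one check per value is enough when at least 8 bytes remain 
in the buffer. The names decodeDoubleLE and decodeDoubles are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not ORC's actual code: decode one little-endian
// IEEE 754 double from 8 bytes that the caller guarantees are available.
// The byte loop is unrolled by hand, so no per-byte bounds check is needed.
inline double decodeDoubleLE(const unsigned char* p) {
  static_assert(sizeof(double) == sizeof(uint64_t), "assumes 64-bit double");
  uint64_t bits = static_cast<uint64_t>(p[0]) |
                  (static_cast<uint64_t>(p[1]) << 8) |
                  (static_cast<uint64_t>(p[2]) << 16) |
                  (static_cast<uint64_t>(p[3]) << 24) |
                  (static_cast<uint64_t>(p[4]) << 32) |
                  (static_cast<uint64_t>(p[5]) << 40) |
                  (static_cast<uint64_t>(p[6]) << 48) |
                  (static_cast<uint64_t>(p[7]) << 56);
  double result;
  std::memcpy(&result, &bits, sizeof(result));  // well-defined bit cast
  return result;
}

// Decode up to `count` doubles from [buf, bufEnd) with a single bounds
// check per value. Returns how many values were decoded; a real reader
// would refill its buffer and handle a value that straddles two buffers.
inline std::size_t decodeDoubles(const unsigned char* buf,
                                 const unsigned char* bufEnd,
                                 double* out, std::size_t count) {
  std::size_t n = 0;
  while (n < count && bufEnd - buf >= static_cast<std::ptrdiff_t>(8)) {
    out[n++] = decodeDoubleLE(buf);
    buf += 8;
  }
  return n;
}
```

The slow path for values that cross a buffer boundary is left out here; it 
could keep the existing byte-by-byte logic.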

Furthermore, on little-endian machines, the layout of the DATA stream of the 
DOUBLE column matches the memory layout of the output array. We can use 
std::memcpy to copy the data directly.
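
A minimal sketch of the memcpy idea, assuming IEEE 754 doubles and a source 
buffer that already holds count * 8 contiguous bytes. The names isLittleEndian 
and copyDoubles are hypothetical and not part of the ORC API; the endianness 
check is done at run time here only to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detect the host byte order at run time.
inline bool isLittleEndian() {
  const uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical helper: copy `count` little-endian doubles from `src`
// into `out`. The caller guarantees that src holds count * 8 bytes.
inline void copyDoubles(const unsigned char* src, double* out,
                        std::size_t count) {
  static_assert(sizeof(double) == 8, "assumes 64-bit IEEE 754 double");
  if (isLittleEndian()) {
    // Stream layout equals memory layout: one bulk copy, no per-value work.
    std::memcpy(out, src, count * sizeof(double));
    return;
  }
  // Portable fallback for big-endian hosts: assemble each value bytewise.
  for (std::size_t i = 0; i < count; ++i) {
    uint64_t bits = 0;
    for (int b = 0; b < 8; ++b) {
      bits |= static_cast<uint64_t>(src[i * 8 + b]) << (8 * b);
    }
    std::memcpy(&out[i], &bits, sizeof(bits));
  }
}
```

This shortcut applies to DOUBLE columns only when no values have to be 
skipped for nulls. FLOAT data is 4 bytes wide in the stream, so it would still 
need a per-value widening step into the double output array.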

  was:
DoubleColumnReader::next() takes a portion of time when reading doubles from a 
tpch-lineitem ORC file. I can see that the loop in readDouble is unrolled, but 
it still has redundant checks. We can manually unroll it to save some 
instructions.

Furthermore, on little-endian machines, the layout of the DATA stream of the 
DOUBLE column matches the memory layout of the output array. We can use 
std::memcpy to copy the data directly.

 


> [C++] Improve float/double conversion in DoubleColumnReader::next()
> -------------------------------------------------------------------
>
>                 Key: ORC-1137
>                 URL: https://issues.apache.org/jira/browse/ORC-1137
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: DoubleColumnReader.png, DoubleColumnReader_readDouble.png
>
>
> DoubleColumnReader::next() takes a noticeable portion of the time when 
> reading doubles from a tpch-lineitem ORC file.
> !DoubleColumnReader.png|width=783,height=203!
> I can see that the loop in readDouble is unrolled, but it still has redundant 
> checks. We can manually unroll it to save some instructions.
> !DoubleColumnReader_readDouble.png|width=598,height=653!
> Furthermore, on little-endian machines, the layout of the DATA stream of the 
> DOUBLE column matches the memory layout of the output array. We can use 
> std::memcpy to copy the data directly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
