Thank you for opening a JIRA issue. I have added you to the Contributor role in JIRA and assigned the issue to you.
On Thu, Feb 28, 2019 at 3:13 AM 周宇睿(闻拙) <yurui....@alibaba-inc.com> wrote:
>
> Hi,
>
> Currently, Arrow C++ provides a naive adapter implementation that allows users to read ORC files into Arrow RecordBatches. However, this implementation has several drawbacks:
>
> - Inefficient conversion that incurs large memcpy overhead.
>   The ORC adapter currently performs a byte-by-byte memcpy to move data from the ORC VectorBatch to the Arrow RecordBatch, despite the fact that the ORC VectorBatch shares the same memory layout as Arrow for most data types.
>
> - Large memory footprint due to the lack of a TableReader implementation.
>   The ORC adapter currently only allows users to read data at the granularity of a stripe. However, because ORC is a columnar format with a high compression ratio, the data read from a single ORC stripe can take gigabytes of memory, which makes the ORC adapter hardly usable in a production environment.
>
> Here we propose a new ORC adapter implementation to fix the issues mentioned above:
>
> To reduce conversion overhead, instead of performing a naive data copy, the new adapter will take full advantage of the memory-layout similarity between the ORC VectorBatch and the Arrow RecordBatch. Specifically, the new adapter will use pointer manipulation to transfer memory ownership from the VectorBatch to the Arrow RecordBatch whenever possible (a rough sketch of this idea follows below this message).
>
> The new ORC adapter will give users row-level granularity when reading data from an ORC file. The user will be able to specify how many rows are expected in each output RecordBatch, and the ORC adapter will ensure that no more than the requested number of rows is returned (see the second sketch below).
>
> I opened a JIRA here to track the issue. Any advice would be appreciated.
>
> BTW, I tried to assign the JIRA to myself but it looks like I am unable to do that. Any idea how I could obtain the permission to perform that operation?
>
> Best regards
>
> Yurui
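For illustration only, here is a minimal sketch of the zero-copy idea described in the proposal, not the proposed implementation itself. It wraps the values buffer of an ORC LongVectorBatch in an Arrow Int64Array without copying the data; the function name WrapLongColumn and the column name "col0" are made up for the example, and the sketch assumes the VectorBatch outlives the RecordBatch (a real adapter would transfer ownership of the underlying allocation instead of merely borrowing it), and ignores null bitmaps.

// Sketch: wrap an ORC int64 column in an Arrow array without a byte copy.
#include <arrow/api.h>
#include <orc/Vector.hh>

std::shared_ptr<arrow::RecordBatch> WrapLongColumn(
    const orc::LongVectorBatch& batch) {
  int64_t num_rows = static_cast<int64_t>(batch.numElements);
  // Wrap the existing int64 values buffer; no memcpy of the values themselves.
  // The resulting Buffer does NOT own the memory, hence the lifetime caveat above.
  std::shared_ptr<arrow::Buffer> values =
      arrow::Buffer::Wrap(batch.data.data(), num_rows);
  auto array = std::make_shared<arrow::Int64Array>(
      num_rows, values, /*null_bitmap=*/nullptr, /*null_count=*/0);
  auto schema = arrow::schema({arrow::field("col0", arrow::int64())});
  return arrow::RecordBatch::Make(schema, num_rows, {array});
}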
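And a rough sketch of the row-level reading behaviour the proposal asks for, expressed with the plain ORC C++ reader rather than the (not yet existing) Arrow TableReader: the row batch is created with the requested capacity, so each call to next() fills at most batch_size rows and each emitted RecordBatch stays bounded in size. The function name ReadFileInBatches is illustrative only.

// Sketch: read an ORC file in bounded, row-sized chunks.
#include <orc/OrcFile.hh>

void ReadFileInBatches(const std::string& path, uint64_t batch_size) {
  orc::ReaderOptions reader_opts;
  std::unique_ptr<orc::Reader> reader =
      orc::createReader(orc::readLocalFile(path), reader_opts);
  orc::RowReaderOptions row_opts;
  std::unique_ptr<orc::RowReader> row_reader = reader->createRowReader(row_opts);

  // The batch is allocated once with the requested row capacity;
  // next() never fills more than that many rows.
  std::unique_ptr<orc::ColumnVectorBatch> batch =
      row_reader->createRowBatch(batch_size);
  while (row_reader->next(*batch)) {
    // batch->numElements <= batch_size; convert this slice to an Arrow
    // RecordBatch here, ideally via the zero-copy path sketched above.
  }
}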