[
https://issues.apache.org/jira/browse/ARROW-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney updated ARROW-2330:
--------------------------------
Fix Version/s: (was: 0.9.0)
0.10.0
> [C++] Optimize delta buffer creation with partially finishable array builders
> -----------------------------------------------------------------------------
>
> Key: ARROW-2330
> URL: https://issues.apache.org/jira/browse/ARROW-2330
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Affects Versions: 0.8.0
> Reporter: Dimitri Vorona
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The main aim of this change is to optimize the building of delta
> dictionaries. In the current version, delta dictionaries are built using an
> additional "overflow" buffer, which leads to complicated and potentially
> error-prone code and to subpar performance, because it doubles the number
> of lookups.
> I solve this problem by introducing the notion of partially finishable array
> builders, i.e. builders that are able to retain their state when Finish is
> called. The interface is based on RecordBatchBuilder::Flush, i.e. Finish is
> overloaded with an additional signature, Finish(bool reset_builder,
> std::shared_ptr<Array>* out). The resulting Arrays point to the same data
> buffer with different offsets.
> I'm aware that the change is fairly big, but I'd like to discuss it
> here. The solution makes the code more straightforward, doesn't bloat the
> code base too much, and leaves the API more or less untouched. Additionally,
> the new way of making delta dictionaries, by using a different call signature
> for Finish, feels cleaner to me.
> I'm looking forward to your criticism and improvement ideas.
> The pull request is available at: https://github.com/apache/arrow/pull/1769
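
To make the proposed Finish(bool reset_builder, ...) semantics easier to discuss, here is a minimal, self-contained sketch of a partially finishable builder. It does not use Arrow: ToyArray, ToyBuilder and FinishTwice are hypothetical names invented for illustration, Finish returns void rather than a Status, and the meaning of reset_builder (false keeps the buffer and starts the next array at the current end) is my reading of the description above, not the code in the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array: a read-only (offset, length) view into
// a shared buffer. This is NOT arrow::Array.
struct ToyArray {
  std::shared_ptr<const std::vector<int64_t>> data;
  std::size_t offset;
  std::size_t length;

  int64_t Value(std::size_t i) const { return (*data)[offset + i]; }
};

// Hypothetical builder that can be "partially finished": with
// reset_builder == false it keeps its buffer, and the next Finish call
// returns only the values appended since the previous one (the delta).
class ToyBuilder {
 public:
  ToyBuilder() : data_(std::make_shared<std::vector<int64_t>>()) {}

  void Append(int64_t value) { data_->push_back(value); }

  void Finish(bool reset_builder, std::shared_ptr<ToyArray>* out) {
    const std::size_t length = data_->size() - finished_;
    *out = std::make_shared<ToyArray>(ToyArray{data_, finished_, length});
    if (reset_builder) {
      // Classic behaviour: drop all state and start with a fresh buffer.
      data_ = std::make_shared<std::vector<int64_t>>();
      finished_ = 0;
    } else {
      // Partial finish: keep the buffer, remember where the next delta starts.
      finished_ += length;
    }
  }

 private:
  std::shared_ptr<std::vector<int64_t>> data_;
  std::size_t finished_ = 0;  // number of values already handed out
};

// Usage sketch: finish twice without resetting, so that the second array
// is a delta that shares the buffer with the first one.
inline std::pair<std::shared_ptr<ToyArray>, std::shared_ptr<ToyArray>>
FinishTwice() {
  ToyBuilder builder;
  builder.Append(10);
  builder.Append(20);
  std::shared_ptr<ToyArray> first;
  builder.Finish(/*reset_builder=*/false, &first);

  builder.Append(30);
  std::shared_ptr<ToyArray> delta;
  builder.Finish(/*reset_builder=*/false, &delta);
  return {first, delta};
}
```

In this sketch both arrays returned by FinishTwice hold the same buffer pointer and differ only in offset and length, which mirrors the "same data buffer with different offsets" property described in the issue. No second "overflow" structure is needed to track the delta.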
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)