[ https://issues.apache.org/jira/browse/ARROW-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-5862:
----------------------------------
Labels: pull-request-available (was: )
> [Java] Provide dictionary builder
> ---------------------------------
>
> Key: ARROW-5862
> URL: https://issues.apache.org/jira/browse/ARROW-5862
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Major
> Labels: pull-request-available
>
> The dictionary builder serves the following scenario, which is frequently
> encountered in practice when dictionary encoding is involved: the dictionary
> values are not known a priori, so they are determined dynamically as new
> data arrive continually.
> In particular, when a new value arrives, it is checked against the
> dictionary. If it is already present, it is simply ignored; otherwise, it is
> added to the dictionary.
>
> When all values have been evaluated, the dictionary can be considered
> complete, so encoding can start afterward.
> A code snippet using a dictionary builder would look like this:
> {{DictionaryBuilder<IntVector> dictionaryBuilder = ...}}
> {{dictionaryBuilder.startBuild();}}
> {{...}}
> {{dictionaryBuilder.addValue(newValue);}}
> {{...}}
> {{dictionaryBuilder.endBuild();}}
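For illustration only, the start/add/end life cycle described in the issue could be sketched in plain Java roughly as follows. This is not the Arrow API and does not use Arrow vectors: the class name `SimpleDictionaryBuilder`, the use of a `HashMap` for the membership test, and the choice to have `addValue` return the value's index are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the life cycle described in the issue.
// It is NOT the Arrow implementation; all names here are illustrative.
class SimpleDictionaryBuilder<T> {
    // Maps each distinct value to its position in the dictionary.
    private final Map<T, Integer> indexByValue = new HashMap<>();
    // Distinct values in order of first arrival.
    private final List<T> values = new ArrayList<>();
    private boolean building = false;

    // Resets any previous state and begins accepting values.
    void startBuild() {
        indexByValue.clear();
        values.clear();
        building = true;
    }

    // Adds the value if it has not been seen before; an already-known value
    // is left alone. Returning the index is an assumption of this sketch.
    int addValue(T value) {
        if (!building) {
            throw new IllegalStateException("call startBuild() first");
        }
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing;
        }
        int index = values.size();
        indexByValue.put(value, index);
        values.add(value);
        return index;
    }

    // Marks the dictionary complete and returns a read-only snapshot of it.
    List<T> endBuild() {
        if (!building) {
            throw new IllegalStateException("no build in progress");
        }
        building = false;
        return Collections.unmodifiableList(new ArrayList<>(values));
    }

    public static void main(String[] args) {
        SimpleDictionaryBuilder<String> builder = new SimpleDictionaryBuilder<>();
        builder.startBuild();
        for (String s : new String[] {"apple", "pear", "apple", "fig", "pear"}) {
            builder.addValue(s);
        }
        List<String> dictionary = builder.endBuild();
        System.out.println(dictionary); // prints [apple, pear, fig]
    }
}
```

Keeping a hash map alongside the ordered list makes the "is it already in the dictionary?" check constant time on average, while the list preserves the order in which distinct values first arrived. A real implementation over Arrow vectors would need its own hashing and storage strategy.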
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)