[
https://issues.apache.org/jira/browse/ARROW-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758057#comment-16758057
]
Zhuang Tianyi commented on ARROW-4437:
--------------------------------------
I'd like to implement this and contribute it to Arrow.
It seems easy enough to export these APIs directly (just copy the
StringBuilder code for every type: Int*, Uint*, etc.), but I'm not familiar
with Cython, so I don't know how to keep the code short and DRY there.
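One way to avoid a hand-copied builder per type is to drive a single class from a table of type descriptors. The sketch below is pure Python, not Cython, and `PrimitiveBuilder` and `_PRIMITIVE_TYPES` are hypothetical names, not pyarrow API. In Cython, fused types or a template over the C++ builder class might play the same role, but that is an assumption not checked against builder.pxi.

```python
from array import array

# Hypothetical table: Arrow primitive type name -> (array typecode, byte width).
# A single table plus a single class replaces one hand-copied builder per type.
_PRIMITIVE_TYPES = {
    "int8": ("b", 1), "uint8": ("B", 1),
    "int16": ("h", 2), "uint16": ("H", 2),
    "int32": ("i", 4), "uint32": ("I", 4),
    "int64": ("q", 8), "uint64": ("Q", 8),
    "float": ("f", 4), "double": ("d", 8),
}


class PrimitiveBuilder:
    """Hypothetical generic builder: fixed-width values plus a validity bitmap."""

    def __init__(self, type_name):
        self.type_name = type_name
        self._typecode, width = _PRIMITIVE_TYPES[type_name]
        # `array` uses C sizes, which can differ by platform; refuse a mismatch.
        if array(self._typecode).itemsize != width:
            raise TypeError(f"{type_name}: typecode {self._typecode!r} is not {width} bytes")
        self._reset()

    def _reset(self):
        self._values = array(self._typecode)  # value buffer (native byte order)
        self._validity = bytearray()          # bit i set => slot i is valid (LSB first)
        self.null_count = 0

    def __len__(self):
        return len(self._values)

    def append(self, value):
        index = len(self._values)
        # Append the value first so a bad value (e.g. overflow) leaves no partial state.
        self._values.append(0 if value is None else value)
        byte, bit = divmod(index, 8)
        if bit == 0:
            self._validity.append(0)
        if value is None:
            self.null_count += 1  # null slots still occupy a (zeroed) value slot
        else:
            self._validity[byte] |= 1 << bit

    def finish(self):
        """Return (validity_bytes, value_bytes) and reset the builder for reuse."""
        buffers = (bytes(self._validity), self._values.tobytes())
        self._reset()
        return buffers
```

For example, appending `1, None, 3` to `PrimitiveBuilder("int32")` yields a one-byte validity bitmap `0b101` and a 12-byte value buffer; supporting another type means one more table entry rather than another copied class.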
> [Python] Add builder API
> ------------------------
>
> Key: ARROW-4437
> URL: https://issues.apache.org/jira/browse/ARROW-4437
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Environment: Python 3.7.0 pyarrow-0.12.0
> Reporter: Zhuang Tianyi
> Priority: Minor
>
> There is no [Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE]
> API in the Python bindings. When I generate data from a stream, I have to build
> a Python list (high overhead) or a pandas object, and then finalize it by calling
> pa.array, which copies the data. It seems that we could build an Array directly
> from two or three pa.ResizableBuffer objects in O(1) time.
> It is possible to maintain these buffers (value buffer, null bitmap, offset
> buffer) manually with the currently exported API, but that is not safe enough.
>
> I found an undocumented StringBuilder API in
> [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi],
> corresponding to
> [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder].
> Will the other ArrayBuilder APIs be added to the Python bindings?
>
> ----
> One more thing:
> a BatchBuilder API would be even better, if possible.
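For reference, the three buffers the issue mentions (null bitmap, offset buffer, value buffer) can be modeled in pure Python as below. `StringBuilderSketch` is a hypothetical name, not the class in builder.pxi; the code only illustrates the bookkeeping a builder does for Arrow's variable-length string layout. It omits the buffer alignment and padding the Arrow columnar format recommends, and uses native byte order rather than enforcing little-endian.

```python
from array import array

_INT32_MAX = 2**31 - 1


class StringBuilderSketch:
    """Hypothetical pure-Python model of a string builder's three buffers.

    - validity: null bitmap, bit i set (LSB first) when slot i is non-null
    - offsets:  int32 buffer with len + 1 entries; slot i is data[offsets[i]:offsets[i+1]]
    - data:     concatenated UTF-8 bytes of all non-null values
    """

    def __init__(self):
        self._reset()

    def _reset(self):
        self._validity = bytearray()
        self._offsets = array("i", [0])  # assumes a 4-byte C int, as on common platforms
        self._data = bytearray()
        self.null_count = 0

    def __len__(self):
        return len(self._offsets) - 1

    def append(self, value):
        # Validate before touching any buffer so a failure leaves no partial state.
        encoded = None if value is None else value.encode("utf-8")
        new_end = len(self._data) + (0 if encoded is None else len(encoded))
        if new_end > _INT32_MAX:
            raise OverflowError("int32 offsets exhausted; 64-bit offsets would be needed")
        byte, bit = divmod(len(self), 8)
        if bit == 0:
            self._validity.append(0)
        if encoded is None:
            self.null_count += 1  # a null adds no bytes, so its end offset repeats
        else:
            self._data += encoded
            self._validity[byte] |= 1 << bit
        self._offsets.append(new_end)

    def finish(self):
        """Return (validity, offsets, data) as bytes and reset the builder."""
        buffers = (bytes(self._validity), self._offsets.tobytes(), bytes(self._data))
        self._reset()
        return buffers
```

Appending `"ab", None, "c"` produces validity `0b101`, offsets `[0, 2, 2, 3]` and data `b"abc"`. Buffers in this layout are what current pyarrow releases accept in `pyarrow.Array.from_buffers` (each wrapped with `pyarrow.py_buffer`), which is roughly the O(1) construction path the issue asks about; that call is not exercised in the sketch.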
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)