[
https://issues.apache.org/jira/browse/ARROW-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758028#comment-16758028
]
Uwe L. Korn commented on ARROW-4437:
------------------------------------
[~TennyZhuang] Yes, it would be nice to have the builder APIs exposed in
Python. This is a really good beginner task. Would you like to have some
guidance on how to expose them?
> [Python] Add builder API
> ------------------------
>
> Key: ARROW-4437
> URL: https://issues.apache.org/jira/browse/ARROW-4437
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Environment: Python 3.7.0 pyarrow-0.12.0
> Reporter: Zhuang Tianyi
> Priority: Minor
>
> There is no [Array
> Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE]
> API in the Python bindings. When I generate data from a stream, I have to
> build a Python list (high overhead) or a pandas object, then finalize it by
> calling pa.array, which copies the data. It seems that we could build an
> Array directly from two or three pa.ResizableBuffer objects in O(1) time.
> It is possible to maintain these buffers (value buffer, null bitmap, offset
> buffer) manually with the currently exported API, but that is not safe enough.
>
> I found an undocumented StringBuilder API in
> [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi],
> corresponding to
> [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder].
> Will other ArrayBuilder APIs be added to the Python bindings?
>
> ----
> One more thing: a BatchBuilder API would be even better, if possible.
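
For illustration only, here is a minimal sketch of the manual buffer bookkeeping the description refers to (value buffer plus null bitmap) for an int64 column. The class Int64BuilderSketch and the helper to_arrow are hypothetical names and are not part of pyarrow. The sketch assumes a little-endian platform and the Arrow validity-bitmap convention (least-significant bit first, 1 means valid). Only to_arrow touches pyarrow, through pa.py_buffer and pa.Array.from_buffers.

```python
import struct


class Int64BuilderSketch:
    """Hypothetical append-only builder; NOT part of pyarrow."""

    def __init__(self):
        self._values = bytearray()    # one little-endian int64 slot per element
        self._validity = bytearray()  # LSB-first bitmap, bit set = valid
        self._length = 0
        self._null_count = 0

    def append(self, value):
        i = self._length
        if i % 8 == 0:
            self._validity.append(0)  # start a new bitmap byte
        if value is None:
            self._values += struct.pack("<q", 0)  # placeholder slot for a null
            self._null_count += 1
        else:
            self._values += struct.pack("<q", value)
            self._validity[i // 8] |= 1 << (i % 8)
        self._length += 1

    def finish(self):
        """Return (length, null_count, validity bytes, value bytes)."""
        return (self._length, self._null_count,
                bytes(self._validity), bytes(self._values))


def to_arrow(length, null_count, validity, values):
    """Wrap the finished buffers in a pyarrow int64 Array (requires pyarrow)."""
    import pyarrow as pa  # imported lazily so the builder itself has no dependency
    buffers = [pa.py_buffer(validity), pa.py_buffer(values)]
    return pa.Array.from_buffers(pa.int64(), length, buffers,
                                 null_count=null_count)


builder = Int64BuilderSketch()
for item in [1, None, 3]:
    builder.append(item)
length, null_count, validity, values = builder.finish()
```

Calling to_arrow(length, null_count, validity, values) should then wrap the two buffers without copying the values. Nothing in this approach checks that the buffers and the declared length agree, which is the safety concern raised in the description and the reason a real builder API would be preferable.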