[ https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn updated ARROW-1964:
-------------------------------
    Description: 
Having the builder classes available from Python would be very helpful. 
Currently, constructing an Arrow array always requires a Python list or 
NumPy array as an intermediate. As the builders, in combination with jemalloc, 
are very efficient at building up non-chunked memory, it would be nice to use 
them directly in certain cases.
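To make the "non-chunked memory" point concrete, here is a minimal pure-Python sketch of what a string builder does internally. It is not pyarrow's API: the class {{SketchStringBuilder}} and its methods are hypothetical. It assumes Arrow's variable-size binary layout (an LSB-ordered validity bitmap, an int32 offsets buffer with length + 1 entries, and one contiguous UTF-8 data buffer). Each appended {{str}} is written straight into those buffers, so no intermediate Python list of all values is kept.

```python
import struct
from typing import Optional, Tuple


class SketchStringBuilder:
    """Hypothetical illustration of a string builder; NOT the pyarrow API.

    Values are appended directly into contiguous buffers that follow Arrow's
    variable-size binary layout: validity bitmap, int32 offsets, UTF-8 data.
    """

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._data = bytearray()      # concatenated UTF-8 bytes of all values
        self._offsets = [0]           # value i spans data[offsets[i]:offsets[i+1]]
        self._validity = bytearray()  # one bit per slot, least-significant bit first
        self._length = 0
        self.null_count = 0

    def _set_valid_bit(self, valid: bool) -> None:
        # Grow the bitmap one byte at a time; unset bits mean "null".
        byte_index, bit_index = divmod(self._length, 8)
        if byte_index == len(self._validity):
            self._validity.append(0)
        if valid:
            self._validity[byte_index] |= 1 << bit_index

    def append(self, value: Optional[str]) -> None:
        if value is None:
            self._set_valid_bit(False)
            self.null_count += 1
        else:
            self._data += value.encode("utf-8")
            self._set_valid_bit(True)
        # A null slot repeats the previous offset, i.e. it has zero length.
        self._offsets.append(len(self._data))
        self._length += 1

    def __len__(self) -> int:
        return self._length

    def finish(self) -> Tuple[int, int, bytes, bytes, bytes]:
        """Return (length, null_count, validity, offsets, data) and reset."""
        offsets = struct.pack(f"<{len(self._offsets)}i", *self._offsets)
        result = (self._length, self.null_count, bytes(self._validity),
                  offsets, bytes(self._data))
        self._reset()  # leave the builder empty so it can be reused
        return result
```

Appending {{"foo"}}, {{None}} and {{"bär"}} yields offsets 0, 3, 3, 7 and a single 7-byte data buffer. A real binding would keep these buffers in the C++ builder and hand back a {{pyarrow.Array}} instead of raw bytes.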

The most useful builders are the 
[StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
 and 
[DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
 as they provide functionality to create columns that are not easily 
constructed using NumPy methods in Python.
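The appeal of a DictionaryBuilder is dictionary encoding: each distinct value is stored once, and the column itself holds integer indices into that dictionary. A minimal pure-Python sketch of the idea follows. {{SketchDictionaryBuilder}} is hypothetical; it uses a plain {{dict}} as the memo table and Python lists instead of Arrow buffers, so it only approximates what the C++ builder does.

```python
from typing import Dict, Hashable, List, Optional, Tuple


class SketchDictionaryBuilder:
    """Hypothetical illustration of dictionary encoding; NOT the pyarrow API.

    Each distinct value is stored once in `dictionary` (in first-seen order);
    every appended value becomes an integer index into it, and None becomes
    a null index.
    """

    def __init__(self) -> None:
        self._memo: Dict[Hashable, int] = {}     # value -> position in dictionary
        self.dictionary: List[Hashable] = []     # distinct values
        self.indices: List[Optional[int]] = []   # one entry per appended slot

    def append(self, value: Optional[Hashable]) -> None:
        if value is None:
            self.indices.append(None)
            return
        index = self._memo.get(value)
        if index is None:
            # First occurrence: add the value to the dictionary.
            index = len(self.dictionary)
            self._memo[value] = index
            self.dictionary.append(value)
        self.indices.append(index)

    def finish(self) -> Tuple[List[Optional[int]], List[Hashable]]:
        """Return copies of (indices, dictionary)."""
        return list(self.indices), list(self.dictionary)
```

For the input {{["a", "b", "a", None, "b"]}} this produces indices {{[0, 1, 0, None, 1]}} and dictionary {{["a", "b"]}}. Producing that pair from NumPy alone takes a separate unique/lookup pass, whereas a builder does it incrementally while values are appended.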

The basic approach would be to wrap the C++ classes in 
https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
 so that they can be used from Cython. Afterwards, we should start a new file 
{{python/pyarrow/builder.pxi}} containing classes that take typical Python 
objects like {{str}} and pass them on to the C++ classes. When building is 
finished, these classes should return (Python-accessible) {{pyarrow.Array}} instances.

  was:Having the builder classes available from Python would be very helpful. 
Currently, constructing an Arrow array always requires a Python list or 
NumPy array as an intermediate. As the builders, in combination with jemalloc, 
are very efficient at building up non-chunked memory, it would be nice to use 
them directly in certain cases.


> [Python] Expose Builder classes
> -------------------------------
>
>                 Key: ARROW-1964
>                 URL: https://issues.apache.org/jira/browse/ARROW-1964
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.0.0
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
