[jira] [Commented] (ARROW-13939) how to do resampling of arrow table using cython

Weston Pace (Jira) Fri, 10 Sep 2021 12:09:06 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413344#comment-17413344
 ]


Weston Pace commented on ARROW-13939:
-------------------------------------

Hmm, I'm not sure which documentation you are referring to.  If you are looking 
at the C++ documentation, note that the Cython API does not fully mirror the 
C++ API.  In other words, CResult does not have every method that Result has.  
If you want the value from a result, the proper thing to do is call 
GetResultValue(val), which checks the status of the result and, if it is valid, 
returns the value.  If it isn't valid, it converts the invalid status into the 
appropriate Python exception and raises it.

> Will iterating the whole table be slow in cython?

If you are going through every value with GetScalar then yes, it probably will 
be, though I don't know for sure.  Ideally, if you want to process every value, 
you will want to get access to the raw buffers and operate on them directly.  
Can you give an example of the transformation you want to do?  Your best bet 
might be to create compute kernels in C++ to do the manipulation you want and 
then call those kernel functions from Python.

> which is the best to use to append new elements to. Is there a way i create 
> an empty table of same schema and keep appending to it. Or should I use 
> vectors/list and then pass them to create a table.

Where are these elements coming from?  For example, if you are receiving them 
already in Python (via some on_new_event method or something), then a simple 
and reasonably efficient approach would be to gather them in a Python list 
and, when the list is large enough, convert the list to an Arrow array.  If the 
elements you are receiving are in C++, then you probably don't want to marshal 
them to Python and add them to a Python list.  Using the C++ array builders 
would be a better choice.

> how to do resampling of arrow table using cython
> ------------------------------------------------
>
>                 Key: ARROW-13939
>                 URL: https://issues.apache.org/jira/browse/ARROW-13939
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: krishna deepak
>            Priority: Minor
>
> Please can someone point me to resources, how to write a resampling code in 
> cython for Arrow table.
>  # Will iterating the whole table be slow in cython?
>  # which is the best to use to append new elements to. Is there a way i 
> create an empty table of same schema and keep appending to it. Or should I 
> use vectors/list and then pass them to create a table.
> Performance is very important for me. Any help is highly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
