[jira] [Commented] (ARROW-13939) how to do resampling of arrow table using cython

krishna deepak (Jira) Fri, 10 Sep 2021 23:16:06 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413477#comment-17413477
 ]


krishna deepak commented on ARROW-13939:
----------------------------------------

[~westonpace] 

Thanks, this is very helpful.

Regarding documentation, that makes sense. But the Cython documentation is a 
single page without much useful information. The function
{code:java}
 GetResultValue(val){code}
is nowhere to be found in it.
I'm still stuck after using it. It returns a 'shared_ptr[CScalar]', which 
gives me a 'CScalar *', but I don't know how to extract the value from it. 
Let's say I know that it is an IntScalar: how do I extract the value, e.g. 
int a = doSomethingOnCResult(val)?

---------------------------------------------------------------------------------------------------------------------------------------------------------------
What I'm trying to do is convert data from [[11:01, 3], [11:02, 4], 
[11:03, 2], [11:04, 1], [11:05, 3], [11:06, 6]] to [[11:03, 4], [11:06, 6]], 
i.e. resample 1-minute data to 3-minute data. Here the transformation function 
is the max of all values in each 3-minute window.
So I have to iterate through all the values.
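
A minimal sketch of the resampling described above, in plain Python rather than Cython, only to pin down the intended semantics. It assumes right-closed, right-labelled 3-minute windows (11:01 to 11:03 is labelled 11:03) and timestamps that fall on whole minutes; the names `bin_label` and `resample_max` and the date are made up for the illustration. Note that the max of 3, 4, 2 in the first window is 4.

```python
from datetime import datetime, timedelta


def bin_label(ts, width):
    """Return the end of the width-minute window that contains ts."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = (ts - midnight) // timedelta(minutes=1)
    # Round up to the next multiple of width (ceiling division).
    return midnight + timedelta(minutes=-(-minutes // width) * width)


def resample_max(rows, width=3):
    """Group (timestamp, value) rows by window and keep the max per window."""
    best = {}
    for ts, value in rows:
        label = bin_label(ts, width)
        if label not in best or value > best[label]:
            best[label] = value
    return sorted(best.items())


day = datetime(2021, 9, 10)  # arbitrary date, chosen for the example
rows = [(day.replace(hour=11, minute=m), v)
        for m, v in [(1, 3), (2, 4), (3, 2), (4, 1), (5, 3), (6, 6)]]
result = resample_max(rows)
```

With the six rows above, `result` holds two rows, labelled 11:03 and 11:06, with values 4 and 6.
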
> Your best bet might be to create compute kernels in C++ to do the 
> manipulation you desire and then call those kernel functions from python.
I believe this resembles what I'm doing: having the resampling code in 
Cython. If I'm wrong, please let me know.

> if you want to process every value, you will want to get access to the raw 
> buffers and operate on them.
I have no idea how to do this. Can you please point me to some resources?
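
As a rough illustration of what operating on raw buffers means: a primitive Arrow column keeps its values in one contiguous data buffer, so a loop can index fixed-width values directly instead of wrapping each one in a scalar. The sketch below is plain Python with a stand-in byte buffer, not a real Arrow buffer. `window_max` is a hypothetical helper, and it assumes int64 values, no nulls, and evenly spaced rows with no gaps.

```python
from array import array

# Stand-in for the data buffer of an int64 column: six 8-byte values
# stored back to back in native byte order.
raw = array("q", [3, 4, 2, 1, 3, 6]).tobytes()

# Zero-copy typed view over the bytes; no per-element objects are created
# until an element is actually read.
values = memoryview(raw).cast("q")


def window_max(values, width):
    """Max of each consecutive run of `width` values (last run may be short)."""
    return [max(values[i:i + width]) for i in range(0, len(values), width)]


maxima = window_max(values, 3)
```

In Cython or C++ the same loop would run over a typed pointer into the buffer, which is what makes it fast.
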

---------------------------------------------------------------------------------------------------------------------------------------------------------------
> Where are these elements coming from?
Everything is in Cython. I pass my larger table from Python to a Cython 
resampling function. This function iterates over the whole table and builds a 
new table as it iterates. 
My plan is to use C++ vectors to build the individual columns, pass them to 
the Arrow Table constructor, and then return the result to Python.
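
The plan above (one pass over the input, appending to one growable container per output column) can be sketched as follows. This is plain Python with `array.array` standing in for the C++ vectors, and it stops before the table-construction step. It assumes the input is sorted by time and that timestamps are given as whole minutes since midnight; `resample_columns` is a hypothetical name.

```python
from array import array


def resample_columns(minutes, values, width=3):
    """Single pass over time-sorted columns; returns two new output columns."""
    out_minutes = array("q")
    out_values = array("q")
    for m, v in zip(minutes, values):
        label = -(-m // width) * width  # end of the window containing m
        if out_minutes and out_minutes[-1] == label:
            # Same window as the previous row: keep the running max.
            if v > out_values[-1]:
                out_values[-1] = v
        else:
            # First row of a new window: start a new output row.
            out_minutes.append(label)
            out_values.append(v)
    return out_minutes, out_values


in_minutes = array("q", [661, 662, 663, 664, 665, 666])  # 11:01 .. 11:06
in_values = array("q", [3, 4, 2, 1, 3, 6])
out_minutes, out_values = resample_columns(in_minutes, in_values)
```

Because the input is sorted, only the last output row ever needs updating, so no lookup structure is required.
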



> how to do resampling of arrow table using cython
> ------------------------------------------------
>
>                 Key: ARROW-13939
>                 URL: https://issues.apache.org/jira/browse/ARROW-13939
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: krishna deepak
>            Priority: Minor
>
> Please can someone point me to resources, how to write a resampling code in 
> cython for Arrow table.
>  # Will iterating the whole table be slow in cython?
>  # which is the best to use to append new elements to. Is there a way i 
> create an empty table of same schema and keep appending to it. Or should I 
> use vectors/list and then pass them to create a table.
> Performance is very important for me. Any help is highly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
