[
https://issues.apache.org/jira/browse/ARROW-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414611#comment-17414611
]
Weston Pace commented on ARROW-13939:
-------------------------------------
> the cython documentation is single page with not much useful info.
PRs are always welcome.
> lets say i know that its of IntScalar, how to extract it int a =
> doSomethingOnCResult(val)
Scalars have an "as_py" method. You can read its implementation to see how the
conversion is done in Cython.
> I believe this resembles to what i'm doing, having this resampling code in
> Cython. If I'm wrong please let me know.
You are not wrong, but no one else is doing this in Cython, so you will need to
build a lot of functionality yourself and it will be a considerable amount of
work. The pyarrow philosophy has been to keep all array manipulation in C++.
The existing Cython code is pretty much limited to metadata manipulation. The
easiest path forward (in terms of man-hours of effort) is likely to be
extending Arrow-C++. Alternatively, you could investigate whether something
like this is supported by DataFusion. There is some initial support for Python
bindings for DataFusion in development. I do believe that these kinds of
functions will come to Arrow-C++ (and thus pyarrow) someday, but I can't give
you any kind of estimate as there is no open JIRA ticket for them.
> I have no idea how to do this. Please can you point me to some resources.
* To access an Array's buffers in Python (as a bytes object) you can do
arr.buffers()[buffer_index].to_pybytes()
* To access an Array's buffers in Cython you can do something similar, but the
method to call on the buffer is "data()" (for const uint8_t*) or
"mutable_data()" (for uint8_t*)
The format of these buffers is described in the [Arrow Columnar
Format](https://arrow.apache.org/docs/format/Columnar.html) and advice on how
to manipulate them is beyond the scope of a JIRA issue.
> Everything is in cython. so I pass my larger table from python to cython
> resampling function.
It sounds like your starting data is a pyarrow "Table" and so the data will be
in C++ (there are no Python objects for the individual array elements). You
will probably want to use the [array
builders](https://arrow.apache.org/docs/cpp/api/builder.html) to build up your
result, but I do not believe there is any Cython API for these.
> how to do resampling of arrow table using cython
> ------------------------------------------------
>
> Key: ARROW-13939
> URL: https://issues.apache.org/jira/browse/ARROW-13939
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Python
> Reporter: krishna deepak
> Priority: Minor
>
> Please can someone point me to resources, how to write a resampling code in
> cython for Arrow table.
> # Will iterating the whole table be slow in cython?
> # which is the best to use to append new elements to. Is there a way i
> create an empty table of same schema and keep appending to it. Or should I
> use vectors/list and then pass them to create a table.
> Performance is very important for me. Any help is highly appreciated.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)