[ 
https://issues.apache.org/jira/browse/ARROW-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727509#comment-16727509
 ] 

Tanya Schlusser commented on ARROW-4050:
----------------------------------------

Hi [~cav71], maybe I can be useful.

You're right that arrow (cpp) builds a library called {{libarrow_python}} which 
exposes the parts of arrow that the Python library will use. That is the first 
step, run with {{cmake}} inside the directory {{arrow/cpp/build}}.

But to make the Python library there must also be a second step, run inside 
{{arrow/python}}: the pyarrow library uses Cython to wrap all of these exposed 
objects in Python for the end user. (I am learning Cython, and this 
[rectangle example|https://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html] 
was helpful.)

h6. details / example:
The [pyarrow/__init__.py|https://github.com/apache/arrow/blob/master/python/pyarrow/__init__.py] 
file imports a ton of stuff from {{pyarrow.lib}}. But there is no 
{{pyarrow/lib.py}} file in the source code. Instead, there are
* {{pyarrow/lib.pxd}} (analogous to a C++ header file)
* {{pyarrow/lib.pyx}} (analogous to a C++ source file)

which must be compiled using Cython. Running {{setup.py build_ext --inplace}} 
uses Cython to
# auto-generate C++ code ({{pyarrow/lib.cpp}}, {{pyarrow/lib_api.h}})
# compile it to a shared object (on my laptop, 
{{pyarrow/lib.cpython-36m-darwin.so}})

That shared object is the {{pyarrow.lib}} imported in {{pyarrow/__init__.py}}.
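
If you want to check this on your own machine, here is a minimal sketch using only the standard library. The helper name {{describe_module}} is mine (hypothetical, not part of pyarrow), and I am assuming a build where {{pyarrow}} is importable. It asks Python where a module would be loaded from and whether that file looks like a compiled extension:

```python
import importlib.machinery
import importlib.util


def describe_module(name):
    """Return (kind, origin) for the module `name`.

    kind is "extension", "source", "other", or "not found".
    Note: looking up a dotted name such as "pyarrow.lib" imports the
    parent package ("pyarrow") as a side effect.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # e.g. the parent package of a dotted name is not installed
        return ("not found", None)
    if spec is None:
        return ("not found", None)

    origin = spec.origin or ""
    # EXTENSION_SUFFIXES lists the platform's suffixes for compiled modules
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return ("extension", origin)
    # SOURCE_SUFFIXES is normally just [".py"]
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return ("source", origin)
    return ("other", spec.origin)


if __name__ == "__main__":
    print(describe_module("json"))         # a pure-Python stdlib package
    print(describe_module("pyarrow.lib"))  # assumes pyarrow has been built
```

On a tree where the {{build_ext}} step has run, I would expect the second call to report an extension whose path ends in the platform's shared-object suffix. If it reports "not found", the Cython step probably has not been run yet (or pyarrow is not on the path).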
I hope it is useful!

P.S. The [script linked above|https://issues.apache.org/jira/secure/attachment/12952061/working_python37_build_on_osx.sh] 
successfully built the code on my laptop.

> core dump on reading parquet file
> ---------------------------------
>
>                 Key: ARROW-4050
>                 URL: https://issues.apache.org/jira/browse/ARROW-4050
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Antonio Cavallo
>            Priority: Blocker
>              Labels: pull-request-available
>         Attachments: bug.parquet, working_python37_build_on_osx.sh
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi,
> I've a crash when doing this:
> {{import pyarrow.parquet as pq}}
> {{pq.read_table('bug.parquet')}}
> [^bug.parquet]
> (this is the same generated by 
> arrow/python/pyarrow/tests/test_parquet.py(112)test_single_pylist_column_roundtrip())



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
