[ 
https://issues.apache.org/jira/browse/ARROW-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1925:
--------------------------------
    Summary: [Python] Wrapping PyArrow Table with Numpy without copy  (was: 
Wrapping PyArrow Table with Numpy without copy)

> [Python] Wrapping PyArrow Table with Numpy without copy
> -------------------------------------------------------
>
>                 Key: ARROW-1925
>                 URL: https://issues.apache.org/jira/browse/ARROW-1925
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Young-Jun Ko
>            Priority: Minor
>              Labels: parquet
>
> The scenario is the following:
> I have a Parquet file with a column that contains a float array of constant 
> size in every row, so the column can be thought of as a matrix.
> Currently, after reading the Parquet file, I convert it to pandas and extract 
> the values, which gives me a list of np.array, and then call np.vstack to get 
> the matrix.
> This involves a copy that would be nice to avoid.
> When a Parquet file (or, more generally, a Parquet dataset) is read, are the 
> values of the array column contiguous in memory, so that a view on the data 
> could be created without copying? That would be neat.
> Thanks!
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
