[ https://issues.apache.org/jira/browse/ARROW-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-1925:
--------------------------------
Summary: [Python] Wrapping PyArrow Table with Numpy without copy (was:
Wrapping PyArrow Table with Numpy without copy)
> [Python] Wrapping PyArrow Table with Numpy without copy
> -------------------------------------------------------
>
> Key: ARROW-1925
> URL: https://issues.apache.org/jira/browse/ARROW-1925
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Young-Jun Ko
> Priority: Minor
> Labels: parquet
>
> The scenario is the following:
> I have a Parquet file with a column in which every row holds a float array
> of the same fixed size, so the column can be thought of as a matrix.
> To access it, I currently read the file, convert the result to pandas,
> extract the values (which gives me a list of np.array), and then call
> np.vstack to build the matrix.
> This involves a copy that would be nice to avoid.
> When a Parquet file (or, more generally, a Parquet dataset) is read, are the
> values of the array column contiguous in memory, so that a view on the
> data could be created without copying? That would be neat.
> Thanks!