[ https://issues.apache.org/jira/browse/ARROW-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185062#comment-16185062 ]
ASF GitHub Bot commented on ARROW-838:
--------------------------------------
GitHub user wesm opened a pull request:
https://github.com/apache/arrow/pull/1146
ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not originating in pandas
This unifies the ingest path for 1D data into `pyarrow.array`. I added a
`from_pandas` argument to turn null sentinel (NaN) checking on or off:
```
In [8]: arr = np.random.randn(10000000)
In [9]: arr[::3] = np.nan
In [10]: arr2 = pa.array(arr)
In [11]: arr2.null_count
Out[11]: 0
In [12]: %timeit arr2 = pa.array(arr)
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 68.4 µs per loop
In [13]: arr2 = pa.array(arr, from_pandas=True)
In [14]: arr2.null_count
Out[14]: 3333334
In [15]: %timeit arr2 = pa.array(arr, from_pandas=True)
1 loop, best of 3: 228 ms per loop
```
When the data is contiguous, the conversion of the values is always
zero-copy. When `from_pandas=True` and no null mask is passed, a null
bitmap is additionally constructed and populated.
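The null-sentinel check that `from_pandas=True` turns on can be modeled in a few lines of plain Python. This is a sketch under stated assumptions, not pyarrow's actual (C++) code path: the helper `nan_validity_bitmap` is hypothetical, NaN is treated as the only sentinel, and the bitmap follows the Arrow columnar format's validity-bitmap layout (one bit per slot, least-significant-bit order, 1 = valid, 0 = null), with buffer padding omitted.

```python
import math
from typing import List, Optional, Tuple


def nan_validity_bitmap(
    values: List[float], from_pandas: bool
) -> Tuple[Optional[bytes], int]:
    """Hypothetical model of NaN-as-null detection (not a pyarrow API).

    Returns (validity_bitmap, null_count). Assumes NaN is the only null
    sentinel and uses Arrow's LSB-first bit order; padding is omitted.
    """
    if not from_pandas:
        # No sentinel scan: NaN stays an ordinary float value,
        # so no bitmap is allocated and the null count is 0.
        return None, 0

    n = len(values)
    # One bit per slot, rounded up to whole bytes; all bits start at 0 (null).
    bitmap = bytearray((n + 7) // 8)
    null_count = 0
    for i, v in enumerate(values):
        if math.isnan(v):
            null_count += 1  # leave bit i cleared: slot i is null
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # set bit i: slot i is valid
    return bytes(bitmap), null_count


if __name__ == "__main__":
    # Every third element is NaN, mirroring arr[::3] = np.nan in the session.
    vals = [math.nan if i % 3 == 0 else float(i) for i in range(10)]
    print(nan_validity_bitmap(vals, from_pandas=False))  # (None, 0)
    bitmap, nulls = nan_validity_bitmap(vals, from_pandas=True)
    print(bitmap.hex(), nulls)  # b601 4
```

In this model the default path touches no values at all, while the `from_pandas=True` path visits every element once and writes a bitmap, which is consistent with (though not a full explanation of) the timing gap in the session. For the 10,000,000-element array there, NaN sits at indices 0, 3, 6, ..., 9999999, i.e. ceil(10^7 / 3) = 3,333,334 slots, matching `Out[14]`.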
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wesm/arrow expand-py-array-method
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/arrow/pull/1146.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1146
----
commit 7b530e4b20437a82daf3c66a7d168e0be7c3bc48
Author: Wes McKinney
Date: 2017-09-28T03:34:20Z
Consolidate both sequence and ndarray/Series/Index conversion in
pyarrow.Array
Change-Id: I97e785c7fd34540f2c6ba05cfaaef5b1fbf830f4
commit cf40b7678d39ddbccbc9b0e477b74ac4766eb577
Author: Wes McKinney
Date: 2017-09-28T03:55:40Z
Add type aliases, some unit tests
Change-Id: I17dee43549b04a06a190baf5d0996fab4d60301f
commit 587c575aadcada9baba06431bc8db3413d29ae92
Author: Wes McKinney
Date: 2017-09-28T04:20:33Z
Add direct types sequence converters for more data types
Change-Id: I484937c8eb23b96402ec6b1ec3d4342fa8dedbd4
commit f2802fc724225ed9986f10a39eff4bb4cae6ff7a
Author: Wes McKinney
Date: 2017-09-28T22:12:31Z
Cleaner codepath for numpy->arrow conversions
Change-Id: I2ec4737119bf25c5f5a5ee0e760855d01daaa79b
commit 797f0151e83c14d894c436207dd1cee6a2793c6b
Author: Wes McKinney
Date: 2017-09-28T22:50:48Z
Allow null checking to be skipped with from_pandas=False in pyarrow.array
Change-Id: Ie8e87c3c529f4071e221f390b333ad702d247c8d
----
> [Python] Efficient construction of arrays from non-pandas 1D NumPy arrays
> -------------------------------------------------------------------------
>
> Key: ARROW-838
> URL: https://issues.apache.org/jira/browse/ARROW-838
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is follow-on work to ARROW-825.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)