[jira] [Commented] (ARROW-838) [Python] Efficient construction of arrays from non-pandas 1D NumPy arrays

ASF GitHub Bot (JIRA) Thu, 28 Sep 2017 15:57:36 -0700

    [ 
https://issues.apache.org/jira/browse/ARROW-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185062#comment-16185062
 ]


ASF GitHub Bot commented on ARROW-838:
--------------------------------------

GitHub user wesm opened a pull request:

    https://github.com/apache/arrow/pull/1146

    ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not 
originating in pandas

    This unifies the ingest path for 1D data into `pyarrow.array`. I added the 
argument `from_pandas` to turn null sentinel checking on or off:
    
    ```
    In [8]: arr = np.random.randn(10000000)
    
    In [9]: arr[::3] = np.nan
    
    In [10]: arr2 = pa.array(arr)
    
    In [11]: arr2.null_count
    Out[11]: 0
    
    In [12]: %timeit arr2 = pa.array(arr)
    The slowest run took 5.43 times longer than the fastest. This could mean 
that an intermediate result is being cached.
    10000 loops, best of 3: 68.4 µs per loop
    
    In [13]: arr2 = pa.array(arr, from_pandas=True)
    
    In [14]: arr2.null_count
    Out[14]: 3333334
    
    In [15]: %timeit arr2 = pa.array(arr, from_pandas=True)
    1 loop, best of 3: 228 ms per loop
    ```
    
    When the data is contiguous, the conversion is always zero-copy, but when 
`from_pandas=True` and no null mask is passed, a null bitmap is 
constructed and populated.
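
The timing gap above comes from the extra pass that `from_pandas=True` needs: every value has to be inspected for the NaN sentinel so that a validity bitmap can be filled in. The following is a minimal pure-Python sketch of that idea, not the code in this pull request. The helper name `build_validity_bitmap` is hypothetical, and the only assumptions taken from the Arrow format are that the bitmap holds one bit per value, least-significant bit first, with a set bit meaning "valid".

```python
import math


def build_validity_bitmap(values):
    """Hypothetical helper: return (bitmap, null_count), treating NaN as null.

    Bit i of the bitmap (least-significant bit first within each byte)
    is 1 when values[i] is valid and 0 when it is null.
    """
    # One bit per value, rounded up to whole bytes, all bits initially 0 (null).
    bitmap = bytearray((len(values) + 7) // 8)
    null_count = 0
    for i, value in enumerate(values):
        if math.isnan(value):
            null_count += 1
        else:
            # Mark position i as valid.
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap), null_count


# Same pattern as the session above, at a small scale: every third value is NaN.
values = [float("nan") if i % 3 == 0 else float(i) for i in range(10)]
bitmap, null_count = build_validity_bitmap(values)
print(null_count, [format(b, "08b") for b in bitmap])
```

With `from_pandas=False` this scan is skipped entirely, which is consistent with the reported `null_count` of 0 and the much shorter runtime in the session above.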

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wesm/arrow expand-py-array-method

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/arrow/pull/1146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1146
    
----
commit 7b530e4b20437a82daf3c66a7d168e0be7c3bc48
Author: Wes McKinney <[email protected]>
Date:   2017-09-28T03:34:20Z

    Consolidate both sequence and ndarray/Series/Index conversion in 
pyarrow.Array
    
    Change-Id: I97e785c7fd34540f2c6ba05cfaaef5b1fbf830f4

commit cf40b7678d39ddbccbc9b0e477b74ac4766eb577
Author: Wes McKinney <[email protected]>
Date:   2017-09-28T03:55:40Z

    Add type aliases, some unit tests
    
    Change-Id: I17dee43549b04a06a190baf5d0996fab4d60301f

commit 587c575aadcada9baba06431bc8db3413d29ae92
Author: Wes McKinney <[email protected]>
Date:   2017-09-28T04:20:33Z

    Add direct types sequence converters for more data types
    
    Change-Id: I484937c8eb23b96402ec6b1ec3d4342fa8dedbd4

commit f2802fc724225ed9986f10a39eff4bb4cae6ff7a
Author: Wes McKinney <[email protected]>
Date:   2017-09-28T22:12:31Z

    Cleaner codepath for numpy->arrow conversions
    
    Change-Id: I2ec4737119bf25c5f5a5ee0e760855d01daaa79b

commit 797f0151e83c14d894c436207dd1cee6a2793c6b
Author: Wes McKinney <[email protected]>
Date:   2017-09-28T22:50:48Z

    Allow null checking to be skipped with from_pandas=False in pyarrow.array
    
    Change-Id: Ie8e87c3c529f4071e221f390b333ad702d247c8d

----


> [Python] Efficient construction of arrays from non-pandas 1D NumPy arrays
> -------------------------------------------------------------------------
>
>                 Key: ARROW-838
>                 URL: https://issues.apache.org/jira/browse/ARROW-838
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> This is follow-on work to ARROW-825



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
