[ 
https://issues.apache.org/jira/browse/ARROW-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426726#comment-16426726
 ] 

ASF GitHub Bot commented on ARROW-2380:
---------------------------------------

pitrou commented on issue #1835: ARROW-2380: [Python] Streamline conversions
URL: https://github.com/apache/arrow/pull/1835#issuecomment-378891901
 
 
   Without this PR:
   ```
   [ 28.57%] ··· Running convert_builtins.ConvertPyListToArray.time_convert     
                                                                           ok
   [ 28.57%] ···· 
                  ==================== =============
                          type                      
                  -------------------- -------------
                         int32          5.67±0.06ms 
                         uint32         5.77±0.06ms 
                         int64           5.72±0.1ms 
                         uint64          5.09±0.1ms 
                        float32         5.16±0.01ms 
                        float64         5.32±0.05ms 
                          bool          4.51±0.02ms 
                        decimal          187±0.2ms  
                         binary          8.40±0.1ms 
                        binary10         8.35±0.1ms 
                         ascii           13.2±0.2ms 
                        unicode          28.8±0.7ms 
                       int64 list        51.0±0.6ms 
                         struct           31.6±1ms  
                   struct from tuples     31.0±2ms  
                  ==================== =============
   
   [ 42.86%] ··· Running convert_builtins.InferPyListToArray.time_infer         
                                                                           ok
   [ 42.86%] ···· 
                  ============ =============
                      type                  
                  ------------ -------------
                     int64       11.3±0.1ms 
                    float64     10.4±0.02ms 
                      bool      9.85±0.04ms 
                    decimal       383±1ms   
                     binary      14.8±0.1ms 
                     ascii       20.3±0.3ms 
                    unicode      38.2±0.4ms 
                   int64 list    102±0.3ms  
                  ============ =============
   ```
   
   With this PR:
   ```
   [ 28.57%] ··· Running convert_builtins.ConvertPyListToArray.time_convert     
                                                                            ok
   [ 28.57%] ···· 
                  ==================== =============
                          type                      
                  -------------------- -------------
                         int32          5.20±0.05ms 
                         uint32         5.03±0.06ms 
                         int64           5.69±0.2ms 
                         uint64          5.83±0.1ms 
                        float32         4.84±0.02ms 
                        float64         4.99±0.03ms 
                          bool          4.27±0.02ms 
                        decimal          179±0.7ms  
                         binary          8.61±0.1ms 
                        binary10         12.1±0.1ms 
                         ascii          10.5±0.04ms 
                        unicode          17.2±0.7ms 
                       int64 list        49.8±0.7ms 
                         struct           32.8±1ms  
                   struct from tuples     32.7±2ms  
                  ==================== =============
   
   [ 42.86%] ··· Running convert_builtins.InferPyListToArray.time_infer         
                                                                            ok
   [ 42.86%] ···· 
                  ============ =============
                      type                  
                  ------------ -------------
                     int64       11.1±0.1ms 
                    float64     10.2±0.02ms 
                      bool      9.27±0.03ms 
                    decimal      371±0.5ms  
                     binary      15.0±0.1ms 
                     ascii       17.2±0.1ms 
                    unicode      28.7±0.6ms 
                   int64 list    97.1±0.4ms 
                  ============ =============
   ```
   
   It's mostly a wash, except that converting unicode strings to Arrow became 
faster (on Python 3).



> [Python] Correct issues in numpy_to_arrow conversion routines
> -------------------------------------------------------------
>
>                 Key: ARROW-2380
>                 URL: https://issues.apache.org/jira/browse/ARROW-2380
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Bryan Cutler
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> Following the discussion at [https://github.com/apache/arrow/pull/1689], 
> there are a few issues with conversion of various types to arrow that are 
> incorrect or could be improved:
>  * PyBytes_GET_SIZE is being cast to the wrong type, for example 
> {{const int32_t length = static_cast<int32_t>(PyBytes_GET_SIZE(obj));}}
>  * Handle the possibility that length is larger than kBinaryMemoryLimit 
> in the statement
> {{builder->value_data_length() + length > kBinaryMemoryLimit}}
>  * Look into using common code for binary object conversion to avoid 
> duplication, and allow support for bytes and bytearray objects in other 
> places than numpy_to_arrow.  (possibly put in src/arrow/python/helpers.h)
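
The first two points in the description above can be illustrated with a minimal sketch. It is not the code from the PR: the helper name FitsInBinaryBuilder is hypothetical, and the value given to kBinaryMemoryLimit here is an assumption made for illustration. The idea is to keep the length in a 64-bit type (the width of Py_ssize_t on 64-bit platforms) instead of narrowing it to int32_t first, and to rearrange the comparison so the addition cannot overflow.

```cpp
#include <cstdint>
#include <limits>

// Assumed value, for illustration only; the real constant is defined in the
// Arrow sources and may differ.
constexpr int64_t kBinaryMemoryLimit = std::numeric_limits<int32_t>::max() - 1;

// Hypothetical helper: returns true if `length` more bytes can be appended to
// a builder that already holds `current_length` bytes of value data.
// `length` stays int64_t, so an oversized Python bytes object is rejected
// instead of being silently truncated by a cast to int32_t.
constexpr bool FitsInBinaryBuilder(int64_t current_length, int64_t length) {
  // Reject a negative or oversized length before doing any arithmetic.
  if (length < 0 || length > kBinaryMemoryLimit) {
    return false;
  }
  // Same condition as `current_length + length <= kBinaryMemoryLimit`,
  // rearranged so that no addition can overflow.
  return current_length <= kBinaryMemoryLimit - length;
}

// Compile-time sanity checks of the sketch.
static_assert(FitsInBinaryBuilder(0, 10), "small append fits");
static_assert(!FitsInBinaryBuilder(0, int64_t{1} << 40), "huge length rejected");
```

A caller would read the size into an int64_t, call the check, and only then narrow the value to the builder's offset type.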



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
