[ 
https://issues.apache.org/jira/browse/ARROW-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361526#comment-16361526
 ] 

ASF GitHub Bot commented on ARROW-1998:
---------------------------------------

wesm commented on a change in pull request #1594: ARROW-1998: [Python] fix 
crash on empty Numpy arrays
URL: https://github.com/apache/arrow/pull/1594#discussion_r167702409
 
 

 ##########
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##########
 @@ -850,16 +850,23 @@ Status NumPyConverter::ConvertObjectStrings() {
   RETURN_NOT_OK(builder.Resize(length_));
 
   bool global_have_bytes = false;
-  int64_t offset = 0;
-  while (offset < length_) {
-    bool chunk_have_bytes = false;
-    RETURN_NOT_OK(
-        AppendObjectStrings(arr_, mask_, offset, &builder, &offset, &chunk_have_bytes));
-
-    global_have_bytes = global_have_bytes | chunk_have_bytes;
+  if (length_ == 0) {
+    // Produce an empty chunk
     std::shared_ptr<Array> chunk;
     RETURN_NOT_OK(builder.Finish(&chunk));
     out_arrays_.emplace_back(std::move(chunk));
+  } else {
+    int64_t offset = 0;
+    while (offset < length_) {
+      bool chunk_have_bytes = false;
+      RETURN_NOT_OK(
+          AppendObjectStrings(arr_, mask_, offset, &builder, &offset, &chunk_have_bytes));
+
+      global_have_bytes = global_have_bytes | chunk_have_bytes;
+      std::shared_ptr<Array> chunk;
+      RETURN_NOT_OK(builder.Finish(&chunk));
+      out_arrays_.emplace_back(std::move(chunk));
 
 Review comment:
   Could this be a `do ... while`? 
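
   For illustration, here is a minimal, self-contained sketch of what a
   `do ... while` version of such a chunking loop could look like. It is not
   the Arrow code: `Chunk`, `AppendSome`, `ConvertChunks` and `max_chunk` are
   hypothetical stand-ins for the builder, `AppendObjectStrings` and the
   converter. It assumes the append step does nothing when there is no input
   left and that `max_chunk` is positive. Because the body runs once before
   the condition is checked, an empty input still produces exactly one empty
   chunk without a separate `length_ == 0` branch.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Arrow array chunk.
using Chunk = std::vector<std::string>;

// Hypothetical stand-in for AppendObjectStrings: copies at most `max_chunk`
// values starting at `offset` and reports where it stopped.
// Assumes max_chunk > 0; does nothing if no values remain.
void AppendSome(const std::vector<std::string>& values, int64_t offset,
                int64_t max_chunk, Chunk* chunk, int64_t* end_offset) {
  const int64_t length = static_cast<int64_t>(values.size());
  const int64_t end = std::min(length, offset + max_chunk);
  for (int64_t i = offset; i < end; ++i) {
    chunk->push_back(values[static_cast<size_t>(i)]);
  }
  *end_offset = end;
}

// The loop body always runs at least once, so an empty input yields a
// single empty chunk instead of no chunks at all.
std::vector<Chunk> ConvertChunks(const std::vector<std::string>& values,
                                 int64_t max_chunk) {
  const int64_t length = static_cast<int64_t>(values.size());
  std::vector<Chunk> out;
  int64_t offset = 0;
  do {
    Chunk chunk;
    AppendSome(values, offset, max_chunk, &chunk, &offset);
    out.push_back(std::move(chunk));
  } while (offset < length);
  return out;
}
```

   Whether this shape applies to the patch depends on `AppendObjectStrings`
   being safe to call when `length_` is 0, which the diff alone does not show.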

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Table.from_pandas crashes when data frame is empty
> -----------------------------------------------------------
>
>                 Key: ARROW-1998
>                 URL: https://issues.apache.org/jira/browse/ARROW-1998
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Windows 10 Build 15063.850
> Python: 3.6.3
> Numpy: 1.14.0
> Pandas: 0.22.0
>            Reporter: Victor Jimenez
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Loading an empty CSV file and then creating a PyArrow Table from the 
> resulting DataFrame crashes the application. The following code should 
> reproduce the issue:
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> FIELDS = ['id', 'name']
> NUMPY_TYPES = {
>     'id': np.int64,
>     'name': str  # originally np.unicode, an alias for str on Python 3
> }
> PYARROW_SCHEMA = pa.schema([
>     pa.field('id', pa.int64()),
>     pa.field('name', pa.string())
> ])
> file = open('input.csv', 'w')
> file.close()
> df = pd.read_csv(
>     'input.csv',
>     header=None,
>     names=FIELDS,
>     dtype=NUMPY_TYPES,
>     engine='c',
> )
> pa.Table.from_pandas(df, schema=PYARROW_SCHEMA)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
