[jira] [Commented] (ARROW-1973) [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.

ASF GitHub Bot (JIRA) Thu, 08 Feb 2018 10:00:18 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357324#comment-16357324 ]


ASF GitHub Bot commented on ARROW-1973:
---------------------------------------

pitrou commented on a change in pull request #1578: ARROW-1973: [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.
URL: https://github.com/apache/arrow/pull/1578#discussion_r167018125
 
 

 ##########
 File path: cpp/src/arrow/python/arrow_to_pandas.cc
 ##########
 @@ -502,18 +502,20 @@ template <typename ArrowType>
 inline Status ConvertListsLike(PandasOptions options, const std::shared_ptr<Column>& col,
                                PyObject** out_values) {
   const ChunkedArray& data = *col->data().get();
-  auto list_type = std::static_pointer_cast<ListType>(col->type());
+  const auto& list_type = static_cast<const ListType&>(*col->type());
 
   // Get column of underlying value arrays
   std::vector<std::shared_ptr<Array>> value_arrays;
   for (int c = 0; c < data.num_chunks(); c++) {
-    auto arr = std::static_pointer_cast<ListArray>(data.chunk(c));
-    value_arrays.emplace_back(arr->values());
+    const auto& arr = static_cast<const ListArray&>(*data.chunk(c));
+    value_arrays.emplace_back(arr.values());
   }
-  auto flat_column = std::make_shared<Column>(list_type->value_field(), value_arrays);
+  auto flat_column = std::make_shared<Column>(list_type.value_field(), value_arrays);
   // TODO(ARROW-489): Currently we don't have a Python reference for single columns.
   //    Storing a reference to the whole Array would be to expensive.
-  PyObject* numpy_array;
+  OwnedRef owned_numpy_array;
 
 Review comment:
   This one doesn't seem to be used. By passing `&numpy_array` below, you're not changing the internal pointer of `owned_numpy_array`. Perhaps use `owned_numpy_array.ref()` instead?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1973
>                 URL: https://issues.apache.org/jira/browse/ARROW-1973
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.8.0
>         Environment: Linux Mint 18.2
> Anaconda Python distribution + pyarrow installed from the conda-forge channel
>            Reporter: Alexey Strokach
>            Assignee: Phillip Cloud
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> There appears to be a memory leak when using PyArrow to convert tables containing array columns to Pandas DataFrames.
> See the `test_memory_leak.py` example here: https://gitlab.com/ostrokach/pyarrow_duplicate_column_errors



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
