amol- commented on a change in pull request #10101:
URL: https://github.com/apache/arrow/pull/10101#discussion_r618353079



##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -2229,6 +2229,11 @@ Status ConvertChunkedArrayToPandas(const PandasOptions& 
options,
         checked_cast<const DictionaryType&>(*arr->type()).value_type();
     RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &arr));
     DCHECK_NE(arr->type()->id(), Type::DICTIONARY);
+
+    // The original Python DictionaryArray won't own the memory anymore
+    // as we actually built a new array when we decoded the DictionaryArray
+    // thus let the final resulting numpy array own the memory through a 
Capsule
+    py_ref = nullptr;

Review comment:
       @pitrou would you be able to confirm that this is the right fix?
   
   You can see the problem it deals with by checking out this branch, 
commenting out that line and running the `test_dictionary_to_numpy` test. 
   
   You will see that in such case we produce corrupted data as the subsequent 
`MakeNumPyView` used the original `DictionaryArray` as the "base" of the new 
`numpy` array. While the newly built `Array`, resulting by dictionary decoding 
(who really owns the memory for the numpy array), probably goes away 
immediately at the end of this method. With this change instead a capsule is 
made and the returned numpy array becomes the owner of that capsule.
   
   Does that sound like the right solution?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to