This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 0adf154361 GH-34897: [R] Ensure that the RStringViewer helper class 
does not own any Array references (#35812)
0adf154361 is described below

commit 0adf154361346cdef2bd4731e3880ecddba4a2c2
Author: Dewey Dunnington <[email protected]>
AuthorDate: Tue May 30 15:13:04 2023 -0400

    GH-34897: [R] Ensure that the RStringViewer helper class does not own any 
Array references (#35812)
    
    This was identified and 99% debugged by @ lgautier on 
https://github.com/rpy2/rpy2-arrow/issues/11 . Thank you!
    
    I have no idea why this does anything; however, the `RStringViewer` class 
*was* holding on to an unnecessary Array reference and this seemed to fix the 
crash for me. Maybe a circular reference? The reprex I was using (provided by @ 
lgautier) was:
    
    Install fresh deps:
    
    ```bash
    pip3 install pandas pyarrow rpy2-arrow
    R -e 'install.packages("arrow", repos = "https://cloud.r-project.org/";)'
    ```
    
    Run this python script:
    
    ```python
    import pandas as pd
    import pyarrow
    from rpy2.robjects.packages import importr
    import rpy2.robjects
    import rpy2_arrow.arrow as pyra
    base = importr('base')
    nanoarrow = importr('nanoarrow')
    
    code = """
        function(df) {
            # df$col1  # no segfault on exit
            # I(df$col1)  # no segfault on exit
            # df$col2  # no segfault on exit
            I(df$col2)  # segfault on exit
        }
    """
    rfunction = rpy2.robjects.r(code)
    
    pd_df = pd.DataFrame({
        "col1": range(10),
        "col2":["a" for num in range(10)]
    })
    pd_tbl = pyarrow.Table.from_pandas(pd_df)
    r_tbl = pyra.pyarrow_table_to_r_table(pd_tbl)
    r_df = base.as_data_frame(nanoarrow.as_nanoarrow_array_stream(r_tbl))
    
    output = rfunction(r_df)
    print(output)
    ```
    
    Before this PR (installing R/arrow from main) I get:
    
    ```
    (.venv) dewey@ Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
     [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
    
    zsh: segmentation fault  python reprex-arrow.py
    ```
    
    After this PR I get:
    
    ```
    (.venv) dewey@ Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
     [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
    ```
    
    (with no segfault)
    
    I wonder if this also will help with #35391 since it's also a segfault 
involving the Python <-> R bridge.
    * Closes: #34897
    
    Authored-by: Dewey Dunnington <[email protected]>
    Signed-off-by: Dewey Dunnington <[email protected]>
---
 r/src/altrep.cpp | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/r/src/altrep.cpp b/r/src/altrep.cpp
index a6b52fe9b6..ae435d54d6 100644
--- a/r/src/altrep.cpp
+++ b/r/src/altrep.cpp
@@ -743,6 +743,8 @@ struct AltrepVectorString : public 
AltrepVectorBase<AltrepVectorString<Type>> {
 
   // Helper class to convert to R strings. We declare one of these for the
   // class to avoid having to stack-allocate one for every STRING_ELT call.
+  // This class does not own a reference to any arrays: it is the caller's
+  // responsibility to ensure the Array lifetime exeeds that of the viewer.
   struct RStringViewer {
     RStringViewer() : strip_out_nuls_(false), nul_was_stripped_(false) {}
 
@@ -821,11 +823,11 @@ struct AltrepVectorString : public 
AltrepVectorBase<AltrepVectorString<Type>> {
     }
 
     void SetArray(const std::shared_ptr<Array>& array) {
-      array_ = array;
+      array_ = array.get();
       string_array_ = internal::checked_cast<const 
StringArrayType*>(array.get());
     }
 
-    std::shared_ptr<Array> array_;
+    const Array* array_;
     const StringArrayType* string_array_;
     std::string stripped_string_;
     bool strip_out_nuls_;

Reply via email to