This is an automated email from the ASF dual-hosted git repository.
paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 0adf154361 GH-34897: [R] Ensure that the RStringViewer helper class
does not own any Array references (#35812)
0adf154361 is described below
commit 0adf154361346cdef2bd4731e3880ecddba4a2c2
Author: Dewey Dunnington <[email protected]>
AuthorDate: Tue May 30 15:13:04 2023 -0400
GH-34897: [R] Ensure that the RStringViewer helper class does not own any
Array references (#35812)
This was identified and 99% debugged by @ lgautier on
https://github.com/rpy2/rpy2-arrow/issues/11 . Thank you!
I have no idea why this does anything; however, the `RStringViewer` class
*was* holding on to an unnecessary Array reference and this seemed to fix the
crash for me. Maybe a circular reference? The reprex I was using (provided by @
lgautier) was:
Install fresh deps:
```bash
pip3 install pandas pyarrow rpy2-arrow
R -e 'install.packages("arrow", repos = "https://cloud.r-project.org/")'
```
Run this python script:
```python
import pandas as pd
import pyarrow
from rpy2.robjects.packages import importr
import rpy2.robjects
import rpy2_arrow.arrow as pyra
base = importr('base')
nanoarrow = importr('nanoarrow')
code = """
function(df) {
# df$col1 # no segfault on exit
# I(df$col1) # no segfault on exit
# df$col2 # no segfault on exit
I(df$col2) # segfault on exit
}
"""
rfunction = rpy2.robjects.r(code)
pd_df = pd.DataFrame({
"col1": range(10),
"col2":["a" for num in range(10)]
})
pd_tbl = pyarrow.Table.from_pandas(pd_df)
r_tbl = pyra.pyarrow_table_to_r_table(pd_tbl)
r_df = base.as_data_frame(nanoarrow.as_nanoarrow_array_stream(r_tbl))
output = rfunction(r_df)
print(output)
```
Before this PR (installing R/arrow from main) I get:
```
(.venv) dewey@ Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
[1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
zsh: segmentation fault python reprex-arrow.py
```
After this PR I get:
```
(.venv) dewey@ Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
[1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
```
(with no segfault)
I wonder if this also will help with #35391 since it's also a segfault
involving the Python <-> R bridge.
* Closes: #34897
Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
---
r/src/altrep.cpp | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/r/src/altrep.cpp b/r/src/altrep.cpp
index a6b52fe9b6..ae435d54d6 100644
--- a/r/src/altrep.cpp
+++ b/r/src/altrep.cpp
@@ -743,6 +743,8 @@ struct AltrepVectorString : public
AltrepVectorBase<AltrepVectorString<Type>> {
// Helper class to convert to R strings. We declare one of these for the
// class to avoid having to stack-allocate one for every STRING_ELT call.
+ // This class does not own a reference to any arrays: it is the caller's
+ // responsibility to ensure the Array lifetime exeeds that of the viewer.
struct RStringViewer {
RStringViewer() : strip_out_nuls_(false), nul_was_stripped_(false) {}
@@ -821,11 +823,11 @@ struct AltrepVectorString : public
AltrepVectorBase<AltrepVectorString<Type>> {
}
void SetArray(const std::shared_ptr<Array>& array) {
- array_ = array;
+ array_ = array.get();
string_array_ = internal::checked_cast<const
StringArrayType*>(array.get());
}
- std::shared_ptr<Array> array_;
+ const Array* array_;
const StringArrayType* string_array_;
std::string stripped_string_;
bool strip_out_nuls_;