romainfrancois commented on pull request #8365:
URL: https://github.com/apache/arrow/pull/8365#issuecomment-717996007


   based on @bkietz comment from https://github.com/apache/arrow/pull/8536 
Converter_String::Ingest_some_nulls() could end with: 
   
   ```cpp
   cpp11::unwind_protect([&] {
         if (array->null_count()) {
           // need to watch for nulls
           arrow::internal::BitmapReader null_reader(array->null_bitmap_data(),
                                                     array->offset(), n);
           for (int i = 0; i < n; i++, null_reader.Next()) {
             if (null_reader.IsSet()) {
               SET_STRING_ELT(data, start + i, 
unsafe_r_string(string_array->GetString(i)));
             } else {
               SET_STRING_ELT(data, start + i, NA_STRING);
             }
           }
   
         } else {
           for (int i = 0; i < n; i++) {
             SET_STRING_ELT(data, start + i, 
unsafe_r_string(string_array->GetString(i)));
           }
         }
       });
   ```
   
   this builds on knowing that `GetString()` gives an utf8 string, and that we 
know its size, as opposed to what `cpp11::r_string(std::string)` does here: 
https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/r_string.hpp#L18 
and uses unwind protection around the loop instead of the individual 
conversions. 
   
   i.e. it assumes utf-8 but since it does not use the known size, it searches 
for `nul`: r_string(const std::string& data) : 
data_(safe[Rf_mkCharCE](data.c_str(), CE_UTF8)) {}
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to