jorisvandenbossche opened a new issue, #39635:
URL: https://github.com/apache/arrow/issues/39635

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   This is not a C++ reproducer; it uses very preliminary (not yet merged) Python bindings to illustrate the issue:
   
   ```python
   import pyarrow as pa

   # StringViewBuilder comes from the preliminary (unmerged) bindings
   builder = pa.lib.StringViewBuilder()
   builder.append("test")
   builder.append("very long string that is not inlined")
   builder.append(None)
   builder.append("test")
   
   >>> arr = builder.finish()
   >>> arr
   <pyarrow.lib.Array object at 0x7f9a2e1fc4c0>
   [
     "test",
     "very long string that is not inlined",
     null,
     "test"
   ]
   >>> arr.type
   DataType(string_view)
   ```
   
   Computing the `unique` values of this array returns the missing value as an empty string instead of a null:
   
   ```
   >>> arr.unique()
   <pyarrow.lib.Array object at 0x7f9a2e45fe20>
   [
     "test",
     "very long string that is not inlined",
     ""
   ]
   ```
   
   I didn't check the code, but I _assume_ the result is "just" missing its validity bitmap (the empty string being the value that would otherwise be masked).
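To make that assumption concrete, here is a minimal pure-Python sketch of what a hash-based `unique` has to do with nulls. It does not use Arrow or its actual kernel code: the "list of values plus a parallel validity list" model and the helper names `unique_with_validity` and `to_python` are hypothetical. It shows that when the output keeps its validity information, the null comes back as a null, and when that information is dropped, the placeholder stored in the null slot (an empty string here) leaks through, which matches the output above.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified model of a string array: a list of values plus a
# parallel validity list (True = valid, False = null). Null slots still hold
# a placeholder value, here the empty string.
Values = List[str]
Validity = List[bool]


def unique_with_validity(values: Values, validity: Validity) -> Tuple[Values, Validity]:
    """Return distinct values in first-seen order, emitting null at most once."""
    out_values: Values = []
    out_validity: Validity = []
    seen = set()
    seen_null = False
    for value, is_valid in zip(values, validity):
        if not is_valid:
            if not seen_null:
                seen_null = True
                out_values.append("")  # placeholder, masked by the validity entry
                out_validity.append(False)
            continue
        if value not in seen:
            seen.add(value)
            out_values.append(value)
            out_validity.append(True)
    return out_values, out_validity


def to_python(values: Values, validity: Validity) -> List[Optional[str]]:
    """Materialize values, turning masked (invalid) slots into None."""
    return [v if ok else None for v, ok in zip(values, validity)]


values = ["test", "very long string that is not inlined", "", "test"]
validity = [True, True, False, True]

out_values, out_validity = unique_with_validity(values, validity)
# Keeping the validity information yields a proper null:
print(to_python(out_values, out_validity))
# Dropping it (treating every slot as valid) exposes the "" placeholder:
print(to_python(out_values, [True] * len(out_values)))
```

In this model the first `print` gives `['test', 'very long string that is not inlined', None]` and the second gives `['test', 'very long string that is not inlined', '']`, i.e. the second mirrors the reported `string_view` behaviour.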
   
   ### Component(s)
   
   C++

