jorisvandenbossche opened a new issue, #39637:
URL: https://github.com/apache/arrow/issues/39637

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Not a C++ reproducer, but with very preliminary (non-merged) Python bindings to illustrate it:
   
   ```python
   import pyarrow as pa
   import pyarrow.dataset as ds
   
   # create IPC file with StringView column
   builder = pa.lib.StringViewBuilder()
   builder.append("test")
   builder.append("some long string that is not inlined")
   builder.append(None)
   arr = builder.finish()
   
   table = pa.table({"a": [1, 2, 3], "b": arr})
   ds.write_dataset(table, "test_string_view", format="ipc")
   
   # read back as a dataset
   dataset = ds.dataset("test_string_view/", format="ipc")
   ```
   
   The dataset looks good, and can be read as a whole:
   
   ```
   >>> dataset.schema
   a: int64
   b: string_view
   
   >>> dataset.to_table()
   pyarrow.Table
   a: int64
   b: string_view
   ----
   a: [[1,2,3]]
   b: [["test","some long string that is not inlined",null]]
   ```
   
   But when reading it with a filter on the other (integer) column, it silently returns an empty table. I would have expected it to raise an error, because filter/take is not yet implemented for string_view:
   
   ```
   >>> dataset.to_table(filter=ds.field('a') > 1)
   pyarrow.Table
   a: int64
   b: string_view
   ----
   a: [[]]
   b: [[]]
   ```
   
   This doesn't seem to happen if the string array only contains inlined strings (values of 12 bytes or fewer); in that case I get the expected error.
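For context on the inlined vs. non-inlined distinction: in the Arrow string-view (Binary View) layout, every value occupies a 16-byte view. Values of 12 bytes or fewer are stored entirely inside the view, while longer values keep only their length and a 4-byte prefix there, plus the index of a separate variadic data buffer and an offset into it. The pure-Python sketch below encodes the two non-null values from the reproducer to show that only the long one depends on an extra data buffer. The helpers `encode_view` and `is_inlined` are hypothetical illustrations, not pyarrow API, and nulls (tracked by the validity bitmap) are left out.

```python
import struct

# Per the Arrow Binary View layout, each view is 16 bytes and values of
# up to 12 bytes are stored inline inside the view itself.
INLINE_LIMIT = 12


def encode_view(value: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Encode one little-endian 16-byte view (hypothetical helper).

    buffer_index and offset are only meaningful for non-inlined values,
    which live in a separate variadic data buffer.
    """
    length = len(value)
    if length <= INLINE_LIMIT:
        # int32 length, then the bytes themselves, zero-padded to 12 bytes
        return struct.pack("<i12s", length, value)
    # int32 length, 4-byte prefix, int32 buffer index, int32 offset
    return struct.pack("<i4sii", length, value[:4], buffer_index, offset)


def is_inlined(view: bytes) -> bool:
    """Return True if the view holds its data inline (hypothetical helper)."""
    (length,) = struct.unpack_from("<i", view)
    return length <= INLINE_LIMIT


# The two non-null values from the reproducer above.
values = [b"test", b"some long string that is not inlined"]
data_buffer = bytearray()  # stands in for a single variadic data buffer
views = []
for s in values:
    if len(s) <= INLINE_LIMIT:
        views.append(encode_view(s))
    else:
        # Long values are appended to the data buffer and referenced by offset
        views.append(encode_view(s, buffer_index=0, offset=len(data_buffer)))
        data_buffer += s

for s, v in zip(values, views):
    print(s.decode(), "-> inlined:", is_inlined(v))
```

An array made only of short values never touches a variadic data buffer, which is the structural difference between the case that errors as expected and the case that silently returns an empty table.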
   
   ### Component(s)
   
   C++

