heiseish opened a new issue, #44160:
URL: https://github.com/apache/arrow/issues/44160

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ## Context
   
   ### Description
   - When a table is built by concatenating dictionary arrays whose "schema"/dictionaries do not match, the table transmitted over Flight appears to be malformed.
   
   ### Reproducible code
   ```python
   import pyarrow.flight as fl
   import pyarrow as pa
   import enum
   
   class MyEnum(enum.Enum):
       Foo = 0
       Bar = 1
       Baz = 2
   
   schema = pa.schema({
       'col': pa.dictionary(pa.int8(), pa.string())
   })
   
   def build_data() -> pa.Table:
       non_empty = pa.table({
           'col': pa.DictionaryArray.from_arrays(
               pa.array([0, 2], pa.int8()), [x.name for x in MyEnum])
       }, schema=schema)
       empty = pa.table({
           'col': pa.DictionaryArray.from_arrays(pa.array([], pa.int8()), [])
       }, schema=schema)
       # If unify_dictionaries() is called here, it works
       return pa.concat_tables([empty, non_empty]) # .unify_dictionaries()
   
   class Server(fl.FlightServerBase):
       def do_get(self, context, ticket):
           table = build_data()
           _ = table['col'].to_pylist()
           print('build table ', table)
           # Passing unify_dictionaries=True here does not help:
           # the client still receives a malformed table
           return fl.RecordBatchStream(
               table, options=pa.ipc.IpcWriteOptions(unify_dictionaries=True))
   
   if __name__ == '__main__':
       server = Server()
       client = fl.FlightClient(f'grpc://localhost:{server.port}')
       client.wait_for_available()
       table = client.do_get(fl.Ticket(bytes())).read_all()
       try:
           _ = table['col'].to_pylist()
           print('got table ', table)
       except Exception as e:
           print(e)
       server.shutdown()
   ```
   
   ### Expectation
   - `to_pylist` succeeds
   
   ### Actual
   - `to_pylist` fails with `index with value of 0 is out-of-bounds for array of length 0`
   
   ### Observation
   Table before IPC
   ```
   ----
   col: [  -- dictionary:
   []  -- indices:
   [],  -- dictionary:
   ["Foo","Bar","Baz"]  -- indices:
   [0,2]]
   ```
   
   Table after IPC
   ```
   col: [  -- dictionary:
   []  -- indices:
   [],  -- dictionary:
   []  -- indices:
   [0,2]]
   ```
   
   
   I'm happy to open a PR if someone can point me to the relevant code. Thanks!
   
   
   ### Component(s)
   
   FlightRPC, Python

