Re: [Python] receiving an arrow record batch without an attached schema

2024-04-15 Thread Aldrin
It seems like you could try using pyarrow.ipc.read_message and pyarrow.ipc.read_record_batch to read individual messages from the appropriate stream type. I've never played with either function, so I can't help with their usage and details (sorry!)

Re: [Python] receiving an arrow record batch without an attached schema

2024-04-15 Thread Aldrin
I could be wrong (there may have been many changes since I last experimented with the IPC API), but in my experience this issue happens when I have mixed up IPC stream types (Feather/file vs. in-memory). I believe pyarrow.ipc.new_stream and open_stream are essentially feather stream format, as

Re: [Python] receiving an arrow record batch without an attached schema

2024-04-15 Thread Kevin Liu
What is the code used to send bytes over the wire? My hunch is that there's an issue on the sending side that caused fewer bytes to arrive than expected. The example in the docs constructs a writer using a provided Schema. ``` sink = pa.BufferOutputStream() with pa.ipc.new_stream(sink,

Re: [Python] receiving an arrow record batch without an attached schema

2024-04-15 Thread Amanda Weirich
When I try this, I receive the following error: "Expected to be able to read 824 bytes for message body, got 384". I'm assuming this is because the expected schema is missing? On Mon, Apr 15, 2024 at 1:49 PM Kevin Liu wrote: > From the example in the Streaming, Serialization, and IPC >

Re: [Python] receiving an arrow record batch without an attached schema

2024-04-15 Thread Kevin Liu
From the example in the Streaming, Serialization, and IPC doc, it looks like you don't need to create/open a stream with a schema; the schema can be obtained from the RecordBatchStreamReader object. ``` with pa.ipc.open_stream(buf) as

[Python] receiving an arrow record batch without an attached schema

2024-04-15 Thread Amanda Weirich
Hello, I have an incoming Arrow record batch without a schema attached, coming in over a UDP port as buf.to_pybytes(). We don't want to attach the schema because the schema is already known. So in my receive script I create my schema, and I am trying to create an Arrow stream reader where I pass in