> > > makes it more difficult to bring schema evolution back into the
> > > IPC Stream format (i.e. it would live only in flight)
> >
> > Gosh's proposal extends the flatbuffer structures not the protobufs. Can
> > you help me understand how difficult it would be to bring the `schema_id`
> > approach to the IPC stream format?
>
> I thought we were talking solely about the Flight Protobuf definitions -
> not the Flatbuffers (and the Google doc at least only talks about the
> Protobufs).
>
I somehow missed that the document adds schema_id to the Protobuf
definitions. It feels to me that schema_id is a property that would ideally
apply only to the RecordBatch. I now better understand Micah's dictionary
concerns, too.

> > Side Question: Why isn't the IPC stream format a series of the flight
> > protobufs? It's a real shame that there is no standard way to
> > capture/replay a stream with app_metadata. (Obviously ignoring the
> > annoyances around protobuf wrapping flatbuffers.)
>
> The IPC format was defined long before Flight, and Flight's app_metadata
> was added after Flight's initial definition. Note an IPC message does have
> a provision for key-value metadata, though I think APIs for that are not
> fully exposed. (See ARROW-6940:
> https://issues.apache.org/jira/browse/ARROW-6940 and despite my comments
> there perhaps we need to unify or at least consider how Flight's
> app_metadata relates to the IPC message custom_metadata. Also perhaps see
> ARROW-1059.)
>

Unfortunately, KeyValue is string to string, and FlatBuffers strings may
hold only UTF-8 or 7-bit ASCII. Flight's app_metadata, on the other hand, is
opaque bytes, which makes it a bit more useful.
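To make the string-vs-bytes mismatch concrete: opaque bytes like app_metadata can't be copied into a string-valued KeyValue as-is, because they need not be valid UTF-8. The minimal Python sketch below shows one possible workaround, assuming base64 encoding is acceptable. Nothing in the IPC format prescribes this, and the helper names and sample payload are hypothetical.

```python
import base64

# Hypothetical opaque app_metadata payload: arbitrary bytes, not valid UTF-8.
app_metadata = b"\x00\xff\xfe\x01"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data could be stored directly as a FlatBuffers string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_key_value_string(data: bytes) -> str:
    """Hypothetical helper: encode opaque bytes as 7-bit ASCII text for a KeyValue value."""
    return base64.b64encode(data).decode("ascii")


def from_key_value_string(value: str) -> bytes:
    """Hypothetical helper: recover the original bytes from the base64 text."""
    return base64.b64decode(value.encode("ascii"))


encoded = to_key_value_string(app_metadata)
print(is_valid_utf8(app_metadata))  # False: 0xff is never valid UTF-8
print(encoded)  # AP/+AQ==
print(from_key_value_string(encoded) == app_metadata)  # True
```

The cost of this approach is roughly a 4/3 size overhead plus an encode/decode step on both ends. A bytes-valued field like app_metadata avoids both.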