Ulimo edited a comment on pull request #8694:
URL: https://github.com/apache/arrow/pull/8694#issuecomment-732091242
@eerhardt to be honest after I went through and really thought regarding
the calls, I had one final thing I was not too happy with.
The record batch stream in for get and put, I did not change it since
compared to moving files around etc this felt a tiny bit bigger.
The API in itself feels quite easy to use, but I think I overspecified it
for RecordBatch, and there will be trouble if we add dictionary support etc.
It also does not work well with DoExchange, and do exchange requires another
implementation to work good.
The solution I can think of would reduce on ease of use thought (I think?),
which would be:
* New classes, FlightData (abstract), FlightDataSchema, FlightDataRecordBatch
* Stream returns FlightData
* internal code still checks flight "correctness" (schema first message etc).
FlightData structure:
```
pubic abstract class FlightData {
public FlightDescriptor FlightDescriptor { get; }
public FlightDataType Type { get; } //Contains what type of arrow object
it was
public ByteString ApplicationMetadata { get; } //app metadata can be
changed from list in streamreader to correct single in FlightData.
public abstract T GetValue<T>();
public abstract void Accept(FlightDataVisitor visitor); // Maybe a visitor
pattern?
}
public class FlightDataRecordBatch : FlightData {
public RecordBatch RecordBatch { get; }
public GetValue<T>() {
if(typeof(T) != typeof(RecordBatch)) { throw new Exception(...); }
return RecordBatch;
}
public Accept(...) {...};
}
enum FlightDataType {
Schema = 1,
RecordBatch = 2,
...
}
```
This stream solution could also have a FlightDataVisitor for the different
types also, and I think that it allows
easier addition in the future of new arrow objects. But it of course might
be a bit harder to use.
It also follows the gRPC schema much more closely, which makes DoExchange
simple to implement.
If not using a visitor, the loop would not be as "pretty" but still
functional with easier additions of new types (atleast in my opinion):
```
while(stream.MoveNext()) {
if(stream.Current.Type == FlightDataType.RecordBatch) {
stream.Current.GetValue<RecordBatch>();
// do stuff with record batch
}
}
```
Would love to get your input, so the flight framework starts with a good
foundation.
**EDIT: Doing put operations would be a bit wierder with this though, it
would require the user to have knowledge to send the schema as the first
message. I think the putStream should still be similar to the existing solution
even with this, but with extra write commands such as Write(ArrowDictionary)
etc.**
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]