Ulimo edited a comment on pull request #8694:
URL: https://github.com/apache/arrow/pull/8694#issuecomment-732091242


   @eerhardt  to be honest after I went through and really thought regarding 
the calls, I had one final thing I was not too happy with.
   The record batch stream in for get and put, I did not change it since 
compared to moving files around etc this felt a tiny bit bigger.
   
   The API in itself feels quite easy to use, but I think I overspecified it 
for RecordBatch, and there will be trouble if we add dictionary support etc.
   It also does not work well with DoExchange, and do exchange requires another 
implementation to work good.
   
   The solution I can think of would reduce on ease of use thought (I think?), 
which would be:
   
   * New classes, FlightData (abstract), FlightDataSchema, FlightDataRecordBatch
   * Stream returns FlightData
   * internal code still checks flight "correctness" (schema first message etc).
   
   FlightData structure:
   
   ```
   pubic abstract class FlightData {
     public FlightDescriptor FlightDescriptor { get; }
     public FlightDataType Type { get; } //Contains what type of arrow object 
it was
     public ByteString ApplicationMetadata { get; } //app metadata can be 
changed from list in streamreader to correct single in FlightData.
     public abstract T GetValue<T>();
     public abstract void Accept(FlightDataVisitor visitor); // Maybe a visitor 
pattern?
   }
   
   public class FlightDataRecordBatch : FlightData {
     public RecordBatch RecordBatch { get; }
     public GetValue<T>() {
       if(typeof(T) != typeof(RecordBatch)) { throw new Exception(...); }
       return RecordBatch;
     }
   
     public Accept(...) {...};
   }
   
   enum FlightDataType {
     Schema = 1,
     RecordBatch = 2,
     ...
   }
   ```
   
   This stream solution could also have a FlightDataVisitor for the different 
types also, and I think that it allows
   easier addition in the future of new arrow objects. But it of course might 
be a bit harder to use.
   It also follows the gRPC schema much more closely, which makes DoExchange 
simple to implement.
   
   If not using a visitor, the loop would not be as "pretty" but still 
functional with easier additions of new types (atleast in my opinion):
   
   ```
   while(stream.MoveNext()) {
     if(stream.Current.Type == FlightDataType.RecordBatch) {
       stream.Current.GetValue<RecordBatch>();
       // do stuff with record batch
     }
   }
   ```
   
   Would love to get your input, so the flight framework starts with a good 
foundation.
   
   **EDIT: Doing put operations would be a bit wierder with this though, it 
would require the user to have knowledge to send the schema as the first 
message. I think the putStream should still be similar to the existing solution 
even with this, but with extra write commands such as Write(ArrowDictionary) 
etc.**


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to