Ulimo edited a comment on pull request #8694:
URL: https://github.com/apache/arrow/pull/8694#issuecomment-731658726


   @eerhardt first of all thank you so much for your support and feedback and 
sorry for this wall of text. Based on your last feedback that the API is hard 
to change etc in the future, I really took a long sitting to go through all the 
classes etc, and found some issues that I really needed to address.
   
   This resulted in a small overhaul on the structure etc. I also went through 
so the gRPC schema information is actually exposed to the user as well. This 
also resulted in some rewrites. The feedback you have given has been exactly 
what I needed to come to the conclusions etc on structure, so I am once again 
extremely thankful for all the feedback you have given.
   
   This ofcourse increased the scope of the PR a bit again, but I hope it helps 
give a more real look and feel on how the framework will behave, and if the 
implemented design actually works.
   
   Here comes information on the work that has been done, and also some 
checklists to see that all the required information is exposed etc, and why 
some classes are public.
   
   # Major changes
   
   * All exposed classes start with prefix "Flight".
   * Generated code is now internal.
   * New project Apache.Arrow.Flight.AspNetCore, required since generated code 
is internal, this project references grpc.aspnetcore.server which is 
netcoreapp3.1 or net5.0 only, and contains extensions only to add flight 
support for aspnetcore projects.
   * All flight client methods now follows gRPC API more closely, allows 
reading response headers for all calls.
   * Exposing application metadata required rewriting a bit of the logic in 
client and servers, read more in its section.
   * Moved client specific classes under .Client namespace.
   * Moved server specific classes under .Server namespace.
   * Internal classes are now under the .Internal namespace.
   * Only mutual used classes are now in the .Flight namespace.
   
   # Minor changes
   
   * Removed multi property exposure of bytestring, to only bytestring.
   * Exposed TotalBytes and TotalRecords in FlightInfo.
   * Moved public abstract reader, writer classes to internal namespace since 
they cant be used by the user.
   * Added FlightServerRecordBatchStreamReader to expose FlightDescriptor only 
to the server since the client wont use it.
   * Added FlightClientRecordBatchStreamReader which just extends 
FlightRecordBatchStreamReader which is now abstract.
   * StreamWriter made internal
   * PutResult does not expose ArrowBuffer (was taken from java project), 
instead it exposes bytestring like the other classes. 
     This was done so the user can choose what type of metadata to return, and 
also follow more closely the other classes.
   * Added public static Empty on PutResult, can be useful for implementations 
when metadata is not used, and empty put result should be returned.
   * Added a new test case to get metadata.
   * Added a new test case to put metadata.
   * Added a new test case for get schema.
   * Added a new test case for DoAction.
   * Added a new test case for ListFlights.
   
   Test coverage is now above 80% on the non generated code.
   
   # Application metadata addition
   
   To enable users to write and read application metadata, the class 
FlightRecordBatchStreamingCall was added that
   still follow gRPC look and feel, but exposes FlightRecordBatchStreamReader 
as response stream instead where the user can
   get application metadata.
   
   Application metadata can be read with each record batch, it is implemented 
as a list since hypothetically metadata can
   be sent with the schema aswell. Since the packages sent follow this pattern 
at the moment:
   
   Schema -> record batch -> record batch -> ... -> done
   
   So the first message could contains metadata as well. When a user gets a new 
record batch the metadata is cleared similar
   to how the java implementation is done.
   
   For writing, this is done by an additional method in 
FlightRecordBatchStreamWriter which looks like this:
   ```
   public Task WriteAsync(RecordBatch message, ByteString applicationMetadata);
   ```
   # gRPC support
   Here is a check on what gRPC methods and properties are exposed and can be 
read/used by a user of the framework, mostly to check that everything that is 
required to be exposed are exposed at this time.
   
   
   ## gRPC methods exposed or not exposed to the user
   * ListFlights - Exposed
   * GetFlightInfo - Exposed
   * GetSchema - Exposed
   * DoGet - Exposed
   * DoPut - Exposed
   * DoAction - Exposed
   * ListActions -Exposed
   * **Handshake - Not exposed**
   * **DoExchange - Not exposed**
   
   ## gRPC properties exposed or not exposed to the user
   
   ### HandshakeRequest
   **Handshake is not exposed at all**
   
   * **protocol_version - not exposed**
   * **payload - not exposed**
   
   ### HandshakeResponse
   **Handshake is not exposed at all**
   
   * **protocol_version - not exposed**
   * **payload - not exposed**
   
   ### BasicAuth
   **Basic auth is not exposed at all**
   
   * **username - not exposed**
   * **password - not exposed**
   
   ### ActionType
   * Type - exposed
   * Description - exposed
   
   ### Criteria
   * Expression - exposed
   
   ### Action
   * Type - exposed
   * Body - exposed
   
   ### Result
   * Body - exposed
   
   ### SchemaResult
   *Schema result is not mapped to its own type, but returns a schema directly*
   
   * Schema - exposed
   
   ### FlightDescriptor
   * Type - exposed
   * Path - exposed
   * Cmd - exposed
   
   ### FlightInfo
   
   * Schema - exposed
   * FlightDescriptor - exposed
   * endpoints - exposed
   * TotalRecords - exposed
   * TotalBytes - exposed
   
   ### FlightEndpoint
   
   * Ticket - exposed
   * Locations - exposed
   
   ### Location
   
   * Uri - exposed
   
   ### Ticket
   
   * Ticket - exposed
   
   ### FlightData
   
   Flight data is not exposed as a class, but the data is exposed in getStream, 
startPut for client, and DoGet and DoPut for server.
   
   * **FlightDescriptor**
     * **client can only send flight descriptor for flight data**
     * **server can only read for flight data**
     * Desc: *The descriptor of the data. This is only relevant when a client 
is starting a new DoPut stream.*
   * **Data header - used to read header of message, ex: schema, record batch 
message etc, not explicitly exposed and used internally only**
   * Application Metadata
     * Client can write metadata through *ClientRecordBatchStreamWriter*
     * Client can read metadata through *FlightRecordBatchStreamReader*
     * Server can write metadata through FlightServerRecordBatchStreamWriter
     * Server can read metadata through *FlightRecordBatchStreamReader*
   * **Data body - same as data header, exposure is done through RecordBatch**
   
   ### PutResult
   
   * Application metadata - exposed
   
   # Classes public or internal
   
   Here is a list of classes and if they are public or internal, and motivation 
on why they are public.
   
   * FlightRecordBatchStreamingCall - public, required since it exposes 
FlightRecordBatchStreamReader which is required to read metadata.
   * FlightRecordBatchStreamReader - public, required to allow read on schema, 
flight descriptor, application metadata
   * RecordBatcReaderImplementation - internal
   * FlightClientRecordBatchStreamWriter - public, implements CompleteAsync for 
clients, extends FlightRecordBatchStreamWriter wich allows user to write 
application metadata.
   * FlightDataStream - internal
   * FlightRecordBatchDuplexStreamingCall - public, exposes 
FlightClientRecordBatchStreamWriter which is required to write application 
metadata.
   * FlightRecordBatchStreamWriter - public abstract with private protected 
constructor, client and server implementations extend this one.
   * FlightServerRecordBatchStreamWriter - public, extends 
FlightRecordBatchStreamWriter which allows user to write application metadata.
   * SchemaWriter - internal
   * FlightAction - public, allows user to read/write type, and body in Actions.
   * FlightActionType - public, allows user to read/write action types with 
type and description from ListActions.
   * FlightClient - public, allows user to call all the different endpoints.
   * FlightCriteria - public, exposes Expression that servers can implement to 
filter result from ListFlights.
   * FlightDescriptor - public, exposes DescriptorType, Paths and Cmd for user.
   * FlightDescriptorType - public, contains descriptor types.
   * FlightEndpoint - public, exposes flight ticket and flight locations.
   * FlightInfo - public, exposes flight descriptor, schema, total bytes, total 
records and endpoints.
   * FlightLocation - public, exposes uri to the user.
   * FlightMessageSerializer - internal
   * FlightPutResult - public, recieved when doing doPut, exposes application 
metadata.
   * FlightResult - public, contains body from DoAction calls.
   * FlightServerImplementation - internal, forced internal from putting 
generated code to internal.
   * FlightTicket - public, exposes the bytestring from the ticket to the user.
   * IFlightServer - public, interface to implement a flight server.
   * StreamReader - internal
   * StreamWriter - internal
   * FlightIEndpointRouteBuilderExtensions - public, allows user to map the 
flight endpoint in asp net core.
   * FlightIGrpcServerBuilderExtensions - public, allows the user to add their 
implementation of IFlightServer in a nicer way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to