Ulimo edited a comment on pull request #8694:
URL: https://github.com/apache/arrow/pull/8694#issuecomment-731658726
@eerhardt first of all thank you so much for your support and feedback and
sorry for this wall of text. Based on your last feedback that the API is hard
to change etc in the future, I really took a long sitting to go through all the
classes etc, and found some issues that I really needed to address.
This resulted in a small overhaul on the structure etc. I also went through
so the gRPC schema information is actually exposed to the user as well. This
also resulted in some rewrites. The feedback you have given has been exactly
what I needed to come to the conclusions etc on structure, so I am once again
extremely thankful for all the feedback you have given.
This ofcourse increased the scope of the PR a bit again, but I hope it helps
give a more real look and feel on how the framework will behave, and if the
implemented design actually works.
Here comes information on the work that has been done, and also some
checklists to see that all the required information is exposed etc, and why
some classes are public.
# Major changes
* All exposed classes start with prefix "Flight".
* Generated code is now internal.
* New project Apache.Arrow.Flight.AspNetCore, required since generated code
is internal, this project references grpc.aspnetcore.server which is
netcoreapp3.1 or net5.0 only, and contains extensions only to add flight
support for aspnetcore projects.
* All flight client methods now follows gRPC API more closely, allows
reading response headers for all calls.
* Exposing application metadata required rewriting a bit of the logic in
client and servers, read more in its chapter.
* Moved client specific classes under .Client namespace.
* Moved server specific classes under .Server namespace.
* Internal classes are now under the .Internal namespace.
* Only mutual used classes are now in the .Flight namespace.
# Minor changes
* Removed multi property exposure of bytestring, to only bytestring.
* Exposed TotalBytes and TotalRecords in FlightInfo.
* Moved public abstract reader, writer classes to internal namespace since
they cant be used by the user.
* Added FlightServerRecordBatchStreamReader to expose FlightDescriptor only
to the server since the client wont use it.
* Added FlightClientRecordBatchStreamReader which just extends
FlightRecordBatchStreamReader which is now abstract.
* StreamWriter made internal
* PutResult does not expose ArrowBuffer (was taken from java project),
instead it exposes bytestring like the other classes.
This was done so the user can choose what type of metadata to return, and
also follow more closely the other classes.
* Added public static Empty on PutResult, can be useful for implementations
when metadata is not used, and empty put result should be returned.
* Added a new test case to get metadata.
* Added a new test case to put metadata.
* Added a new test case for get schema.
* Added a new test case for DoAction.
* Added a new test case for ListFlights.
Test coverage is now above 80% on the non generated code.
# Application metadata addition
To enable users to write and read application metadata, the class
FlightRecordBatchStreamingCall was added that
still follow gRPC look and feel, but exposes FlightRecordBatchStreamReader
as response stream instead where the user can
get application metadata.
Application metadata can be read with each record batch, it is implemented
as a list since hypothetically metadata can
be sent with the schema aswell. Since the packages sent follow this pattern
at the moment:
Schema -> record batch -> record batch -> ... -> done
So the first message could contains metadata as well. When a user gets a new
record batch the metadata is cleared similar
to how the java implementation is done.
For writing, this is done by an additional method in
FlightRecordBatchStreamWriter which looks like this:
```
public Task WriteAsync(RecordBatch message, ByteString applicationMetadata);
```
# gRPC support
Here is a check on what gRPC methods and properties are exposed and can be
read/used by a user of the framework, mostly to check that everything that is
required to be exposed are exposed at this time.
## gRPC methods exposed or not exposed to the user
* ListFlights - Exposed
* GetFlightInfo - Exposed
* GetSchema - Exposed
* DoGet - Exposed
* DoPut - Exposed
* DoAction - Exposed
* ListActions -Exposed
* **Handshake - Not exposed**
* **DoExchange - Not exposed**
## gRPC properties exposed or not exposed to the user
### HandshakeRequest
**Handshake is not exposed at all**
* **protocol_version - not exposed**
* **payload - not exposed**
### HandshakeResponse
**Handshake is not exposed at all**
* **protocol_version - not exposed**
* **payload - not exposed**
### BasicAuth
**Basic auth is not exposed at all**
* **username - not exposed**
* **password - not exposed**
### ActionType
* Type - exposed
* Description - exposed
### Criteria
* Expression - exposed
### Action
* Type - exposed
* Body - exposed
### Result
* Body - exposed
### SchemaResult
*Schema result is not mapped to its own type, but returns a schema directly*
* Schema - exposed
### FlightDescriptor
* Type - exposed
* Path - exposed
* Cmd - exposed
### FlightInfo
* Schema - exposed
* FlightDescriptor - exposed
* endpoints - exposed
* TotalRecords - exposed
* TotalBytes - exposed
### FlightEndpoint
* Ticket - exposed
* Locations - exposed
### Location
* Uri - exposed
### Ticket
* Ticket - exposed
### FlightData
Flight data is not exposed as a class, but the data is exposed in getStream,
startPut for client, and DoGet and DoPut for server.
* **FlightDescriptor**
* **client can only send flight descriptor for flight data**
* **server can only read for flight data**
* Desc: *The descriptor of the data. This is only relevant when a client
is starting a new DoPut stream.*
* **Data header - used to read header of message, ex: schema, record batch
message etc, not explicitly exposed and used internally only**
* Application Metadata
* Client can write metadata through *ClientRecordBatchStreamWriter*
* Client can read metadata through *FlightRecordBatchStreamReader*
* Server can write metadata through FlightServerRecordBatchStreamWriter
* Server can read metadata through *FlightRecordBatchStreamReader*
* **Data body - same as data header, exposure is done through RecordBatch**
### PutResult
* Application metadata - exposed
# Classes public or internal
Here is a list of classes and if they are public or internal, and motivation
on why they are public.
* FlightRecordBatchStreamingCall - public, required since it exposes
FlightRecordBatchStreamReader which is required to read metadata.
* FlightRecordBatchStreamReader - public, required to allow read on schema,
flight descriptor, application metadata
* RecordBatcReaderImplementation - internal
* FlightClientRecordBatchStreamWriter - public, implements CompleteAsync for
clients, extends FlightRecordBatchStreamWriter wich allows user to write
application metadata.
* FlightDataStream - internal
* FlightRecordBatchDuplexStreamingCall - public, exposes
FlightClientRecordBatchStreamWriter which is required to write application
metadata.
* FlightRecordBatchStreamWriter - public abstract with private protected
constructor, client and server implementations extend this one.
* FlightServerRecordBatchStreamWriter - public, extends
FlightRecordBatchStreamWriter which allows user to write application metadata.
* SchemaWriter - internal
* FlightAction - public, allows user to read/write type, and body in Actions.
* FlightActionType - public, allows user to read/write action types with
type and description from ListActions.
* FlightClient - public, allows user to call all the different endpoints.
* FlightCriteria - public, exposes Expression that servers can implement to
filter result from ListFlights.
* FlightDescriptor - public, exposes DescriptorType, Paths and Cmd for user.
* FlightDescriptorType - public, contains descriptor types.
* FlightEndpoint - public, exposes flight ticket and flight locations.
* FlightInfo - public, exposes flight descriptor, schema, total bytes, total
records and endpoints.
* FlightLocation - public, exposes uri to the user.
* FlightMessageSerializer - internal
* FlightPutResult - public, recieved when doing doPut, exposes application
metadata.
* FlightResult - public, contains body from DoAction calls.
* FlightServerImplementation - internal, forced internal from putting
generated code to internal.
* FlightTicket - public, exposes the bytestring from the ticket to the user.
* IFlightServer - public, interface to implement a flight server.
* StreamReader - internal
* StreamWriter - internal
* FlightIEndpointRouteBuilderExtensions - public, allows user to map the
flight endpoint in asp net core.
* FlightIGrpcServerBuilderExtensions - public, allows the user to add their
implementation of IFlightServer in a nicer way.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]