emkornfield commented on a change in pull request #12571:
URL: https://github.com/apache/arrow/pull/12571#discussion_r829428851



##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result
+  // without requiring another RPC round-trip to retrieve the ticket.  
applications
+  // built on top of Flight are responsible for any negotiation necessary on 
whether
+  // inlining data is appropriate.
+  //
+  // The size of inlined_data is expected to be small (typically less then 
1MB) 
+  // and inlining too much data across tickets can run into underlying 
transport 
+  // limitations.  Furthermore, since the data is expected to be small, 
implementations 
+  // are less likely to optimize for zero-copy in these cases. 
+  //
+  // The inlined data is not expected to contain schema metadata. The schema
+  // should be identical to the schema provided on FlightInfo. 
+  repeated FlightData inlined_data = 2;
+  enum InlinedCompleteness {

Review comment:
       Since the Ticket is already an opaque data blob that is interpreted by 
the application, I am not sure how much extra value adding a generic 
inlined_data_completeness would be.
   
   > For applications that need to send NO_INLINED_DATA they simply omit 
inlined_data
   
   I'm going to simplify this to make a nested message for inlined_data which 
cleans this up a little bit.
   
   > For applications that need to send data, they could include inlined_data 
and could distinguish between "complete" and "sample data" somehow in the 
application defined Ticket payload anyways.
   
   I agree with David on the opaqueness of Ticket which is my understanding 
also.  I think the benefit of directly modeling complete vs incomplete, is at 
least for incomplete data, we can completely elide the details from consumers.  
Users could directly pass the ticket to DoGet and the framework can recognize 
there is inlined data present and its complete and just use that without 
existing clients making any code changes.  I haven't started implementing this 
but that was the approach I was going to look at first to see if it is feasible.
   
   For sampled data it is always going to require a client side change to make 
use of it unless we the framework provides stronger guarantees (i.e. data is 
always the same exact data as the first part of the stream).

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,33 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result
+  // without requiring another RPC round-trip to retrieve the ticket.  
applications
+  // built on top of Flight are responsible for any negotiation necessary on 
whether
+  // inlining data is appropriate.
+  //
+  // The size of inlined_data is expected to be small (typically less then 
1MB) 
+  // and inlining too much data across tickets can run into underlying 
transport 
+  // limitations.  Furthermore, since the data is expected to be small, 
implementations 
+  // are less likely to optimize for zero-copy in these cases.  
+  repeated FlightData inlined_data = 2;
+  enum InlinedCompleteness {
+    // Default is no data is inlined.  An UNDEFINED value is not provided 
because an enum
+    // for the data that the client isn't aware of makes the data unusable.
+    NO_INLINED_DATA = 0;
+    // The data present in inlined_data represents all data
+    // present for the ticket.
+    COMPLETE_DATA = 1;
+    // The data present for inlined_data represents only a partial
+    // sample of the data available for the ticket.  No guarantees on the 
+    // ordering are provided.
+    SAMPLE_DATA = 2;

Review comment:
       > I feel like this type of behavior could also be encoded using 
application specific metadata (rather than explicitly in the Flight protocol 
itself)
   
   Lets try to consolidate this thread with the one above?

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result
+  // without requiring another RPC round-trip to retrieve the ticket.  
applications
+  // built on top of Flight are responsible for any negotiation necessary on 
whether
+  // inlining data is appropriate.
+  //
+  // The size of inlined_data is expected to be small (typically less then 
1MB) 
+  // and inlining too much data across tickets can run into underlying 
transport 
+  // limitations.  Furthermore, since the data is expected to be small, 
implementations 
+  // are less likely to optimize for zero-copy in these cases. 
+  //
+  // The inlined data is not expected to contain schema metadata. The schema
+  // should be identical to the schema provided on FlightInfo. 
+  repeated FlightData inlined_data = 2;
+  enum InlinedCompleteness {
+    // Default is no data is inlined.  An UNDEFINED value is not provided 
because an enum
+    // for the data that the client isn't aware of makes the data unusable 
anyways.
+    NO_INLINED_DATA = 0;

Review comment:
       this will be changed to undefined.  I'm going to move the enum and the 
FlightData into a nested message

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result

Review comment:
       It could, open to suggestions on phrasing.

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result
+  // without requiring another RPC round-trip to retrieve the ticket.  
applications
+  // built on top of Flight are responsible for any negotiation necessary on 
whether
+  // inlining data is appropriate.
+  //
+  // The size of inlined_data is expected to be small (typically less then 
1MB) 
+  // and inlining too much data across tickets can run into underlying 
transport 
+  // limitations.  Furthermore, since the data is expected to be small, 
implementations 
+  // are less likely to optimize for zero-copy in these cases. 
+  //
+  // The inlined data is not expected to contain schema metadata. The schema
+  // should be identical to the schema provided on FlightInfo. 
+  repeated FlightData inlined_data = 2;
+  enum InlinedCompleteness {
+    // Default is no data is inlined.  An UNDEFINED value is not provided 
because an enum
+    // for the data that the client isn't aware of makes the data unusable 
anyways.
+    NO_INLINED_DATA = 0;

Review comment:
       updating as we speak.  Will push a new version in a few minutes 
hopefully.  Convention for eums in protobufs is to make the first element 
undefined because it allows for clients to detech non-backward compatible 
changes by the server.

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result
+  // without requiring another RPC round-trip to retrieve the ticket.  
applications
+  // built on top of Flight are responsible for any negotiation necessary on 
whether
+  // inlining data is appropriate.
+  //
+  // The size of inlined_data is expected to be small (typically less then 
1MB) 
+  // and inlining too much data across tickets can run into underlying 
transport 
+  // limitations.  Furthermore, since the data is expected to be small, 
implementations 
+  // are less likely to optimize for zero-copy in these cases. 
+  //
+  // The inlined data is not expected to contain schema metadata. The schema
+  // should be identical to the schema provided on FlightInfo. 
+  repeated FlightData inlined_data = 2;
+  enum InlinedCompleteness {
+    // Default is no data is inlined.  An UNDEFINED value is not provided 
because an enum
+    // for the data that the client isn't aware of makes the data unusable 
anyways.
+    NO_INLINED_DATA = 0;

Review comment:
       should be updated now.

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result
+  // without requiring another RPC round-trip to retrieve the ticket.  
applications
+  // built on top of Flight are responsible for any negotiation necessary on 
whether
+  // inlining data is appropriate.
+  //
+  // The size of inlined_data is expected to be small (typically less then 
1MB) 
+  // and inlining too much data across tickets can run into underlying 
transport 
+  // limitations.  Furthermore, since the data is expected to be small, 
implementations 
+  // are less likely to optimize for zero-copy in these cases. 
+  //
+  // The inlined data is not expected to contain schema metadata. The schema
+  // should be identical to the schema provided on FlightInfo. 
+  repeated FlightData inlined_data = 2;
+  enum InlinedCompleteness {
+    // Default is no data is inlined.  An UNDEFINED value is not provided 
because an enum
+    // for the data that the client isn't aware of makes the data unusable 
anyways.
+    NO_INLINED_DATA = 0;
+    // The data present in inlined_data represents all data
+    // present for the ticket.
+    COMPLETE_DATA = 1;
+    // The data present in inlined_data represents only a partial
+    // sample of the data available for the ticket.  No guarantees on the 
+    // ordering are provided.
+    SAMPLE_DATA = 2;

Review comment:
       better name.  Changed.

##########
File path: format/Flight.proto
##########
@@ -301,6 +301,36 @@ message Location {
  */
 message Ticket {
   bytes ticket = 1;
+  // Data representing some part of the data retrievable by the ticket.
+  //
+  // `inlined_completeness` indicates what part of the data retriavable
+  // by the ticket this represents. This is provided as an optimization for
+  // client/server applications that want to reduce latency to first result

Review comment:
       Thanks, tried rephrasing a little bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to