kou commented on code in PR #46194:
URL: https://github.com/apache/arrow/pull/46194#discussion_r2053135123


##########
format/Flight.proto:
##########
@@ -426,8 +426,43 @@ message Ticket {
 }
 
 /*
- * A location where a Flight service will accept retrieval of a particular
- * stream given a ticket.
+ * A location to retrieve a particular stream from. This URI should be one of
+ * the following:
+ *  - An empty string or the string 'arrow-flight-reuse-connection://?':
+ *    indicating that the ticket can be redeemed on the service where the
+ *    ticket was generated via a DoGet request.
+ *  - A valid grpc URI (grpc://, grpc+tls://, grpc+unix://, etc.):
+ *    indicating that the ticket can be redeemed on the service at the given
+ *    URI via a DoGet request.
+ *  - A valid HTTP URI (http://, https://, etc.):
+ *    indicating that the client should perform a GET request against the
+ *    given URI to retrieve the stream. The ticket should have been empty
+ *    in this case and should be ignored by the client.
+ *  - An object storage URI (s3://, gs://, abfs://, etc.):
+ *    indicating that the client should retrieve the data from the provided
+ *    object storage location. The ticket should be empty in this case and
+ *    should be ignored by the client.
+ *
+ * We allow non-Flight URIs for the purpose of allowing Flight services to indicate that
+ * results can be downloaded in formats other than Arrow (such as Parquet) or to allow
+ * direct fetching of results from a URI to reduce excess copying and data movement.
+ * In these cases, the following conventions should be followed by servers and clients:
+ *
+ *  - Unless otherwise specified by the 'Content-Type' header of the response,
+ *    a client should assume the response is an Arrow IPC Stream. Usage of an IANA

Review Comment:
   ```suggestion
    *    a client should assume the response is an Arrow IPC Streaming format. Usage of an IANA
   ```



##########
format/Flight.proto:
##########
@@ -426,8 +426,43 @@ message Ticket {
 }
 
 /*
- * A location where a Flight service will accept retrieval of a particular
- * stream given a ticket.
+ * A location to retrieve a particular stream from. This URI should be one of
+ * the following:
+ *  - An empty string or the string 'arrow-flight-reuse-connection://?':
+ *    indicating that the ticket can be redeemed on the service where the
+ *    ticket was generated via a DoGet request.
+ *  - A valid grpc URI (grpc://, grpc+tls://, grpc+unix://, etc.):
+ *    indicating that the ticket can be redeemed on the service at the given
+ *    URI via a DoGet request.
+ *  - A valid HTTP URI (http://, https://, etc.):
+ *    indicating that the client should perform a GET request against the
+ *    given URI to retrieve the stream. The ticket should have been empty

Review Comment:
   It may be better to use the same wording as the "An object storage URI" case:
   
   ```suggestion
    *    given URI to retrieve the stream. The ticket should be empty
   ```



##########
format/Flight.proto:
##########
@@ -426,8 +426,43 @@ message Ticket {
 }
 
 /*
- * A location where a Flight service will accept retrieval of a particular
- * stream given a ticket.
+ * A location to retrieve a particular stream from. This URI should be one of
+ * the following:
+ *  - An empty string or the string 'arrow-flight-reuse-connection://?':
+ *    indicating that the ticket can be redeemed on the service where the
+ *    ticket was generated via a DoGet request.
+ *  - A valid grpc URI (grpc://, grpc+tls://, grpc+unix://, etc.):
+ *    indicating that the ticket can be redeemed on the service at the given
+ *    URI via a DoGet request.
+ *  - A valid HTTP URI (http://, https://, etc.):
+ *    indicating that the client should perform a GET request against the
+ *    given URI to retrieve the stream. The ticket should have been empty
+ *    in this case and should be ignored by the client.
+ *  - An object storage URI (s3://, gs://, abfs://, etc.):
+ *    indicating that the client should retrieve the data from the provided
+ *    object storage location. The ticket should be empty in this case and
+ *    should be ignored by the client.
+ *
+ * We allow non-Flight URIs for the purpose of allowing Flight services to indicate that
+ * results can be downloaded in formats other than Arrow (such as Parquet) or to allow
+ * direct fetching of results from a URI to reduce excess copying and data movement.
+ * In these cases, the following conventions should be followed by servers and clients:
+ *
+ *  - Unless otherwise specified by the 'Content-Type' header of the response,
+ *    a client should assume the response is an Arrow IPC Stream. Usage of an IANA
+ *    media type like 'application/octet-stream' should be assumed to be an Arrow
+ *    IPC Stream.

Review Comment:
   ```suggestion
    *    IPC Streaming format.
   ```
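
For illustration only, the Content-Type convention in the hunk above could be sketched on the client side roughly as follows. This is a minimal sketch, not part of the PR: the function name `is_arrow_ipc_stream` is hypothetical, and treating `application/vnd.apache.arrow.stream` as an Arrow stream is an assumption that the quoted text does not spell out.

```python
# Hypothetical helper: decide whether an HTTP response body should be read
# as the Arrow IPC streaming format, following the convention quoted above.

# Assumption: the Arrow stream media type is accepted in addition to the
# generic 'application/octet-stream' named in the quoted comment.
ARROW_STREAM_MEDIA_TYPE = "application/vnd.apache.arrow.stream"


def is_arrow_ipc_stream(content_type):
    """Return True if the body should be treated as an Arrow IPC stream."""
    if not content_type:
        # No Content-Type header: the default is an Arrow IPC stream.
        return True
    # Drop any parameters such as '; charset=...' and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("application/octet-stream", ARROW_STREAM_MEDIA_TYPE)
```

Any other media type (for example one identifying Parquet) would then be handled by whatever reader the client has for that format.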



##########
docs/source/format/Flight.rst:
##########
@@ -369,6 +369,61 @@ string, so the obvious candidates are not compatible.  The chosen
 representation can be parsed by both implementations, as well as Go's
 ``net/url`` and Python's ``urllib.parse``.
 
+Extended Location URIs
+----------------------
+
+In addition to alternative transports, a server may also return
+URIs that reference an external service or object storage location.
+This can be useful in cases where intermediate data is cached as
+Apache Parquet files on S3 or is accessible via an HTTP service. In
+these scenarios, it is more efficient to be able to provide a URI
+where the client may simply download the data directly, rather than
+requiring a Flight service to read it back into memory and serve it
+from a ``DoGet`` request. Servers should use the following URI
+schemes for this situation:
+
++--------------------+------------------------+
+| Location           | URI Scheme             |
++====================+========================+
+| Object storage (1) | s3:, gcs:, abfs:, etc. |
++--------------------+------------------------+
+| HTTP service   (2) | http:, https:          |
++--------------------+------------------------+
+
+Notes:
+
+* \(1) Any auth required should be either negotiated externally to
+   Flight or should use a presigned URI.
+* \(2) The client should make a GET request to the provided URI
+   to retrieve the data.
+
+When using an extended location URI, the client should ignore any
+value in the ``Ticket`` field of the ``FlightEndpoint``. The
+``Ticket`` is only used for identifying data in the context of a
+Flight service, and is not needed when the client is directly
+downloading data from an external service.
+
+Clients should assume that, unless otherwise specified, the data is
+being returned as an Arrow IPC Stream just as it would via a ``DoGet``

Review Comment:
   How about linking to https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format:
   
   ```suggestion
   being returned as an :ref:`ipc-streaming-format` just as it would via a ``DoGet``
   ```
   
   with
   
   ```diff 
   diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
   index e1603e8d8e..d60d682d76 100644
   --- a/docs/source/format/Columnar.rst
   +++ b/docs/source/format/Columnar.rst
   @@ -1462,6 +1462,8 @@ endianness that does not match the underlying system. The reference
    implementation is focused on Little Endian and provides tests for
    it. Eventually we may provide automatic conversion via byte swapping.
   
   +.. _ipc-streaming-format:
   +
    IPC Streaming Format
    --------------------
    
   @@ -1503,6 +1505,8 @@ metadata length (``0x00000000``) or closing the stream interface. We
    recommend the ".arrows" file extension for the streaming format although
    in many cases these streams will not ever be stored as files.
    
   +.. _ipc-file-format:
   +
    IPC File Format
    ---------------
    
   ```
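
As a rough illustration of the client behaviour the quoted hunks describe, a location URI could be classified by scheme along these lines. This is only a sketch under stated assumptions: `classify_location`, `uses_ticket`, and the returned labels are hypothetical names, and the object storage scheme set is not exhaustive (the quoted text says "etc." and uses both `gs` and `gcs`).

```python
from urllib.parse import urlsplit

# Assumption: a non-exhaustive set of object storage schemes taken from the
# examples in the quoted proto comment and docs table.
OBJECT_STORAGE_SCHEMES = {"s3", "gs", "gcs", "abfs"}


def classify_location(uri):
    """Map a Location URI to a hypothetical retrieval strategy label."""
    if uri == "":
        # Empty string: redeem the ticket on the originating service.
        return "reuse-connection"
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "arrow-flight-reuse-connection":
        return "reuse-connection"
    if scheme == "grpc" or scheme.startswith("grpc+"):
        # Redeem the ticket via DoGet on the service at this URI.
        return "flight-doget"
    if scheme in ("http", "https"):
        # Perform a GET request against the URI.
        return "http-get"
    if scheme in OBJECT_STORAGE_SCHEMES:
        return "object-storage"
    return "unknown"


def uses_ticket(kind):
    """The ticket only matters when the data comes from a Flight service."""
    return kind in ("reuse-connection", "flight-doget")
```

With this sketch, `classify_location("https://example.com/data")` yields `"http-get"`, and `uses_ticket("http-get")` is false, matching the rule that the client ignores the ticket for extended location URIs.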


