pitrou commented on code in PR #46194:
URL: https://github.com/apache/arrow/pull/46194#discussion_r2053993771


##########
docs/source/format/Flight.rst:
##########
@@ -369,6 +369,61 @@ string, so the obvious candidates are not compatible.  The chosen
 representation can be parsed by both implementations, as well as Go's
 ``net/url`` and Python's ``urllib.parse``.
 
+Extended Location URIs
+----------------------
+
+In addition to alternative transports, a server may also return
+URIs that reference an external service or object storage location.
+This can be useful in cases where intermediate data is cached as
+Apache Parquet files on S3 or is accessible via an HTTP service. In
+these scenarios, it is more efficient to be able to provide a URI
+where the client may simply download the data directly, rather than
+requiring a Flight service to read it back into memory and serve it
+from a ``DoGet`` request. Servers should use the following URI
+schemes for this situation:
+
++--------------------+------------------------+
+| Location           | URI Scheme             |
++====================+========================+
+| Object storage (1) | s3:, gcs:, abfs:, etc. |

Review Comment:
   > Since the change is essentially about delegating data access to a different protocol, feels like it should be fine to delegate the particulars of URI syntax and such to wherever that protocol is defined as well.
   
   That's fine for those protocols that have a [well-known URI standard](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml). But many protocols, such as S3, don't seem to have one, and ad-hoc S3 URI syntaxes may differ in various respects.
   
   For example, the Arrow C++ routine for parsing S3 URIs accepts a number of query parameters, such as `region`, `endpoint_override` and `tls_verify_certificates`, among others. Other S3 URI implementations may define different optional parameters.
   
https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/cpp/src/arrow/filesystem/s3fs.cc#L349-L428
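
To make the interoperability concern concrete, here is a minimal sketch in Python using only `urllib.parse`. It is not Arrow's parser: `split_s3_uri`, `unrecognized_params`, and the `hypothetical_option` parameter are names invented for illustration, and `KNOWN_PARAMS` contains only the three parameters named above, not the full set the C++ routine accepts.

```python
from urllib.parse import parse_qs, urlsplit

# Only the three parameters named in this comment; the real Arrow C++
# routine accepts more. This set is an assumption for illustration.
KNOWN_PARAMS = {"region", "endpoint_override", "tls_verify_certificates"}


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key, query parameters).

    Hypothetical helper: it ignores credentials embedded in the
    authority part and keeps the last value of a repeated parameter.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri!r}")
    params = {name: values[-1] for name, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path.lstrip("/"), params


def unrecognized_params(uri, known=KNOWN_PARAMS):
    """Return the query parameters a given consumer would not understand."""
    _, _, params = split_s3_uri(uri)
    return sorted(set(params) - set(known))


# "hypothetical_option" stands in for a parameter defined by some other
# S3 URI implementation; it is not a real option of any library.
uri = "s3://bucket/path/data.parquet?region=us-east-1&hypothetical_option=1"
bucket, key, params = split_s3_uri(uri)
leftover = unrecognized_params(uri)
```

A generic URI parser agrees on the bucket and key, but whether a client should reject, ignore, or honor the leftover parameters is exactly what an ad-hoc syntax leaves undefined.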
   


