alamb commented on code in PR #5732:
URL: https://github.com/apache/arrow-datafusion/pull/5732#discussion_r1156366951


##########
docs/source/user-guide/cli.md:
##########
@@ -198,19 +232,56 @@ Details of the environment variables that can be used are
 - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI -> <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html>
 - AWS_ALLOW_HTTP -> set to "true" to permit HTTP connections without TLS
 
-Example:
+## Registering OSS Data Sources
 
-```bash
-$ aws s3 cp test.csv s3://my-bucket/
-upload: ./test.csv to s3://my-bucket/test.csv
+OSS data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
 
-$ export AWS_DEFAULT_REGION=us-east-2
-$ export AWS_SECRET_ACCESS_KEY=***************************
-$ export AWS_ACCESS_KEY_ID=**************
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS PARQUET
+OPTIONS(
+    'access_key_id' '******',
+    'secret_access_key' '******',
+    'endpoint' 'https://bucket.oss-cn-hangzhou.aliyuncs.com'
+)
+LOCATION 'oss://bucket/path/file.parquet';
+```
+
+The supported OPTIONS are:
+
+- access_key_id
+- secret_access_key
+- endpoint
+
+Note that the `endpoint` format of oss needs to be: `https://{bucket}.{oss-region-endpoint}`
+
+## Registering GCS Data Sources
+
+GCS data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.

Review Comment:
   ```suggestion
   [Google Cloud Storage](https://cloud.google.com/storage) data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
   ```
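   To illustrate what that section presumably documents, a hypothetical GCS registration sketch; the `service_account_path` option name is an assumption inferred from the `GCP_SERVICE_ACCOUNT_PATH` handling in the deleted `object_storage.rs`, not confirmed by this hunk:

   ```sql
   -- Hypothetical sketch: option name assumed, not taken from this PR
   CREATE EXTERNAL TABLE test
   STORED AS PARQUET
   OPTIONS('service_account_path' '/path/to/service-account.json')
   LOCATION 'gs://bucket/path/file.parquet';
   ```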



##########
datafusion-cli/src/object_storage.rs:
##########
@@ -1,193 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-use datafusion::error::Result;
-use std::{env, str::FromStr, sync::Arc};
-
-use datafusion::datasource::object_store::{
-    DefaultObjectStoreRegistry, ObjectStoreRegistry,
-};
-use datafusion::error::DataFusionError;
-use object_store::{aws::AmazonS3Builder, gcp::GoogleCloudStorageBuilder, ObjectStore};
-use url::Url;
-
-#[derive(Debug, PartialEq, Eq, clap::ArgEnum, Clone)]
-pub enum ObjectStoreScheme {
-    S3,
-    GCS,
-}
-
-impl FromStr for ObjectStoreScheme {
-    type Err = DataFusionError;
-
-    fn from_str(input: &str) -> Result<Self> {
-        match input {
-            "s3" => Ok(ObjectStoreScheme::S3),
-            "gs" | "gcs" => Ok(ObjectStoreScheme::GCS),
-            _ => Err(DataFusionError::Execution(format!(
-                "Unsupported object store scheme {}",
-                input
-            ))),
-        }
-    }
-}
-
-/// An [`ObjectStoreRegistry`] that can automatically create S3 and GCS stores for a given URL
-#[derive(Debug, Default)]
-pub struct DatafusionCliObjectStoreRegistry {
-    inner: DefaultObjectStoreRegistry,
-}
-
-impl DatafusionCliObjectStoreRegistry {
-    pub fn new() -> Self {
-        Default::default()
-    }
-}
-
-impl ObjectStoreRegistry for DatafusionCliObjectStoreRegistry {
-    fn register_store(
-        &self,
-        url: &Url,
-        store: Arc<dyn ObjectStore>,
-    ) -> Option<Arc<dyn ObjectStore>> {
-        self.inner.register_store(url, store)
-    }
-
-    fn get_store(&self, url: &Url) -> Result<Arc<dyn ObjectStore>> {
-        self.inner.get_store(url).or_else(|_| {
-            let store =
-                ObjectStoreScheme::from_str(url.scheme()).map(
-                    |scheme| match scheme {
-                        ObjectStoreScheme::S3 => build_s3_object_store(url),
-                        ObjectStoreScheme::GCS => build_gcs_object_store(url),
-                    },
-                )??;
-
-            self.inner.register_store(url, store.clone());
-
-            Ok(store)
-        })
-    }
-}
-
-fn build_s3_object_store(url: &Url) -> Result<Arc<dyn ObjectStore>> {
-    let host = get_host_name(url)?;
-    match AmazonS3Builder::from_env().with_bucket_name(host).build() {
-        Ok(s3) => Ok(Arc::new(s3)),
-        Err(err) => Err(DataFusionError::External(Box::new(err))),
-    }
-}
-
-fn build_gcs_object_store(url: &Url) -> Result<Arc<dyn ObjectStore>> {
-    let host = get_host_name(url)?;
-    let mut builder = GoogleCloudStorageBuilder::new().with_bucket_name(host);
-
-    if let Ok(path) = env::var("GCP_SERVICE_ACCOUNT_PATH") {
-        builder = builder.with_service_account_path(path);
-    }
-    match builder.build() {
-        Ok(gcs) => Ok(Arc::new(gcs)),
-        Err(err) => Err(DataFusionError::External(Box::new(err))),
-    }
-}
-
-fn get_host_name(url: &Url) -> Result<&str> {
-    url.host_str().ok_or_else(|| {
-        DataFusionError::Execution(format!(
-            "Not able to parse hostname from url, {}",
-            url.as_str()
-        ))
-    })
-}
-
-#[cfg(test)]
-mod tests {

Review Comment:
   do you think there is any value to porting these tests?
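
   If they are ported, the core behavior worth covering is the scheme parsing. A minimal standalone sketch of such a test (it reimplements the parsing with a plain `String` error instead of `DataFusionError`, so it compiles without DataFusion as a dependency):

   ```rust
   // Standalone sketch of the scheme-parsing behavior a ported test would cover.
   // Uses a plain String error so the example has no external dependencies.
   #[derive(Debug, PartialEq)]
   enum ObjectStoreScheme {
       S3,
       GCS,
   }

   fn parse_scheme(input: &str) -> Result<ObjectStoreScheme, String> {
       match input {
           "s3" => Ok(ObjectStoreScheme::S3),
           "gs" | "gcs" => Ok(ObjectStoreScheme::GCS),
           other => Err(format!("Unsupported object store scheme {other}")),
       }
   }

   fn main() {
       // Recognized schemes map to the matching variant...
       assert_eq!(parse_scheme("s3"), Ok(ObjectStoreScheme::S3));
       assert_eq!(parse_scheme("gs"), Ok(ObjectStoreScheme::GCS));
       assert_eq!(parse_scheme("gcs"), Ok(ObjectStoreScheme::GCS));
       // ...and anything else is rejected with an error.
       assert!(parse_scheme("file").is_err());
   }
   ```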



##########
docs/source/user-guide/cli.md:
##########
@@ -180,15 +180,49 @@ STORED AS CSV
 LOCATION '/path/to/aggregate_test_100.csv';
 ```
 
-## Querying S3 Data Sources
+## Registering S3 Data Sources
 
-The CLI can query data in S3 if the following environment variables are defined:
+S3 data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
 
-- `AWS_DEFAULT_REGION`
-- `AWS_ACCESS_KEY_ID`
-- `AWS_SECRET_ACCESS_KEY`
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS PARQUET
+OPTIONS(
+    'access_key_id' '******',
+    'secret_access_key' '******',
+    'region' 'us-east-2'
+)
+LOCATION 's3://bucket/path/file.parquet';
+```
+
+The supported OPTIONS are:
+
+- access_key_id
+- secret_access_key
+- session_token
+- region
+
+It is also simplify sql statements by environment variables.

Review Comment:
   ```suggestion
   It is also possible to simplify sql statements by environment variables.
   ```
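
   To illustrate the suggestion, a sketch of the environment-variable form (credential values are placeholders):

   ```shell
   # Export credentials once (placeholder values); the CLI picks them up
   # from the environment, as build_s3_object_store did via AmazonS3Builder::from_env().
   export AWS_DEFAULT_REGION=us-east-2
   export AWS_ACCESS_KEY_ID=my_access_key_id
   export AWS_SECRET_ACCESS_KEY=my_secret_access_key

   # The credential OPTIONS can then be omitted from the statement:
   # CREATE EXTERNAL TABLE test
   # STORED AS PARQUET
   # LOCATION 's3://bucket/path/file.parquet';
   ```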



##########
datafusion-cli/src/object_storage.rs:
##########
@@ -1,193 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Very nice that we can consolidate the usage



##########
docs/source/user-guide/cli.md:
##########
@@ -180,15 +180,49 @@ STORED AS CSV
 LOCATION '/path/to/aggregate_test_100.csv';
 ```
 
-## Querying S3 Data Sources
+## Registering S3 Data Sources
 
-The CLI can query data in S3 if the following environment variables are defined:
+S3 data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.

Review Comment:
   Perhaps we should define S3 by adding a link:
   
   ```suggestion
   [AWS S3](https://aws.amazon.com/s3/) data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
   ```



##########
docs/source/user-guide/cli.md:
##########
@@ -198,19 +232,56 @@ Details of the environment variables that can be used are
 - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI -> <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html>
 - AWS_ALLOW_HTTP -> set to "true" to permit HTTP connections without TLS
 
-Example:
+## Registering OSS Data Sources
 
-```bash
-$ aws s3 cp test.csv s3://my-bucket/
-upload: ./test.csv to s3://my-bucket/test.csv
+OSS data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.

Review Comment:
   ```suggestion
   [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service) data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to