gopidesupavan commented on code in PR #62867:
URL: https://github.com/apache/airflow/pull/62867#discussion_r3006244082
##########
providers/common/sql/src/airflow/providers/common/sql/datafusion/object_storage_provider.py:
##########
@@ -70,12 +70,78 @@ def get_scheme(self) -> str:
         return "file://"
+class GCSObjectStorageProvider(ObjectStorageProvider):
+    """GCS Object Storage Provider using DataFusion's GoogleCloud."""
+
+    @property
+    def get_storage_type(self) -> StorageType:
+        """Return the storage type."""
+        return StorageType.GCS
+
+    def create_object_store(self, path: str, connection_config: ConnectionConfig | None = None):
+        """Create a GCS object store using DataFusion's GoogleCloud."""
+        if connection_config is None:
+            raise ValueError(f"connection_config must be provided for {self.get_storage_type}")
+
+        try:
+            from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
+
+            bucket = self.get_bucket(path)
Review Comment:
This only supports the `key_path` / `keyfile_dict` / ADC subset of Google
auth because it relies solely on `provide_gcp_credential_file_as_context()`. It
drops other valid Airflow Google connection modes like `credential_config_file`.
As written, `gs://` support is narrower than the provider’s existing
connection contract. Please either map the missing modes through or
document/reject them explicitly and add regression tests.
```python
credential_config_file = gcp_hook._get_field("credential_config_file", None)
credentials["application_credentials_path"] = credential_config_file
```
ref: https://datafusion.apache.org/user-guide/cli/datasources.html#gcs
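One way the mapping could look, as a hedged sketch rather than the PR's actual code: a pure helper that translates Airflow Google connection extras into the credential options passed to DataFusion's GCS store. The field names (`key_path`, `keyfile_dict`, `credential_config_file`) follow the review above; the option keys on the right-hand side are assumptions and should be checked against the DataFusion GCS docs linked above.

```python
# Hypothetical sketch, not the provider's implementation. Maps Airflow
# Google connection extras to DataFusion GCS credential options; the
# output key names are assumptions to verify against the DataFusion docs.

def build_gcs_credentials(extras: dict) -> dict:
    """Translate Airflow Google connection extras into GCS store options."""
    credentials: dict = {}
    if extras.get("key_path"):
        # Service-account JSON file on disk.
        credentials["service_account_path"] = extras["key_path"]
    elif extras.get("keyfile_dict"):
        # Inline service-account key material.
        credentials["service_account_key"] = extras["keyfile_dict"]
    elif extras.get("credential_config_file"):
        # The mode the review says is currently dropped.
        credentials["application_credentials_path"] = extras["credential_config_file"]
    # Otherwise fall back to Application Default Credentials: no explicit
    # option is set and the environment is left to supply credentials.
    return credentials
```

A regression test for each branch (including the empty-dict ADC fallback) would pin the connection contract the review asks for.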
##########
providers/common/sql/src/airflow/providers/common/sql/datafusion/object_storage_provider.py:
##########
@@ -70,12 +70,78 @@ def get_scheme(self) -> str:
return "file://"
+class GCSObjectStorageProvider(ObjectStorageProvider):
+ """GCS Object Storage Provider using DataFusion's GoogleCloud."""
+
+ @property
+ def get_storage_type(self) -> StorageType:
+ """Return the storage type."""
+ return StorageType.GCS
+
+ def create_object_store(self, path: str, connection_config:
ConnectionConfig | None = None):
+ """Create a GCS object store using DataFusion's GoogleCloud."""
+ if connection_config is None:
+ raise ValueError(f"connection_config must be provided for
{self.get_storage_type}")
+
+ try:
+ from airflow.providers.google.common.hooks.base_google import
GoogleBaseHook
+
+ bucket = self.get_bucket(path)
Review Comment:
This only supports the `key_path` / `keyfile_dict` / ADC subset of Google
auth because it relies solely on `provide_gcp_credential_file_as_context()`. It
drops other valid Airflow Google connection modes like `credential_config_file`.
`gs://` support is narrower than the provider’s existing connection
contract. Please either map the missing modes through or document/reject them
explicitly and add regression tests.
```
credential_config_file = gcp_hook._get_field("credential_config_file", None)
credentials["application_credentials_path"] = credential_config_file
```
ref: https://datafusion.apache.org/user-guide/cli/datasources.html#gcs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]