samuelkhtu commented on issue #53333:
URL: https://github.com/apache/airflow/issues/53333#issuecomment-3071533964

   Thank you for the comment, @jroachgolf84; that's a great question!
   
   The `account_host` is expected to be a **full hostname**, such as:
   
   ```
   testaccountname.blob.core.customdomain.io
   ```
   
   This is **not** just the default domain (`blob.core.windows.net`), and it’s also **not** the full URL with `https://`. The `AzureBlobFileSystem` constructor internally prepends `https://` when it builds the `account_url`.
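To make the distinction concrete, here is a minimal sketch that turns either a bare hostname or a full URL into the bare-hostname form that `account_host` expects. The helpers `to_account_host` and `to_account_url` are hypothetical and are not part of Airflow or adlfs. They use only `urllib.parse` from the standard library, and `to_account_url` simply mirrors the "prepend `https://`" behaviour described above.

```python
from urllib.parse import urlparse


def to_account_host(value: str) -> str:
    """Hypothetical helper: return the bare hostname expected by account_host.

    Accepts either a full URL ("https://host/...") or a bare hostname.
    """
    value = value.strip()
    if "://" in value:
        # netloc is the "host[:port]" part of a URL that has a scheme
        host = urlparse(value).netloc
        if not host:
            raise ValueError(f"no hostname in {value!r}")
        return host
    # Already a bare hostname: drop any trailing path or slash
    return value.split("/", 1)[0]


def to_account_url(account_host: str) -> str:
    # Mirrors the described constructor behaviour: prepend https://
    return f"https://{account_host}"


print(to_account_host("https://myblobstorageaccount.blob.core.mynet.io"))
# myblobstorageaccount.blob.core.mynet.io
print(to_account_url(to_account_host("testaccountname.blob.core.customdomain.io")))
# https://testaccountname.blob.core.customdomain.io
```

Passing a value that already starts with `https://` as `account_host` would produce `https://https://...`. That is why normalizing to a bare hostname matters.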
   
   ---
   
   ### 🔍 Example Use Case
   
   Here’s a real-world example of how this is configured in Airflow:
   
   ```json
   {
       "conn_type": "wasb",
       "description": "Connection for Azure XCom Backend",
       "login": "<Azure client id>",
       "password": "<Azure client secret>",
       "host": "https://myblobstorageaccount.blob.core.mynet.io",
       "extra": {
           "tenant_id": "<Azure Tenant ID>",
           "account_name": "testaccount"
       }
   }
   ```

   Note: the custom domain name (`myblobstorageaccount`) is different from the actual Azure Storage account resource name (`testaccount`).
   
   In this case, the `get_fs()` function returns:
   
   ```python
   return AzureBlobFileSystem(**options)
   ```
   
   Where `options` includes:
   
   ```json
   {
     "account_url": "https://myblobstorageaccount.blob.core.mynet.io",
     "client_id": "<Azure client ID>",
     "client_secret": "<Azure client secret>",
     "account_name": "testaccount",
     "tenant_id": "<Azure Tenant ID>"
   }
   ```
   
   ---
   
   ### ⚠️ Why `account_host` Matters
   
   Inside the `AzureBlobFileSystem` constructor, the logic for setting `account_url` is:
   
   ```python
   if hasattr(self, "account_host"):
       self.account_url = f"https://{self.account_host}"
   else:
       self.account_url = f"https://{self.account_name}.blob.core.windows.net"
   ```
   
   If `account_host` is **not passed**, the fallback logic uses the default domain. This breaks the connection for users who rely on custom endpoints. In the example above, the custom `account_url` built by `get_fs()` is overwritten with `https://testaccount.blob.core.windows.net`.

   This is exactly what happens when `account_host` is not extracted from the Airflow connection extras and passed into `options`.
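The effect can be reproduced without Azure credentials using a simplified stand-in class. `SimplifiedBlobFS` is hypothetical. It mirrors only the `account_url` branch quoted above (the real adlfs constructor does much more) and assumes `account_host` is set as an attribute only when a value is given, so that `hasattr()` behaves as in the snippet.

```python
class SimplifiedBlobFS:
    """Hypothetical stand-in mirroring only the quoted account_url branch."""

    def __init__(self, account_name, account_url=None, account_host=None, **_ignored):
        self.account_name = account_name
        if account_url is not None:
            self.account_url = account_url
        # Assumption: the attribute only exists when a value was passed
        if account_host is not None:
            self.account_host = account_host

        # Same branch as the quoted snippet: any account_url passed in is replaced
        if hasattr(self, "account_host"):
            self.account_url = f"https://{self.account_host}"
        else:
            self.account_url = f"https://{self.account_name}.blob.core.windows.net"


options = {
    "account_url": "https://myblobstorageaccount.blob.core.mynet.io",
    "account_name": "testaccount",
}

# Without account_host, the custom endpoint is lost
print(SimplifiedBlobFS(**options).account_url)
# https://testaccount.blob.core.windows.net

# With account_host, the custom endpoint survives
print(SimplifiedBlobFS(**options, account_host="myblobstorageaccount.blob.core.mynet.io").account_url)
# https://myblobstorageaccount.blob.core.mynet.io
```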
   
   ---
   
   ### 🤔 Why not just use `host`?
   
   In Airflow, the `host` field is part of the base connection object, but it’s not automatically mapped to `account_host`. Instead, provider-specific parameters like `account_host` must be explicitly extracted from the `extra` dictionary.
   
   This separation allows Airflow to maintain a consistent interface across providers while still supporting advanced configurations like custom Azure domains.
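Below is a rough sketch of the explicit extraction step with `account_host` added to the list of copied fields. `build_options` and `EXTRA_FIELDS` are hypothetical names, and this is a simplified model, not the provider's actual `get_fs()` code. Field names other than `account_host` are taken from the connection example above.

```python
from typing import Any

# Hypothetical list of extras copied into the filesystem options.
# The proposed change is that "account_host" appears in this list.
EXTRA_FIELDS = ["account_name", "tenant_id", "account_host"]


def build_options(host: str, login: str, password: str, extras: dict[str, Any]) -> dict[str, Any]:
    """Simplified, hypothetical model of mapping a connection to
    AzureBlobFileSystem keyword arguments."""
    options: dict[str, Any] = {
        "account_url": host,
        "client_id": login,
        "client_secret": password,
    }
    # Provider-specific parameters only reach the filesystem if they are
    # copied explicitly from the connection's `extra` dictionary
    for field in EXTRA_FIELDS:
        value = extras.get(field)
        if value:
            options[field] = value
    return options


# The example connection, with account_host added to its extras
extras = {
    "tenant_id": "<Azure Tenant ID>",
    "account_name": "testaccount",
    "account_host": "myblobstorageaccount.blob.core.mynet.io",
}
opts = build_options(
    "https://myblobstorageaccount.blob.core.mynet.io",
    "<Azure client id>",
    "<Azure client secret>",
    extras,
)
print(opts["account_host"])
# myblobstorageaccount.blob.core.mynet.io
```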
   
   ---
   
   ### ✅ Summary
   
   - `account_host` should be a full hostname like `myblobstorageaccount.blob.core.mynet.io`.
   - It must be explicitly passed to `AzureBlobFileSystem` to override the default domain.
   - Without it, the fallback logic incorrectly assumes `blob.core.windows.net`, which fails for custom/private endpoints.
   - The current implementation in Airflow needs to be updated to include `account_host` in the list of extracted fields.
   
   Let me know if you have any suggestions!
   
   Yes, please assign the PR to me. Thank you.