dabla commented on PR #62867:
URL: https://github.com/apache/airflow/pull/62867#issuecomment-4156045420

   > > I'm not in favor of this solution, it should be refactored accordingly 
(see comments) so that common-sql stays provider agnostic which is not the case 
here.
   > > Make sure to remove the amazon code from common.sql, but keep it 
backward-compatible via common.compat.
   > > Check following PR on how to make this registration mechanismfor 
ObjectStorageProvider generic (which would in my PR translate to Dialect, the 
idea is the same):
   > > Added notion of dialects into ProvidersManager (#43726)
   > > Once the providers has support to register ObjectStorageProviders, you 
can finish this PR with the actual implementation of the provider specific code 
like AWS/Google/Azure.
   > > For that you can also check my second PR (as reference if needed) in 
which I actually added the different dialects, which is the equivalent of this 
PR.
   > > Introduce notion of dialects in DbApiHook (#41327)
   > > So I think best for a all would be to create a new PR (and reference 
this PR in it so we know they are linked) in which the registration mechanism 
is being done for any provider. And then finish this one by moving provider 
specific code to the corresponding providers once the other PR is 
finished/merged.
   > 
   > I understand your intent, but DataFusion only requires three providers. 
For just these three, I wouldn’t introduce new interfaces based on dialects for 
each provider—it’s a relatively simple case. We can fetch the connection 
details using the connection ID via BaseHook.
   > 
   > We started with in commonSQL, because DataFusion primarily works with SQL. 
Eventually, there’s a plan to move this entire part into a dedicated DataFusion 
provider that we have discussion with kaxil and me when we started working on 
common.ai provider. i also have discussion quite long back with @potiuk have 
this in separate provider. That’s why all DataFusion-related components can be 
organized under a single module. and move them ..
   > 
   > Also, even if we add dailect functionality within the providers, it won’t 
be utilized by those providers unless it’s explicitly invoked from DataFusion. 
so IMHO its extra overhead for the 3 providers adding dilect interface..
   
   II think there’s a misunderstanding of what I’m proposing. I’m **not** 
suggesting introducing dialects here, that doesn't even make sense.
   
   I’m referring to the **architecture and registration pattern**, and I used 
dialects purely as an example of how that pattern is implemented correctly 
today.
   
   The key point is this:
   
   - `common-sql` should remain **provider-agnostic**
   - It should **not import or depend on provider-specific code** (even via 
local imports as a workaround)
   - Provider-specific implementations (AWS, GCP, Azure) should live **in their 
respective providers**, including their tests
   
   What I’m proposing is to follow the same architectural pattern used for 
dialects:
   
   - `common-sql` defines the **abstraction** (`ObjectStorageProvider` already 
exists)
   - Each provider implements its own concrete version (e.g. 
S3ObjectStorageProvider/GCSObjectStorageProvider/AzureObjectStorageProvider/LocalObjectStorageProvider)
   - Each implementation **registers itself automatically** (via the 
ProvidersManager), without `common-sql` needing to know anything about it
   
   So just like:
   
   > `common-sql` does not know which dialects exist — it only defines the 
interface
   
   the same should apply here:
   
   > `common-sql` should not know which object storage providers exist — it 
should only define the abstraction (which we already have with 
`ObjectStorageProvider`)
   
   The issue with the current approach is that it introduces **implicit 
coupling** by pulling provider-specific logic into `common-sql` (even if 
indirectly), which breaks that separation.  `common-sql` is suddenly coupled to 
all specific object storage providers, which is not clean.
   
   This isn’t about supporting “only 3 providers” — it’s about keeping the 
architecture **clean, extensible, and consistent with existing patterns** in 
Airflow.  Once we have the mechanism, it easy to add a new one, as it will 
automatically register, and it will live in it's dedicated provider.
   
   Using a registration mechanism (like the one used for dialects) solves this 
cleanly without adding unnecessary complexity and coupling, while keeping 
responsibilities properly separated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to