dabla commented on PR #62867: URL: https://github.com/apache/airflow/pull/62867#issuecomment-4156045420
> > I'm not in favor of this solution, it should be refactored accordingly (see comments) so that common-sql stays provider agnostic which is not the case here. > > Make sure to remove the amazon code from common.sql, but keep it backward-compatible via common.compat. > > Check following PR on how to make this registration mechanismfor ObjectStorageProvider generic (which would in my PR translate to Dialect, the idea is the same): > > Added notion of dialects into ProvidersManager (#43726) > > Once the providers has support to register ObjectStorageProviders, you can finish this PR with the actual implementation of the provider specific code like AWS/Google/Azure. > > For that you can also check my second PR (as reference if needed) in which I actually added the different dialects, which is the equivalent of this PR. > > Introduce notion of dialects in DbApiHook (#41327) > > So I think best for a all would be to create a new PR (and reference this PR in it so we know they are linked) in which the registration mechanism is being done for any provider. And then finish this one by moving provider specific code to the corresponding providers once the other PR is finished/merged. > > I understand your intent, but DataFusion only requires three providers. For just these three, I wouldn’t introduce new interfaces based on dialects for each provider—it’s a relatively simple case. We can fetch the connection details using the connection ID via BaseHook. > > We started with in commonSQL, because DataFusion primarily works with SQL. Eventually, there’s a plan to move this entire part into a dedicated DataFusion provider that we have discussion with kaxil and me when we started working on common.ai provider. i also have discussion quite long back with @potiuk have this in separate provider. That’s why all DataFusion-related components can be organized under a single module. and move them .. > > Also, even if we add dailect functionality within the providers, it won’t be utilized by those providers unless it’s explicitly invoked from DataFusion. so IMHO its extra overhead for the 3 providers adding dilect interface.. II think there’s a misunderstanding of what I’m proposing. I’m **not** suggesting introducing dialects here, that doesn't even make sense. I’m referring to the **architecture and registration pattern**, and I used dialects purely as an example of how that pattern is implemented correctly today. The key point is this: - `common-sql` should remain **provider-agnostic** - It should **not import or depend on provider-specific code** (even via local imports as a workaround) - Provider-specific implementations (AWS, GCP, Azure) should live **in their respective providers**, including their tests What I’m proposing is to follow the same architectural pattern used for dialects: - `common-sql` defines the **abstraction** (`ObjectStorageProvider` already exists) - Each provider implements its own concrete version (e.g. S3ObjectStorageProvider/GCSObjectStorageProvider/AzureObjectStorageProvider/LocalObjectStorageProvider) - Each implementation **registers itself automatically** (via the ProvidersManager), without `common-sql` needing to know anything about it So just like: > `common-sql` does not know which dialects exist — it only defines the interface the same should apply here: > `common-sql` should not know which object storage providers exist — it should only define the abstraction (which we already have with `ObjectStorageProvider`) The issue with the current approach is that it introduces **implicit coupling** by pulling provider-specific logic into `common-sql` (even if indirectly), which breaks that separation. `common-sql` is suddenly coupled to all specific object storage providers, which is not clean. This isn’t about supporting “only 3 providers” — it’s about keeping the architecture **clean, extensible, and consistent with existing patterns** in Airflow. Once we have the mechanism, it easy to add a new one, as it will automatically register, and it will live in it's dedicated provider. Using a registration mechanism (like the one used for dialects) solves this cleanly without adding unnecessary complexity and coupling, while keeping responsibilities properly separated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
