potiuk commented on pull request #12505:
URL: https://github.com/apache/airflow/pull/12505#issuecomment-761849270


   > @potiuk just a thing to keep in mind on this topic of transfer operators. There is no S3 hook involved here. This is a pure Snowflake operator; it's just executing a Snowflake command. So in that sense, it's not properly a transfer operator. Yes, Snowflake is accessing storage, but the operator knows nothing about this (Snowflake manages it), so there's no reason to artificially restrict its focus to S3 in the naming.
   > 
   > To be clear, I'm not saying this particular one shouldn't go through. It's a good improvement. I'm just making a case for fixing the naming, if not now then later.
   
   Sure, I understand that, but my line of thinking does not change here. It does not matter whether this is a 'proper' transfer operator or whether a hook is used. I simply argue that, for the user, having separate "S3ToSnowflake", "GCSToSnowflake" and "AzureBlobToSnowflake" operators is perfectly fine, and even intended.
   
   Even if they eventually reuse a lot of the same code, they can be implemented as three separate operators with common code shared via either inheritance or composition, as sketched below. As a user, I'd very much prefer to have a separate S3ToSnowflake with S3-only authentication parameters, a GCSToSnowflake with the Google ones, and an AzureBlobToSnowflake with only Azure authentication.
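
   To make that concrete, here is a minimal sketch of the inheritance variant. This is not the code from this PR; the base class, parameter names and SQL rendering are all made up for illustration, and only the general Airflow / Snowflake-provider APIs (`BaseOperator`, `SnowflakeHook.run`) are assumed.

```python
# Rough sketch only: class and parameter names are illustrative, not the
# provider's real API. It shows how three storage-specific operators could
# share the COPY INTO logic through a common (private) base class.
from airflow.models import BaseOperator
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook


class _CopyIntoSnowflakeOperator(BaseOperator):
    """Hypothetical shared base: builds and runs a COPY INTO statement."""

    def __init__(self, *, table, stage, file_format,
                 snowflake_conn_id="snowflake_default", **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.stage = stage
        self.file_format = file_format
        self.snowflake_conn_id = snowflake_conn_id

    def _copy_sql(self):
        return (f"COPY INTO {self.table} FROM @{self.stage} "
                f"FILE_FORMAT = ({self.file_format})")

    def execute(self, context):
        SnowflakeHook(snowflake_conn_id=self.snowflake_conn_id).run(self._copy_sql())


class S3ToSnowflakeOperator(_CopyIntoSnowflakeOperator):
    """S3 flavour: only S3-relevant parameters show up in its signature."""

    def __init__(self, *, s3_keys=None, **kwargs):
        super().__init__(**kwargs)
        self.s3_keys = s3_keys or []

    def _copy_sql(self):
        sql = super()._copy_sql()
        if self.s3_keys:
            files = ", ".join(f"'{key}'" for key in self.s3_keys)
            sql += f" FILES = ({files})"
        return sql


class GCSToSnowflakeOperator(_CopyIntoSnowflakeOperator):
    """GCS flavour, added later on top without touching the S3 class."""

    def __init__(self, *, gcs_objects=None, **kwargs):
        super().__init__(**kwargs)
        self.gcs_objects = gcs_objects or []
```

   Composition (a small helper that renders the COPY statement, used by three thin operators) would work equally well; the point is only that the user-facing classes stay storage-specific.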
   
   Generally speaking, even if 90% of the code under the hood eventually turns out to be the same, it can be extracted as a next step, when the GCS/Azure operators are added, without breaking backwards compatibility of the current S3ToSnowflake operator. The fact that Snowflake can interact with all three storages makes it possible to share the code that way (again, later). At this stage this also saves on system tests: the person implementing the S3 variant, who has the S3 credentials etc., does not have to create Azure and GCS accounts as well just to check that everything works for those. That can be done by the next person, who will have the S3 unit tests to confirm that the basic S3 assumptions are not broken, and who can add GCS on top and test it thoroughly then.
   
   I do not know exactly whether there are differences in how the GCS and S3 transfers work in this Snowflake API, but I'd assume that you can still control, for example, whether the BQ schema is created (a typical use case when exporting to GCS). And I think having a "thin" layer of S3ToSnowflake, GCSToSnowflake, etc. over the native Snowflake capability is much closer to the philosophy of many of Airflow's current operators.
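
   For completeness, this is roughly what such a thin S3-specific wrapper could look like from the DAG author's side, using the sketched class from above (again, parameter names are illustrative, not the exact signature of this PR):

```python
# Illustrative usage only: the operator class is the hypothetical sketch above,
# and all parameter values are placeholders.
from datetime import datetime

from airflow import DAG

with DAG("snowflake_load", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    load = S3ToSnowflakeOperator(          # hypothetical class from the sketch above
        task_id="s3_to_snowflake",
        table="MY_TABLE",
        stage="MY_S3_STAGE",               # external stage already pointing at the bucket
        s3_keys=["data/part-0000.csv"],
        file_format="FORMAT_NAME = 'CSV_FORMAT'",
        snowflake_conn_id="snowflake_default",
    )
```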
   

