potiuk commented on pull request #12505:
URL: https://github.com/apache/airflow/pull/12505#issuecomment-762086664


   @dstandish  Actually I have no problem with those "generalizations" being 
regular (non-transfer) operators. That is perfectly fine, and we already have 
such operators. 
   
   Look here: 
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/cloud_storage_transfer_service.py
   
   This is the "embodiment" of what I was talking about. We have the 
`TransferJobPreprocessor` class and `CloudDataTransferServiceHook`, and they 
serve a similar purpose - bringing data from "somewhere" to `GCS`. These 
"Processor" and "Hook" classes are used by composition in a few operators:
   
   * `CloudDataTransferServiceGCSToGCSOperator`
   * `CloudDataTransferServiceS3ToGCSOperator`
   
   What those operators do is merely pass the credentials and fire the job 
that transfers the data.
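
A minimal sketch of that composition pattern, with illustrative class names (these are not the real provider classes): thin operators that delegate the actual work to a shared hook.

```python
# Hypothetical sketch of composition: several thin operators share one hook.
# All names here are invented for illustration.

class DataTransferHook:
    """Shared hook that knows how to create and fire a transfer job."""

    def create_transfer_job(self, body: dict) -> dict:
        # The real hook would call the cloud API here; this sketch
        # just echoes the job body back with a fake job name.
        return {"name": "transferJobs/demo", "spec": body}


class BaseTransferOperator:
    def __init__(self, body: dict) -> None:
        self.body = body

    def execute(self) -> dict:
        # The operator merely passes the config/credentials and fires the job.
        hook = DataTransferHook()
        return hook.create_transfer_job(self.body)


class GCSToGCSTransferOperator(BaseTransferOperator):
    pass  # source/destination specifics live in self.body


class S3ToGCSTransferOperator(BaseTransferOperator):
    pass
```

The point is that each concrete operator stays trivial; the transfer logic lives once, in the hook.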
   
   But it is a bit more complex, because it is an 'asynchronous' operator - you 
can fire the job and then monitor its output or cancel it if needed. It would be 
great if we also had a similar approach in the "generalization" of the "Load" 
operator (not necessarily, but it would be nice). 
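
The "fire and monitor" shape of such an asynchronous operator can be sketched like this; the job/status API is invented for illustration:

```python
import time

class FakeJob:
    """Stand-in for an external job handle; the real one comes from the API."""

    def __init__(self) -> None:
        self._polls = 0

    def status(self) -> str:
        # Pretend the job finishes after a few status checks.
        self._polls += 1
        return "SUCCESS" if self._polls >= 3 else "IN_PROGRESS"

    def cancel(self) -> None:
        # In a real operator this would abort the external job.
        self._polls = 999


def run_and_monitor(job: FakeJob, poll_interval: float = 0.0) -> str:
    # The operator does not move any data itself - it only polls the
    # external job until it reaches a terminal state.
    while True:
        state = job.status()
        if state in ("SUCCESS", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_interval)
```

The cancel hook is what makes the asynchronous variant nicer than a plain fire-and-forget operator.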
   
   As I understand it now - we are not actually "running the transfer 
ourselves", and we are not even passing the AWS credentials to Snowflake. Am I 
correct? What got me confused is that the SnowflakeHook has this method:
   
    ```python
       def _get_aws_credentials(self) -> Tuple[Optional[Any], Optional[Any]]:
           """
           Returns aws_access_key_id, aws_secret_access_key
           from extra
   
           intended to be used by external import and export statements
           """
   ```
   This method is apparently not used and not needed - it retrieves AWS 
credentials from the extras of the Snowflake(!) connection, which is rather 
cumbersome. So I'd rather fix that as well: remove the method and replace it 
with an actual Airflow AWS connection, if we need it at all. But I understand 
from your explanation that this LOAD method simply takes the file ids of 
whatever cloud storage we provide, and Snowflake already has to have the right 
credentials configured to be able to read those files.
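
If that reading is correct, the operator would only build and issue a `COPY INTO` statement naming the files, while Snowflake itself (e.g. via a named stage or storage integration) already holds the credentials to read them. A hypothetical sketch of the statement-building part, with invented names:

```python
# Illustrative only: builds the COPY INTO statement a load operator might
# execute via the Snowflake hook. No credentials ever pass through Airflow.

def build_copy_statement(table: str, stage: str, files: list[str]) -> str:
    # FILES lists the exact file ids to ingest from the named stage;
    # the stage is what carries the storage credentials on Snowflake's side.
    file_list = ", ".join(f"'{f}'" for f in files)
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILES = ({file_list})"
    )
```

The operator would then just hand that SQL to the hook and wait for the result - no data flows through the worker.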
   
   Am I right? If so, this is not a "Transfer" operator at all. It's a 
regular operator that fires an external job and waits for it :).
   
   The `SnowflakeLoadOperator` is a good name BTW!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
