potiuk commented on pull request #12505: URL: https://github.com/apache/airflow/pull/12505#issuecomment-762086664
@dstandish Actually I have no problem with those "generalizations" being regular (non-transfer) operators. This is perfectly fine, and we already have such operators. Look here: https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/cloud_storage_transfer_service.py This is the "embodiment" of what I was talking about. We have the `TransferJobPreprocessor` class and `CloudDataTransferServiceHook`, and they serve a similar purpose: bring data from "somewhere" to `GCS`. These "preprocessor" and "hook" classes are used by composition in a few operators:

* CloudDataTransferServiceGCSToGCSOperator
* CloudDataTransferServiceS3ToGCSOperator

What those operators do is merely pass the credentials and fire the job that transfers the data. It is a bit more complex, though, because it is an "asynchronous" operator: you can fire the job and then monitor its output or cancel it if needed. It would be great if we also had a similar approach in the "generalization" of the "Load" operator, possibly (not necessarily, but it would be nice).

As I understand it now, we are not actually "running the transfer ourselves", and we are not even passing the AWS credentials to Snowflake. Am I correct? What got me confused is that the SnowflakeHook has this method:

```
def _get_aws_credentials(self) -> Tuple[Optional[Any], Optional[Any]]:
    """
    Returns aws_access_key_id, aws_secret_access_key from extra;
    intended to be used by external import and export statements.
    """
```

which apparently is neither used nor needed. It seems to retrieve AWS credentials from the extras of the Snowflake(!) connection, which is rather cumbersome. So I'd rather fix that as well: remove the method and replace it with an actual Airflow AWS connection, if we need it at all.

But I understand from your explanation that this LOAD method simply takes the file IDs of whatever cloud storage we provide, and Snowflake already has to have the right credentials configured to be able to read those files. Am I right?
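For reference, this is roughly what that method does today: it parses the Snowflake connection's `extra` JSON for AWS keys. A simplified, stdlib-only sketch (the function name and standalone signature are illustrative, not the actual hook code):

```python
import json
from typing import Any, Optional, Tuple


def get_aws_credentials_from_extra(extra_json: str) -> Tuple[Optional[Any], Optional[Any]]:
    """Illustrative sketch: pull aws_access_key_id / aws_secret_access_key
    out of a connection's `extra` JSON blob, the way the SnowflakeHook
    method does with the Snowflake connection's extras."""
    extra = json.loads(extra_json or "{}")
    return extra.get("aws_access_key_id"), extra.get("aws_secret_access_key")
```

That shows why it is cumbersome: the AWS secrets end up stuffed into an unrelated Snowflake connection instead of living in a proper Airflow AWS connection.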
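If that reading is right, the operator would only need to build and fire the statement that asks Snowflake to pull the named files from an external stage, with Snowflake resolving the stage's own storage credentials. A minimal sketch (the helper name and signature are hypothetical; only the `COPY INTO ... FROM @stage FILES = (...)` shape is actual Snowflake SQL):

```python
from typing import Sequence


def build_copy_into_sql(table: str, stage: str, files: Sequence[str]) -> str:
    """Build the COPY INTO statement a hypothetical SnowflakeLoadOperator
    could execute: Snowflake reads the listed files from the named stage
    itself, so no AWS credentials travel through Airflow."""
    file_list = ", ".join(f"'{f}'" for f in files)
    return f"COPY INTO {table} FROM @{stage} FILES = ({file_list})"
```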
If so, this is not a "Transfer" operator at all. It's a regular operator that fires an external job and waits for it :). The `SnowflakeLoadOperator` is a good name, BTW!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
