Lee-W commented on code in PR #36545:
URL: https://github.com/apache/airflow/pull/36545#discussion_r1483118035
##########
airflow/providers/sftp/provider.yaml:
##########
@@ -60,7 +60,7 @@ versions:
dependencies:
- apache-airflow>=2.6.0
- apache-airflow-providers-ssh>=2.1.0
- - paramiko>=2.8.0
+ - paramiko>=3.3.0
Review Comment:
May I know why we need to update the min version here?
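Suppose the higher floor is needed only for a newer keyword argument such as ``max_concurrent_prefetch_requests``. That is an assumption, since this thread does not say why. In that case, one alternative to raising the minimum is to feature-detect the keyword at call time. The sketch below uses only the standard library. `supported_kwargs`, `getfo_old` and `getfo_new` are hypothetical names; the two stand-ins are not paramiko's real methods.

```python
import inspect
from typing import Any, Callable, Dict

# Parameter kinds that can be passed by keyword.
_KEYWORD_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Return only the keyword arguments that ``func`` can accept (hypothetical helper)."""
    params = inspect.signature(func).parameters.values()
    # A **kwargs parameter means any keyword is accepted, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return dict(kwargs)
    accepted = {p.name for p in params if p.kind in _KEYWORD_KINDS}
    # Drop keywords the installed version does not know about.
    return {name: value for name, value in kwargs.items() if name in accepted}


# Hypothetical stand-ins for an older and a newer version of a library method.
def getfo_old(remotepath, fl, callback=None, prefetch=True):
    return None


def getfo_new(remotepath, fl, callback=None, prefetch=True,
              max_concurrent_prefetch_requests=None):
    return None


print(supported_kwargs(getfo_old, max_concurrent_prefetch_requests=16))  # {}
print(supported_kwargs(getfo_new, max_concurrent_prefetch_requests=16))
```

A call site would then look like `client.getfo(path, fl, **supported_kwargs(client.getfo, max_concurrent_prefetch_requests=n))`, where `client`, `path`, `fl` and `n` are hypothetical. A plain version floor is simpler and more predictable. Feature detection only helps if supporting older paramiko releases matters.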
##########
tests/providers/google/cloud/transfers/test_sftp_to_gcs.py:
##########
@@ -252,3 +254,60 @@ def test_execute_more_than_one_wildcard_exception(self, sftp_hook, gcs_hook):
err = ctx.value
assert "Only one wildcard '*' is allowed in source_path parameter" in str(err)
+
+class TestSFTPToGCSOperatorStream:
+
+ def setup_method(self):
+ # setup @mock.patch
+ patcher_sftp = mock.patch("airflow.providers.google.cloud.transfers.sftp_to_gcs.SFTPHook")
+ self.mock_sftp_hook = patcher_sftp.start()
+ patcher_gcs = mock.patch("airflow.providers.google.cloud.transfers.sftp_to_gcs.GCSHook")
+ self.mock_gcs_hook = patcher_gcs.start()
+
+ self.task = SFTPToGCSOperator(
+ task_id="test_task",
+ source_path=SOURCE_OBJECT_NO_WILDCARD,
+ destination_bucket=TEST_BUCKET,
+ destination_path=DESTINATION_PATH_FILE,
+ use_stream=True,
+ sftp_conn_id=SFTP_CONN_ID,
+ gcp_conn_id=GCP_CONN_ID,
+ )
+
+ def teardown_method(self):
+ self.mock_sftp_hook.stop()
+ self.mock_gcs_hook.stop()
+
+ def test_stream_single_object_default_method(self):
+ # Use 'upload_from_file' method by default
+ mock_dest_blob, mock_temp_dest_blob = MagicMock(), MagicMock()
+
+ self.mock_gcs_hook.return_value.get_conn.return_value.bucket.return_value.blob.side_effect = [mock_dest_blob, mock_temp_dest_blob]
+ self.task.execute(None)
+ mock_temp_dest_blob.upload_from_file.assert_called()
+
+ def test_stream_single_object_getfo_method(self):
+ mock_dest_blob, mock_temp_dest_blob = MagicMock(), MagicMock()
Review Comment:
```suggestion
mock_dest_blob, mock_temp_dest_blob = MagicMock(), MagicMock()
```
It seems they're not used here.
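There is a separate issue in this hunk. `teardown_method` calls `.stop()` on `self.mock_sftp_hook` and `self.mock_gcs_hook`. Those are the `MagicMock` objects returned by `patcher.start()`, not the patchers themselves. On a `MagicMock`, `.stop()` is only recorded as a mock call, so the patches are probably never undone and could leak into other tests. Below is a minimal stdlib sketch of the usual pattern. It uses `os.getcwd` as a stand-in patch target, and `TestWithPatchers` is a hypothetical class rather than the PR's test.

```python
import os
import unittest.mock as mock


class TestWithPatchers:
    """Hypothetical test class showing the start/stop pattern for patchers."""

    def setup_method(self):
        # Keep a reference to the patcher itself; start() returns the mock.
        self.patcher = mock.patch("os.getcwd", return_value="/patched")
        self.mock_getcwd = self.patcher.start()

    def teardown_method(self):
        # Stop the patcher, not the mock it returned: calling .stop() on the
        # MagicMock would just be recorded as a call and restore nothing.
        self.patcher.stop()


example = TestWithPatchers()
example.setup_method()
print(os.getcwd())  # /patched
example.teardown_method()  # os.getcwd is the real function again
```

Calling `mock.patch.stopall()` in `teardown_method` is an alternative. It stops every patch that was started with `start()`.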
##########
docs/apache-airflow-providers-google/operators/transfer/sftp_to_gcs.rst:
##########
@@ -93,6 +93,33 @@ and ``tests_sftp_hook_dir/subdir/parent-2.bin`` is copied to ``specific_files/pa
:start-after: [START howto_operator_sftp_to_gcs_move_specific_files]
:end-before: [END howto_operator_sftp_to_gcs_move_specific_files]
+Stream-Based File Transfer
+--------------------------------------
+
+The ``SFTPToGCSOperator`` now supports more advanced options for file transfer,
+particularly useful for handling large files or optimizing network usage.
+These include stream-based file transfers and additional parameters for finer control.
+
+1. Stream-Based File Transfer: When ``use_stream`` is set to ``True``,
+files are streamed directly from SFTP to GCS, without being stored temporarily on the worker's local disk.
+This method is more efficient for large files or in environments with limited disk space.
+
+2. Enhanced Parameters for Streaming:
+ - ``sftp_prefetch``: Optimizes file transfer speed when using the ``"getfo"`` stream method.
+ - ``stream_method``: Choose between ``"upload_from_file"`` for reliability or ``"getfo"`` for speed and efficiency.
+ - ``max_concurrent_prefetch_requests``: Control the concurrency level for prefetching in the ``"getfo"`` method.
+ - ``callback``: Provide a custom callback function for progress tracking during file transfer.
+
+Example usage
+```
+ transfer_operator = SFTPToGCSOperator(
Review Comment:
https://github.com/apache/airflow/blob/4aee6da38b3140b82207dadb9c3e9cc8d8a6344c/docs/apache-airflow-providers-amazon/operators/appflow.rst?plain=1#L140
Should we follow the same approach as the other operators' docs?
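The streaming mode in the docs hunk above (``use_stream`` plus a progress ``callback``) can be pictured as a chunked copy between two file-like objects, with no temporary file on disk. This is a stdlib-only illustration, not the operator's implementation. `stream_copy` is hypothetical, and `io.BytesIO` stands in for both the remote SFTP file and the GCS blob. The `(bytes_so_far, total)` callback shape is an assumption modelled on the two-integer callbacks paramiko's SFTP transfers use.

```python
import io
from typing import BinaryIO, Callable, Optional

# (bytes transferred so far, total bytes); assumed paramiko-style shape.
ProgressCallback = Callable[[int, int], None]


def stream_copy(
    src: BinaryIO,
    dst: BinaryIO,
    total_size: int,
    chunk_size: int = 32768,
    callback: Optional[ProgressCallback] = None,
) -> int:
    """Copy src to dst chunk by chunk without buffering the whole file (hypothetical helper)."""
    transferred = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF on the source stream
            break
        dst.write(chunk)
        transferred += len(chunk)
        # Report progress after every chunk, if a callback was given.
        if callback is not None:
            callback(transferred, total_size)
    return transferred


data = b"example payload " * 4096  # 65,536 bytes
sink = io.BytesIO()
stream_copy(io.BytesIO(data), sink, len(data),
            callback=lambda done, total: print(f"{done}/{total} bytes"))
```

Memory use is bounded by `chunk_size` rather than by the file size, which is the main reason streaming helps on workers with little disk or memory.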
##########
licenses/LICENSES-ui.txt:
##########
Review Comment:
May I know what this file is for?
--
This is an automated message from the Apache Git Service.