jason810496 commented on code in PR #59753:
URL: https://github.com/apache/airflow/pull/59753#discussion_r2646493573


##########
providers/google/src/airflow/providers/google/cloud/log/gcs_task_handler.py:
##########
@@ -164,18 +183,29 @@ def read(self, relative_path: str, ti: RuntimeTI) -> tuple[LogSourceInfo, LogMes
             else:
                 messages.extend(["Found remote logs:", *[f"  * {x}" for x in sorted(uris)]])
         else:
-            return messages, None
+            return messages, []
 
         try:
             for key in sorted(uris):
                 blob = storage.Blob.from_string(key, self.client)
-                remote_log = blob.download_as_bytes().decode()
-                if remote_log:
-                    logs.append(remote_log)
+                stream = blob.open("r")

Review Comment:
   Thanks for the catch!
   I thought the `blob` object didn't support `with blob.open(...)`, but it turns out I was wrong.
   
   
https://github.com/googleapis/python-storage/blob/main/samples/snippets/storage_fileio_write_read.py
   
   
   However, if we use a context manager like the following:
   ```python
   def _get_log_stream(self, blob: storage.Blob) -> RawLogStream:
       """
       Yield lines from the given GCS blob.

       :param blob: GCS Blob object
       """
       try:
           with blob.open("r") as stream:
               yield from stream
       except Exception as e:
           self.log.warning("Error reading remote log: %s", e)
           raise e
   ```
   
   If the `with blob.open(...)` line raises an exception, that exception will not surface until we actually consume the generator (the `"".join(log_stream)` call), instead of being raised early in the try/except block of the `stream` method.
   
   ```python
   providers/google/tests/unit/google/cloud/log/test_gcs_task_handler.py:281: in test_stream_and_read_methods
       assert "".join(log_stream) == expected_log
   providers/google/src/airflow/providers/google/cloud/log/gcs_task_handler.py:207: in _get_log_stream
       raise e
   providers/google/src/airflow/providers/google/cloud/log/gcs_task_handler.py:203: in _get_log_stream
       with blob.open("r") as stream:
   /usr/python/lib/python3.10/unittest/mock.py:1114: in __call__
       return self._mock_call(*args, **kwargs)
   /usr/python/lib/python3.10/unittest/mock.py:1118: in _mock_call
       return self._execute_mock_call(*args, **kwargs)
   /usr/python/lib/python3.10/unittest/mock.py:1173: in _execute_mock_call
       raise effect
   E   Exception: Read failed
   ```
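This deferral is standard generator semantics: because `_get_log_stream` contains a `yield`, calling it only creates a generator object; nothing in its body (including the `blob.open` call) runs until the first item is requested. A minimal, library-free demonstration:

```python
def read_lines():
    # Simulates _get_log_stream: the whole body, including the "open", is deferred.
    raise RuntimeError("open failed")  # stands in for blob.open("r") failing
    yield "line"

gen = read_lines()  # no exception here: the generator body has not started executing

try:
    "".join(gen)  # the exception only surfaces once the generator is consumed
except RuntimeError as e:
    print(f"raised on consumption: {e}")  # raised on consumption: open failed
```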
   
   The trade-off between the directly-open and context-manager approaches:
   - directly open: users can see the `Unable to read remote log {e}` message on the TaskInstance Log page, but we might not clean up the blob resource properly
   - context manager: users cannot see the `Unable to read remote log {e}` message, and the TaskInstance Log page fails with an Internal Error, but we can ensure the blob resource is cleaned up
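A third option (a hypothetical sketch, not code from this PR; `open_log_stream` and its `open_blob` parameter are illustrative names) is to open the blob eagerly, so open failures surface inside the caller's try/except and can be logged, while a small inner generator guarantees the stream is closed once iteration finishes:

```python
import io
from typing import Callable, Iterator, TextIO


def open_log_stream(open_blob: Callable[[], TextIO]) -> Iterator[str]:
    """Open eagerly so errors raise here; yield lazily with guaranteed cleanup.

    `open_blob` stands in for `blob.open("r")`: any open error propagates
    to the caller immediately, not on first iteration of the result.
    """
    stream = open_blob()  # raises now, inside the caller's try/except

    def _iterate(s: TextIO) -> Iterator[str]:
        try:
            yield from s
        finally:
            s.close()  # release the underlying resource even on partial reads

    return _iterate(stream)


# Usage with an in-memory stand-in for a GCS blob stream:
stream = open_log_stream(lambda: io.StringIO("line1\nline2\n"))
print("".join(stream))  # "line1\nline2\n"
```

This keeps the early, loggable failure of the direct-open approach while still ensuring cleanup like the context-manager approach.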



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
