jason810496 commented on code in PR #59753:
URL: https://github.com/apache/airflow/pull/59753#discussion_r2646493573
##########
providers/google/src/airflow/providers/google/cloud/log/gcs_task_handler.py:
##########
@@ -164,18 +183,29 @@ def read(self, relative_path: str, ti: RuntimeTI) ->
tuple[LogSourceInfo, LogMes
else:
messages.extend(["Found remote logs:", *[f" * {x}" for x in
sorted(uris)]])
else:
- return messages, None
+ return messages, []
try:
for key in sorted(uris):
blob = storage.Blob.from_string(key, self.client)
- remote_log = blob.download_as_bytes().decode()
- if remote_log:
- logs.append(remote_log)
+ stream = blob.open("r")
Review Comment:
Thanks for the catch!
I though the `blob` object don't support `with open...`, but it turn out
that I'm wrong.
https://github.com/googleapis/python-storage/blob/main/samples/snippets/storage_fileio_write_read.py
However, if we are using context manager like following:
```python
def _get_log_stream(self, blob: storage.Blob) -> RawLogStream:
"""
Yield lines from the given GCS blob.
:param blob: GCS Blob object
"""
try:
with blob.open("r") as stream:
yield from stream
except Exception as e:
self.log.warning("Error reading remote log: %s", e)
raise e
```
If the line `with blob.open` raise exception, it will not raise exception
until we really exhaust from generator ( the `"".join(log_stream)` call ).
Instead of raising exception early at try/catch block of `stream` method.
```python
providers/google/tests/unit/google/cloud/log/test_gcs_task_handler.py:281:
in test_stream_and_read_methods
assert "".join(log_stream) == expected_log
providers/google/src/airflow/providers/google/cloud/log/gcs_task_handler.py:207:
in _get_log_stream
raise e
providers/google/src/airflow/providers/google/cloud/log/gcs_task_handler.py:203:
in _get_log_stream
with blob.open("r") as stream:
/usr/python/lib/python3.10/unittest/mock.py:1114: in __call__
return self._mock_call(*args, **kwargs)
/usr/python/lib/python3.10/unittest/mock.py:1118: in _mock_call
return self._execute_mock_call(*args, **kwargs)
/usr/python/lib/python3.10/unittest/mock.py:1173: in _execute_mock_call
raise effect
E Exception: Read failed
```
The trade-off between directly open and context manger approaches
- directly open: User are able to see `Unable to read remote log {e}` log on
TaskInstanceLog page, but we might not cleanup the blob resource properly
- context manger: User are not able to see `Unable to read remote log {e}`
log and will cause Internal Error of TaskInstance Log page, but we could ensure
the blob resource will be cleanup
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]