TheNeuralBit commented on code in PR #23546:
URL: https://github.com/apache/beam/pull/23546#discussion_r996035863
##########
sdks/python/apache_beam/examples/complete/distribopt_test.py:
##########
@@ -20,17 +20,38 @@
# pytype: skip-file
import logging
-import os
-import tempfile
import unittest
+import uuid
from ast import literal_eval as make_tuple
import numpy as np
import pytest
from mock import MagicMock
from mock import patch
-from apache_beam.testing.util import open_shards
+from apache_beam.testing.test_pipeline import TestPipeline
+
+# Protect against environments where gcsio library is not available.
+try:
+ from apache_beam.io.gcp import gcsio
+except ImportError:
+ gcsio = None
+
+
+def read_gcs_output_file(file_pattern):
+ gcs = gcsio.GcsIO()
+ file_names = gcs.list_prefix(file_pattern).keys()
+ output = []
+ for file_name in file_names:
+ output.append(gcs.open(file_name).read().decode('utf-8').strip())
+ return '\n'.join(output)
+
+
+def create_content_input_file(path, contents):
+ logging.info('Creating file: %s', path)
+ gcs = gcsio.GcsIO()
+ with gcs.open(path, 'w') as f:
+ f.write(str.encode(contents, 'utf-8'))
Review Comment:
nit: I think it would be better if these utilities used the Filesystems API,
see here for an example:
https://github.com/apache/beam/blob/45cc0854be553b5138753cf649fa863a80ff2b04/sdks/python/apache_beam/examples/dataframe/taxiride_it_test.py#L93-L100
It would also be good to extract these out into testing.utils rather than
copying them.
##########
sdks/python/apache_beam/examples/complete/tfidf.py:
##########
@@ -35,6 +34,12 @@
from apache_beam.options.pipeline_options import SetupOptions
from apache_beam.pvalue import AsSingleton
+# Protect against environments where gcsio library is not available.
+try:
+ from apache_beam.io.gcp import gcsio
+except ImportError:
+ gcsio = None
Review Comment:
Does this actually protect us? It looks like this would just change the
error to a more confusing one: `None has no attribure GcsIO`, when gcsio is
used. I think just letting the ImportError raise would be preferable.
Alternatively we could add a `skipIf(gcsio is None)`, but that might lead to
us unintentionally skipping it indefinitely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]