TheNeuralBit commented on code in PR #23546:
URL: https://github.com/apache/beam/pull/23546#discussion_r996035863


##########
sdks/python/apache_beam/examples/complete/distribopt_test.py:
##########
@@ -20,17 +20,38 @@
 # pytype: skip-file
 
 import logging
-import os
-import tempfile
 import unittest
+import uuid
 from ast import literal_eval as make_tuple
 
 import numpy as np
 import pytest
 from mock import MagicMock
 from mock import patch
 
-from apache_beam.testing.util import open_shards
+from apache_beam.testing.test_pipeline import TestPipeline
+
+# Protect against environments where gcsio library is not available.
+try:
+  from apache_beam.io.gcp import gcsio
+except ImportError:
+  gcsio = None
+
+
+def read_gcs_output_file(file_pattern):
+  gcs = gcsio.GcsIO()
+  file_names = gcs.list_prefix(file_pattern).keys()
+  output = []
+  for file_name in file_names:
+    output.append(gcs.open(file_name).read().decode('utf-8').strip())
+  return '\n'.join(output)
+
+
+def create_content_input_file(path, contents):
+  logging.info('Creating file: %s', path)
+  gcs = gcsio.GcsIO()
+  with gcs.open(path, 'w') as f:
+    f.write(str.encode(contents, 'utf-8'))

Review Comment:
   nit: I think it would be better if these utilities used the Filesystems API, 
see here for an example: 
https://github.com/apache/beam/blob/45cc0854be553b5138753cf649fa863a80ff2b04/sdks/python/apache_beam/examples/dataframe/taxiride_it_test.py#L93-L100
   
   It would also be good to extract these out into testing.utils rather than 
copying them.



##########
sdks/python/apache_beam/examples/complete/tfidf.py:
##########
@@ -35,6 +34,12 @@
 from apache_beam.options.pipeline_options import SetupOptions
 from apache_beam.pvalue import AsSingleton
 
+# Protect against environments where gcsio library is not available.
+try:
+  from apache_beam.io.gcp import gcsio
+except ImportError:
+  gcsio = None

Review Comment:
   Does this actually protect us? It looks like this would just change the 
error to a more confusing one: `None has no attribure GcsIO`, when gcsio is 
used. I think just letting the ImportError raise would be preferable.
   
   Alternatively we could add a `skipIf(gcsio is None)`, but that might lead to 
us unintentionally skipping it indefinitely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to