[GitHub] [beam] TheNeuralBit commented on a change in pull request #13286: [BEAM-9561] Run pandas doctests in parallel.

GitBox Tue, 10 Nov 2020 17:16:54 -0800


TheNeuralBit commented on a change in pull request #13286:
URL: https://github.com/apache/beam/pull/13286#discussion_r520979308




##########
File path: sdks/python/apache_beam/dataframe/pandas_docs_test.py
##########
@@ -33,10 +39,15 @@
 PANDAS_DIR = os.path.expanduser("~/.apache_beam/cache/pandas-" + PANDAS_VERSION)
 PANDAS_DOCS_SOURCE = os.path.join(PANDAS_DIR, 'doc', 'source')
 
+parallelism = None
+
 
 def main():
-  # Not available for Python 2.
-  import urllib.request
+  parser = argparse.ArgumentParser()
+  parser.add_argument('-p', '--parallel', type=int, default=0)

Review comment:
       nit: add a help string indicating that the default is 0, which uses the CPU count.
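A minimal sketch of the nit, assuming 0 keeps meaning "use the CPU count" as described above. The help wording and the `build_parser` / `resolve_parallelism` helpers are illustrative and are not taken from the PR:

```python
import argparse
import multiprocessing


def build_parser():
  parser = argparse.ArgumentParser()
  # Illustrative help text (not from the PR); it documents that 0 means "use the CPU count".
  parser.add_argument(
      '-p',
      '--parallel',
      type=int,
      default=0,
      help='Number of worker processes to run the doctests with. '
      'Defaults to 0, which uses the CPU count.')
  return parser


def resolve_parallelism(parallel):
  """Hypothetical helper: map 0 (or any non-positive value) to the CPU count."""
  return parallel if parallel > 0 else multiprocessing.cpu_count()
```

Keeping the 0-to-CPU-count mapping in one small function means the help string and the actual behavior are documented and tested in the same place.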

##########
File path: sdks/python/apache_beam/dataframe/pandas_docs_test.py
##########
@@ -74,22 +85,56 @@ def main():
         if any(filter in path for filter in filters):
           paths.append(path)
 
+  # Using a global here is a bit hacky, but avoids pickling issues when used
+  # with multiprocessing.
+  global parallelism

Review comment:
       Is this needed because parallelism is used in run_tests? Instead of branching on parallelism inside run_tests, could we just have separate methods for the parallel and single-threaded cases?
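A minimal sketch of the suggested split, assuming each file's doctests can be run independently by a module-level worker. `run_one`, `run_tests_serial`, `run_tests_parallel` and the `(path, failures)` result shape are hypothetical names for illustration, not the PR's code, and the worker body is a placeholder. Passing `parallelism` as an argument removes the need for the global. The worker still has to be a module-level function so `multiprocessing` can pickle it by reference.

```python
import multiprocessing


def run_one(path):
  """Hypothetical per-file worker standing in for running one file's doctests.

  Defined at module level so multiprocessing can pickle it and send it to
  worker processes. Returns (path, failures).
  """
  # Placeholder: a real worker would run the doctests in `path` here.
  return path, 0


def run_tests_serial(paths):
  """Single-process variant: run every file in the current process."""
  return [run_one(path) for path in paths]


def run_tests_parallel(paths, parallelism):
  """Multi-process variant: parallelism is an explicit argument, not a global."""
  with multiprocessing.Pool(parallelism) as pool:
    # Pool.map preserves input order, so results line up with the serial variant.
    return pool.map(run_one, paths)


def run_tests(paths, parallelism):
  """Choose a variant once at the call site instead of branching inside it."""
  if parallelism > 1:
    return run_tests_parallel(paths, parallelism)
  return run_tests_serial(paths)
```

On platforms whose default start method is `spawn`, the pool should be created under an `if __name__ == '__main__':` guard, since child processes re-import the main module.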




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

