[ https://issues.apache.org/jira/browse/BEAM-9561?focusedWorklogId=510044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510044 ]

ASF GitHub Bot logged work on BEAM-9561:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Nov/20 01:16
            Start Date: 11/Nov/20 01:16
    Worklog Time Spent: 10m 
    Work Description: TheNeuralBit commented on a change in pull request #13286:
URL: https://github.com/apache/beam/pull/13286#discussion_r520979308



##########
File path: sdks/python/apache_beam/dataframe/pandas_docs_test.py
##########
@@ -33,10 +39,15 @@
 PANDAS_DIR = os.path.expanduser("~/.apache_beam/cache/pandas-" + PANDAS_VERSION)
 PANDAS_DOCS_SOURCE = os.path.join(PANDAS_DIR, 'doc', 'source')
 
+parallelism = None
+
 
 def main():
-  # Not available for Python 2.
-  import urllib.request
+  parser = argparse.ArgumentParser()
+  parser.add_argument('-p', '--parallel', type=int, default=0)

Review comment:
       nit: add a help string indicating that the default is 0, which will use the CPU count.
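A minimal sketch of what the requested help string could look like, assuming the flag keeps its `-p`/`--parallel` name and that 0 means "use the CPU count" as the comment says. `build_parser` and `resolve_parallelism` are hypothetical helpers for illustration, not code from the PR.

```python
import argparse
import multiprocessing


def build_parser():
  """Builds the command-line parser (hypothetical helper)."""
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '-p',
      '--parallel',
      type=int,
      default=0,
      help='Number of worker processes to run the tests in. '
      'Defaults to 0, which uses the CPU count.')
  return parser


def resolve_parallelism(value):
  """Maps the flag value to a process count (hypothetical helper).

  Non-positive values, including the default of 0, fall back to the CPU count.
  """
  return value if value > 0 else multiprocessing.cpu_count()
```

Spelling out the default in the help text means `--help` documents the "0 means CPU count" convention instead of leaving readers to find it in the code.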

##########
File path: sdks/python/apache_beam/dataframe/pandas_docs_test.py
##########
@@ -74,22 +85,56 @@ def main():
         if any(filter in path for filter in filters):
           paths.append(path)
 
+  # Using a global here is a bit hacky, but avoids pickling issues when used
+  # with multiprocessing.
+  global parallelism

Review comment:
       Is this needed because parallelism is used in run_tests? Instead of branching on parallelism inside run_tests, could we just make a different method for the parallel vs. single-threaded cases?
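One way to read this suggestion, sketched under assumptions: the per-file worker stays a module-level function that never consults the parallelism setting, so it pickles cleanly without a global, and the caller chooses between serial and pooled execution once. `run_one`, `run_tests_serially`, `run_tests_in_parallel`, the `pool_factory` parameter, and treating `parallelism == 1` as the single-process case are all hypothetical, not taken from the PR.

```python
import multiprocessing


def run_one(path):
  """Hypothetical per-file runner returning (path, failure_count).

  Defined at module level so multiprocessing can pickle it by reference.
  It needs no knowledge of the parallelism setting, so no global is required.
  """
  failures = 0  # Placeholder: a real runner would execute the doctests here.
  return path, failures


def run_tests_serially(paths):
  """Single-process case: run each file in order in the current process."""
  return [run_one(path) for path in paths]


def run_tests_in_parallel(paths, processes, pool_factory=multiprocessing.Pool):
  """Multi-process case: fan the files out over a pool of workers.

  processes=None lets the pool size itself from the available CPU count.
  pool_factory is swappable, e.g. for a thread pool in quick tests.
  """
  with pool_factory(processes) as pool:
    return pool.map(run_one, paths)  # map preserves input order.


def run_tests(paths, parallelism):
  """Dispatches once at the top level instead of branching inside the worker."""
  if parallelism == 1:  # Assumption: 1 means single-threaded.
    return run_tests_serially(paths)
  # 0 (the flag default) becomes None, which the pool maps to the CPU count.
  return run_tests_in_parallel(paths, parallelism or None)
```

On platforms that use the spawn or forkserver start method, a script that takes the multiprocessing path should call it from under an `if __name__ == '__main__':` guard. The `pool_factory` hook exists only so the pooled path can be exercised with `multiprocessing.pool.ThreadPool`, which does not pickle its tasks.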






Issue Time Tracking
-------------------

    Worklog Id:     (was: 510044)
    Time Spent: 15.5h  (was: 15h 20m)

> Run pandas tests with Beam Dataframe API
> ----------------------------------------
>
>                 Key: BEAM-9561
>                 URL: https://issues.apache.org/jira/browse/BEAM-9561
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Robert Bradshaw
>            Priority: P2
>             Fix For: Not applicable
>
>          Time Spent: 15.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
