[ 
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=124005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-124005
 ]

ASF GitHub Bot logged work on BEAM-4391:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jul/18 07:58
            Start Date: 17/Jul/18 07:58
    Worklog Time Spent: 10m 
      Work Description: javdrher commented on issue #5736: [BEAM-4391] Example 
of distributed optimization
URL: https://github.com/apache/beam/pull/5736#issuecomment-405494691
 
 
   Sorry for the delay, I was attending a conference last week.
   
   @aaltay I think I addressed most of the comments. Regarding the testing: I'm 
guessing numpy and scipy will not be added to the testing environment? 
Unfortunately, I cannot easily drop that dependency from the code. What is the 
recommended way to resolve this? Moving the imports inline into the functions 
and mocking them during testing?
   
   Regarding the fusion: the grid generation is a prime example of an operation 
with an input PCollection containing a low number of elements (actually only 
one) and a large output PCollection. The effect on the execution of the 
pipeline is exactly as described 
[here](https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization),
 hence the transform to prevent fusion. 
   
   In fact, because the step from one element to a huge number of elements was 
itself slow, I create the grid in two stages: the first step takes two 
parameters and forms a small grid with only the options for those parameters; 
the second step then takes the small grid and extends it based on the options 
for the remaining parameters. That way, some parallelism is involved in the 
construction of the grid itself. (Actually, I think you could generalize this 
approach and implement the entire generation recursively in the pipeline, but 
that seemed a bit out of scope and perhaps a bit abusive?) 
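   In plain Python (outside Beam), the two-stage expansion could look roughly 
like this; the parameter names and helpers here are illustrative, not the 
actual PR code:

```python
from itertools import product

def seed_grid(first_two):
    # Stage 1: a small grid over the first two parameters only.
    (a, a_vals), (b, b_vals) = first_two
    return [{a: va, b: vb} for va, vb in product(a_vals, b_vals)]

def extend_point(point, remaining):
    # Stage 2: extend one seed point with every combination of the
    # remaining parameters. This is the part that can run in parallel,
    # one call per seed point.
    names = [n for n, _ in remaining]
    return [{**point, **dict(zip(names, combo))}
            for combo in product(*(vals for _, vals in remaining))]

params = [("x", [1, 2]), ("y", [3, 4]), ("z", [5, 6]), ("w", [7, 8])]
seeds = seed_grid(params[:2])                        # 4 partial points
grid = [pt for s in seeds for pt in extend_point(s, params[2:])]
print(len(grid))  # 16 = 2 * 2 * 2 * 2
```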
   
   To encapsulate the grid creation process, I put it in a PTransform. I did 
the same for the optimization part of the pipeline (the PTransforms represent 
two phases: generate all options, optimize all options). Were you suggesting I 
drop the PTransforms and put the construction of the pipeline in one function? 
If so, I'll adjust the code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 124005)
    Time Spent: 2h 20m  (was: 2h 10m)

> Example of distributed optimization
> -----------------------------------
>
>                 Key: BEAM-4391
>                 URL: https://issues.apache.org/jira/browse/BEAM-4391
>             Project: Beam
>          Issue Type: New Feature
>          Components: examples-python
>            Reporter: Joachim van der Herten
>            Assignee: Joachim van der Herten
>            Priority: Minor
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently, we are writing a blogpost on using the Beam Python SDK for solving 
> distributed optimization tasks. It will include an example of an optimization 
> problem with both discrete and continuous parameters, which is then solved 
> using Apache Beam. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
