[
https://issues.apache.org/jira/browse/BEAM-4391?focusedWorklogId=124005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-124005
]
ASF GitHub Bot logged work on BEAM-4391:
----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Jul/18 07:58
Start Date: 17/Jul/18 07:58
Worklog Time Spent: 10m
Work Description: javdrher commented on issue #5736: [BEAM-4391] Example
of distributed optimization
URL: https://github.com/apache/beam/pull/5736#issuecomment-405494691
Sorry for the delay, I was attending a conference last week.
@aaltay I think I have addressed most of the comments. Regarding the testing:
I'm guessing numpy and scipy will not be added to the testing environment?
Unfortunately, I cannot easily drop those dependencies from the code. What is
the recommended way to resolve this? Moving the imports inline into the
functions and mocking them during testing?
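The inline-import approach mentioned above could look roughly like this; a
minimal sketch, where `evaluate_point` is a hypothetical function standing in
for the actual PR code:

```python
import unittest
from unittest import mock

def evaluate_point(point):
    # Import inside the function so the module itself can be imported
    # even when numpy is not installed in the test environment.
    import numpy as np
    return float(np.sum(point))

class EvaluatePointTest(unittest.TestCase):
    def test_without_real_numpy(self):
        fake_np = mock.MagicMock()
        fake_np.sum.return_value = 6.0
        # Patching sys.modules makes the inline `import numpy` resolve
        # to the mock instead of the real library.
        with mock.patch.dict('sys.modules', {'numpy': fake_np}):
            self.assertEqual(evaluate_point([1, 2, 3]), 6.0)
```

Because the import happens at call time, patching `sys.modules` during the
test is enough; no real numpy is ever needed.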
Regarding the fusion: the grid generation is a prime example of an operation
with a small input PCollection (actually only one element) and a large output
PCollection. The effect on the execution of the pipeline is exactly as
described
[here](https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization),
hence the transform to prevent fusion.
In fact, because the step from one element to a huge number of elements was
itself slow, I create the grid in two stages: the first step takes two
parameters and forms a small grid with only the options for those parameters;
the second step then takes the small grid and extends it based on the options
for the remaining parameters. That way, some parallelism is involved in the
construction of the grid itself. (Actually, I think you could generalize this
approach and implement the entire generation recursively in the pipeline, but
that seemed a bit out of scope and perhaps a bit abusive?)
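The two-stage construction can be illustrated outside of Beam; in the
pipeline, stage 2 would run as a FlatMap over the small grid, so each
small-grid point is expanded in parallel (the parameter values below are
made up for the sketch):

```python
from itertools import product

def initial_grid(options_a, options_b):
    # Stage 1: a small grid over the first two parameters only.
    return [(a, b) for a, b in product(options_a, options_b)]

def extend_point(point, remaining_options):
    # Stage 2: extend one small-grid point with every combination of
    # the remaining parameters' options.
    return [point + rest for rest in product(*remaining_options)]

small = initial_grid([0, 1], ['x', 'y'])           # 4 points
full = [p for point in small
        for p in extend_point(point, [[10, 20], [True, False]])]  # 16 points
```

Since `extend_point` is applied per small-grid element, the expensive
expansion is spread over as many workers as there are stage-1 points.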
To encapsulate the grid creation process, I put it in a PTransform. I did
the same for the optimization part of the pipeline (the PTransforms represent
two phases: generate all options, then optimize all options). Were you
suggesting I drop the PTransforms and put the construction of the pipeline in
one function? If so, I'll adjust the code.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 124005)
Time Spent: 2h 20m (was: 2h 10m)
> Example of distributed optimization
> -----------------------------------
>
> Key: BEAM-4391
> URL: https://issues.apache.org/jira/browse/BEAM-4391
> Project: Beam
> Issue Type: New Feature
> Components: examples-python
> Reporter: Joachim van der Herten
> Assignee: Joachim van der Herten
> Priority: Minor
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Currently, we are writing a blog post on using the Beam Python SDK for solving
> distributed optimization tasks. It will include an example of an optimization
> problem with both discrete and continuous parameters, which is then solved
> using Apache Beam.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)