[ 
https://issues.apache.org/jira/browse/BEAM-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805923#comment-15805923
 ] 

ASF GitHub Bot commented on BEAM-1250:
--------------------------------------

GitHub user amitsela opened a pull request:

    https://github.com/apache/beam/pull/1747

    [BEAM-1250] Remove leaf when materializing PCollection to avoid re-evaluation.
    
    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
    
    ---


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amitsela/beam remove-leaf-getvalues

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/1747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1747
    
----
commit 7e7715035c870c28f4294fe52a0cc7c5d838aee2
Author: Sela <[email protected]>
Date:   2017-01-06T22:03:34Z

    [BEAM-1250] Remove leaf when materializing PCollection to avoid re-evaluation.

----


> Remove leaf when materializing PCollection to avoid re-evaluation.
> ------------------------------------------------------------------
>
>                 Key: BEAM-1250
>                 URL: https://issues.apache.org/jira/browse/BEAM-1250
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> When materializing a {{PCollection}} (implemented as an {{RDD}}), for example to 
> create a {{PCollectionView}}, the runner should remove the materialized {{RDD}} 
> from the "leaves" set.
> The runner keeps track of leaves left unhandled in the DAG so that it can force 
> an action on them. {{Write}}, for one, is implemented as a sequence of ParDos, 
> which the runner implements via {{mapPartitions}}, a lazy transformation, so an 
> action has to be forced.
> Materializing an {{RDD}} is done via the action {{collect()}}, so there is no 
> reason to keep it in the "leaves" set.
> Currently it remains in the "leaves" set, so it is forced again at the end of 
> the pipeline, which re-evaluates its lineage: unless the {{RDD}} is cached, the 
> lineage is executed twice.
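
The double evaluation described above can be reproduced with a toy model that does not depend on Spark or Beam. The sketch below is only an illustration under stated assumptions: `LazyDataset`, `Context`, `register`, `materialize`, and `computeOutputs` are hypothetical names invented here, not the runner's actual classes or methods. `LazyDataset` stands in for an uncached RDD whose lineage is recomputed on every action.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Toy model of "leaves" bookkeeping; none of these names are Beam or Spark APIs. */
public class LeavesSketch {

    /** Stand-in for an uncached RDD: every action recomputes the lineage. */
    static final class LazyDataset<T> {
        private final Supplier<List<T>> lineage;
        private int evaluations = 0;

        LazyDataset(Supplier<List<T>> lineage) {
            this.lineage = lineage;
        }

        /** An action, analogous to collect(): runs the lineage and returns the result. */
        List<T> collect() {
            evaluations++;
            return lineage.get();
        }

        int evaluations() {
            return evaluations;
        }
    }

    /** Hypothetical evaluation context tracking datasets that still need a forced action. */
    static final class Context {
        private final Set<LazyDataset<?>> leaves = new LinkedHashSet<>();
        private final boolean removeLeafOnMaterialize;

        Context(boolean removeLeafOnMaterialize) {
            this.removeLeafOnMaterialize = removeLeafOnMaterialize;
        }

        <T> LazyDataset<T> register(LazyDataset<T> dataset) {
            leaves.add(dataset);
            return dataset;
        }

        /** Materializes a dataset, e.g. to build a side-input view. */
        <T> List<T> materialize(LazyDataset<T> dataset) {
            List<T> values = dataset.collect();
            if (removeLeafOnMaterialize) {
                // The action above already evaluated the lineage, so stop tracking it.
                leaves.remove(dataset);
            }
            return values;
        }

        /** Forces an action on every remaining leaf at the end of the pipeline. */
        void computeOutputs() {
            for (LazyDataset<?> leaf : leaves) {
                leaf.collect();
            }
            leaves.clear();
        }
    }

    /** Returns how many times the lineage ran for one materialized dataset. */
    static int countEvaluations(boolean removeLeafOnMaterialize) {
        Context ctx = new Context(removeLeafOnMaterialize);
        LazyDataset<Integer> dataset = ctx.register(new LazyDataset<Integer>(() -> {
            List<Integer> out = new ArrayList<>();
            for (int i = 1; i <= 3; i++) {
                out.add(i * 2);
            }
            return out;
        }));
        ctx.materialize(dataset);
        ctx.computeOutputs();
        return dataset.evaluations();
    }

    public static void main(String[] args) {
        System.out.println("leaf kept:    " + countEvaluations(false)); // prints 2
        System.out.println("leaf removed: " + countEvaluations(true));  // prints 1
    }
}
```

With the leaf kept, the lineage runs once for the materialization and once more when the remaining leaves are forced. Removing the leaf at materialization time leaves a single evaluation, which is the behavior the issue asks for.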



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
