[ 
https://issues.apache.org/jira/browse/BEAM-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807144#comment-15807144
 ] 

ASF GitHub Bot commented on BEAM-1250:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/beam/pull/1747


> Remove leaf when materializing PCollection to avoid re-evaluation.
> ------------------------------------------------------------------
>
>                 Key: BEAM-1250
>                 URL: https://issues.apache.org/jira/browse/BEAM-1250
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> When materializing a {{PCollection}} (implemented as an {{RDD}}), for example 
> to create a {{PCollectionView}}, the runner should remove the materialized 
> {{RDD}} from the "leaves" set.
> The runner keeps track of leaves left unhandled in the DAG so that it can 
> force an action on them. {{Write}}, for one, is implemented via a sequence of 
> ParDos, which the runner implements via {{mapPartitions}} (a lazy 
> transformation), so an action has to be forced.
> Materializing an {{RDD}} is done via the action {{collect()}}, so there is no 
> reason to keep it in the "leaves" set.
> Currently, it remains in the "leaves" set, so it is forced again and 
> re-evaluates the lineage; unless the {{RDD}} is cached, the lineage is 
> executed twice.
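
The behaviour described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Spark runner's actual code: {{LeavesSketch}}, {{LazyDataset}}, {{Context}}, {{materialize}} and {{forceLeaves}} are hypothetical names invented for illustration, and {{LazyDataset}} merely stands in for an uncached {{RDD}} whose lineage is recomputed on every action.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Toy model of leaf tracking. Hypothetical; not the Beam Spark runner's code. */
final class LeavesSketch {

  /** Stand-in for a lazy, uncached RDD: every action recomputes the lineage. */
  static final class LazyDataset {
    private final Supplier<List<Integer>> lineage;
    int evaluations = 0; // how many times the lineage has been executed

    LazyDataset(Supplier<List<Integer>> lineage) {
      this.lineage = lineage;
    }

    /** An action, analogous to collect(): triggers evaluation of the lineage. */
    List<Integer> collect() {
      evaluations++;
      return lineage.get();
    }
  }

  /** Stand-in for the runner's evaluation context (assumed shape). */
  static final class Context {
    final Set<LazyDataset> leaves = new LinkedHashSet<>();

    /** Every new dataset starts out as an unhandled leaf. */
    LazyDataset register(Supplier<List<Integer>> lineage) {
      LazyDataset ds = new LazyDataset(lineage);
      leaves.add(ds);
      return ds;
    }

    /** Materialize via an action; optionally drop the dataset from the leaves. */
    List<Integer> materialize(LazyDataset ds, boolean removeLeaf) {
      if (removeLeaf) {
        leaves.remove(ds); // already evaluated by collect(), nothing left to force
      }
      return ds.collect();
    }

    /** At the end of translation, force an action on every remaining leaf. */
    void forceLeaves() {
      for (LazyDataset leaf : leaves) {
        leaf.collect();
      }
      leaves.clear();
    }
  }

  /** Returns how many times the lineage ran for one materialized dataset. */
  static int evaluationsFor(boolean removeLeaf) {
    Context ctx = new Context();
    LazyDataset ds = ctx.register(() -> Arrays.asList(1, 2, 3));
    ctx.materialize(ds, removeLeaf);
    ctx.forceLeaves();
    return ds.evaluations;
  }

  public static void main(String[] args) {
    System.out.println("leaf kept:    " + evaluationsFor(false)); // prints 2
    System.out.println("leaf removed: " + evaluationsFor(true));  // prints 1
  }
}
```

In this model, keeping the materialized dataset in the leaves set runs the lineage twice, while removing it on materialization runs it once, which is the change the issue asks for.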



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
