[ https://issues.apache.org/jira/browse/CRUNCH-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864697#comment-13864697 ]

Jason Gauci commented on CRUNCH-320:
------------------------------------

When a PCollection is materialized, is it held in RAM? In our case the 
PCollection is prohibitively large, but if materialize() is backed by disk, 
this approach may still work.

If I apply your patch, will it resolve all PObjects during pipeline.run()? 
That is all I need to get around this issue.

> Materialize several PObject & PCollection objects in parallel (deferred materialization)
> ----------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-320
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-320
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Gauci
>            Assignee: Josh Wills
>         Attachments: CRUNCH-320.patch
>
>
> Currently, Crunch blocks and materializes PCollections (through 
> foo.materialize()) and PObjects (through foo.getValue()) on demand. It 
> would be a significant performance improvement if we could mark several of 
> these objects for materialization and then materialize all of them in 
> parallel as part of a single pipeline.run() call.
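To make the requested behavior concrete, here is a minimal, self-contained sketch of deferred materialization: callers first register the values they want, and a single run() computes all of them concurrently. ToyPipeline, Deferred, and defer() are hypothetical names used only for illustration. They are not Crunch classes, and this is not how the attached CRUNCH-320.patch is implemented; in Crunch the corresponding calls would be materialize(), getValue(), and pipeline.run().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of deferred materialization (not the Crunch API):
 * values are registered first and all computed in parallel on run().
 */
public class DeferredMaterializationSketch {

    /** Hypothetical handle whose value is only available after ToyPipeline.run(). */
    public static final class Deferred<T> {
        private final Supplier<T> job;
        private volatile T value;
        private volatile boolean ready;

        Deferred(Supplier<T> job) {
            this.job = job;
        }

        /** Returns the computed value; fails if run() has not resolved it yet. */
        public T getValue() {
            if (!ready) {
                throw new IllegalStateException("call run() before getValue()");
            }
            return value;
        }
    }

    /** Hypothetical pipeline that only records work until run() is called. */
    public static final class ToyPipeline {
        private final List<Deferred<?>> pending = new ArrayList<>();

        /** Marks a computation for materialization without executing it. */
        public <T> Deferred<T> defer(Supplier<T> job) {
            Deferred<T> d = new Deferred<>(job);
            pending.add(d);
            return d;
        }

        /** Executes every pending job concurrently and blocks until all finish. */
        public void run() {
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, pending.size()));
            try {
                CompletableFuture<?>[] jobs = new CompletableFuture<?>[pending.size()];
                for (int i = 0; i < jobs.length; i++) {
                    Deferred<?> d = pending.get(i);
                    jobs[i] = CompletableFuture.runAsync(() -> resolve(d), pool);
                }
                // join() rethrows any job failure as an unchecked CompletionException.
                CompletableFuture.allOf(jobs).join();
                pending.clear();
            } finally {
                pool.shutdown();
            }
        }

        private static <T> void resolve(Deferred<T> d) {
            d.value = d.job.get();
            d.ready = true;
        }
    }
}
```

The design choice is simply that defer() returns a handle immediately instead of blocking, so several handles can be collected before one run() call executes them together.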



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
