[ 
https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449193#comment-13449193
 ] 

Kiyan Ahmadizadeh commented on CRUNCH-57:
-----------------------------------------

Taking a look at the FlumeJava paper (this copy: 
http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf), it 
looks like the answer to this is the PObject, which acts a bit like a Future.  
The difference is that a PObject defers the start of computation until the 
object is accessed, while a Java Future typically represents a task that starts 
running as soon as it is submitted to an executor; a call to get() blocks only 
if that task has yet to complete.  
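
To make the contrast concrete, here is a minimal sketch in plain Java. It is 
not Crunch or FlumeJava code: the Deferred class and its getValue() method are 
hypothetical stand-ins for the PObject idea, and CompletableFuture.supplyAsync 
is used only as one common way of obtaining a Future whose work is scheduled 
right away.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class DeferredVsFuture {

    /** Hypothetical PObject-like holder: nothing runs until getValue() is called. */
    static final class Deferred<T> {
        private final Supplier<T> computation;
        private boolean computed = false;
        private T value;

        Deferred(Supplier<T> computation) {
            this.computation = computation;
        }

        /** Runs the computation on first access and caches the result. */
        synchronized T getValue() {
            if (!computed) {
                value = computation.get();
                computed = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();

        // Constructing the Deferred does not run the supplier.
        Deferred<Integer> deferred = new Deferred<>(() -> {
            runs.incrementAndGet();
            return 42;
        });
        System.out.println("runs before access: " + runs.get()); // prints 0
        System.out.println("value: " + deferred.getValue());     // prints 42
        System.out.println("runs after access: " + runs.get());  // prints 1

        // supplyAsync hands the task to the common pool immediately;
        // join() blocks only if the task has not finished yet.
        CompletableFuture<Integer> future = CompletableFuture.supplyAsync(() -> 42);
        System.out.println("future value: " + future.join());    // prints 42
    }
}
```

The caching in getValue() is a design choice of this sketch, so that repeated 
reads do not repeat the work; whether a real PObject should cache is a separate 
question.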

It seems like the PObject concept would be generally useful.  Min and max on 
PCollection could be changed to return a PObject, as could this length method, 
etc.
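
As a rough sketch of what that API shape might look like, assuming nothing 
about Crunch's actual interfaces: DeferredValue and ToyCollection below are 
hypothetical names, and ToyCollection is a toy in-memory class, not 
PCollection.  Each aggregate returns a handle instead of a value, and the work 
happens only when getValue() is called.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DeferredAggregates {

    /** Hypothetical stand-in for the PObject idea: a value read via getValue(). */
    interface DeferredValue<T> {
        T getValue();
    }

    /** Toy in-memory collection used for illustration only. */
    static final class ToyCollection<T extends Comparable<T>> {
        private final List<T> elements;

        ToyCollection(List<T> elements) {
            this.elements = elements;
        }

        // Each method returns immediately; the lambda body runs on getValue().
        DeferredValue<Long> length() {
            return () -> (long) elements.size();
        }

        DeferredValue<T> min() {
            return () -> Collections.min(elements);
        }

        DeferredValue<T> max() {
            return () -> Collections.max(elements);
        }
    }

    public static void main(String[] args) {
        ToyCollection<Integer> numbers = new ToyCollection<>(Arrays.asList(5, 3, 8));

        DeferredValue<Long> length = numbers.length(); // no work done yet
        DeferredValue<Integer> min = numbers.min();
        DeferredValue<Integer> max = numbers.max();

        System.out.println("length = " + length.getValue()); // prints length = 3
        System.out.println("min = " + min.getValue());       // prints min = 3
        System.out.println("max = " + max.getValue());       // prints max = 8
    }
}
```

Unlike a real pipeline, this sketch recomputes on every getValue() call and 
does nothing to batch the three aggregates into a single pass.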

Perhaps we should make another ticket for implementing PObject.  Thoughts?
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a 
> PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered 
> into another.  If I'm interested in how many elements of the original 
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the 
> number of elements in the PCollection and returns the result. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
