[
https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449193#comment-13449193
]
Kiyan Ahmadizadeh commented on CRUNCH-57:
-----------------------------------------
Taking a look at the FlumeJava paper (this copy:
http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf), it
looks like the answer to this is the PObject, which acts a bit like a Future.
The difference is in when the work starts: a PObject defers computation until
the object is accessed, while a Java Future typically represents a task that
starts running as soon as it is submitted, and a call to get() blocks if the
task has yet to complete.
It seems like the PObject concept would be generally useful. Min and max on
PCollection could be changed to return a PObject, as could this length method,
etc.
Perhaps we should make another ticket for implementing PObject. Thoughts?
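
To make the deferred-evaluation idea concrete, here is a rough sketch of what a
PObject-style handle could look like. The names LazyPObject and ToyCollection
are hypothetical, made up for illustration; they are not Crunch or FlumeJava
APIs, and the in-memory list only stands in for a real PCollection. The point
is just that length() hands back a handle right away and the count is computed
on the first call to getValue().

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical deferred-value handle, loosely modeled on the PObject idea. */
final class LazyPObject<T> {
    private final Supplier<T> computation;
    private T value;
    private boolean materialized = false;

    LazyPObject(Supplier<T> computation) {
        this.computation = computation;
    }

    /** Runs the computation on first access and caches the result. */
    synchronized T getValue() {
        if (!materialized) {
            value = computation.get();
            materialized = true;
        }
        return value;
    }

    /** True once getValue() has been called at least once. */
    synchronized boolean isMaterialized() {
        return materialized;
    }
}

/** Hypothetical stand-in for a PCollection, backed by an in-memory list. */
final class ToyCollection<T> {
    private final List<T> elements;

    ToyCollection(List<T> elements) {
        this.elements = elements;
    }

    /** Returns a handle immediately; counting is deferred until getValue(). */
    LazyPObject<Long> length() {
        return new LazyPObject<>(() -> (long) elements.size());
    }
}
```

With something along these lines, min and max could return the same kind of
handle, so callers would decide when (or whether) to pay for the computation.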
> Add a length function to PCollection
> ------------------------------------
>
> Key: CRUNCH-57
> URL: https://issues.apache.org/jira/browse/CRUNCH-57
> Project: Crunch
> Issue Type: New Feature
> Components: Core
> Affects Versions: 0.3.0
> Reporter: Kiyan Ahmadizadeh
> Assignee: Josh Wills
> Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a
> PCollection.
>
> For example, suppose there was an initial PCollection that was then filtered
> into another. If I'm interested in how many elements of the original
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the
> number of elements in the PCollection and returns the result.