[ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449678#comment-13449678 ]

Josh Wills commented on CRUNCH-57:
----------------------------------

I'm +1 for PObject, and I like Gabriel's approach re: domain-specific fluent 
wrappers. At Google, the FlumeJava (FJ) PCollection/PTable interfaces were 
*huge*, but that made writing fluent pipelines really easy. My initial idea was 
to put all of the MR patterns into the lib.* module, but in practice that ended 
up being annoying to work with, IMO.

Kiyan, how would you like to approach this one? Do you want to take on defining 
PObject (which, in my mind, could just be a simple wrapper that materializes a 
PCollection and then implements an abstract function that performs a 
computation on the materialized Iterable) and incorporate it here? Or would you 
rather keep the Aggregate.length impl for now and leave off the rest of the 
change until we create PObject?
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a 
> PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered 
> into another.  If I'm interested in how many elements of the original 
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the 
> number of elements in the PCollection and returns the result. 
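The filter scenario in the issue description can be sketched as follows. `InMemoryCollection` is a hypothetical, purely in-memory stand-in for a PCollection, written only to show what a `length()` call would let a user express; it is not Crunch's API. In a real distributed pipeline the count would presumably be computed as part of the job rather than by iterating over elements in client memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a PCollection, used only to illustrate
// the proposed length() method.
class InMemoryCollection<S> {
  private final List<S> elements;

  InMemoryCollection(List<S> elements) {
    this.elements = elements;
  }

  // Keeps the elements that satisfy the predicate, like a filter step.
  InMemoryCollection<S> filter(Predicate<? super S> keep) {
    List<S> kept = new ArrayList<>();
    for (S s : elements) {
      if (keep.test(s)) {
        kept.add(s);
      }
    }
    return new InMemoryCollection<>(kept);
  }

  // The proposed length(): a single pass that counts the elements.
  long length() {
    long count = 0;
    for (S ignored : elements) {
      count++;
    }
    return count;
  }
}

public class LengthDemo {
  public static void main(String[] args) {
    InMemoryCollection<Integer> original =
        new InMemoryCollection<>(Arrays.asList(3, 8, 15, 42, 7));
    InMemoryCollection<Integer> matched = original.filter(n -> n > 10);
    // prints "2 of 5 matched"
    System.out.println(matched.length() + " of " + original.length() + " matched");
  }
}
```

Comparing `matched.length()` with `original.length()` answers "how many elements matched the filter" without any extra user-written counting code.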

--
This message is automatically generated by JIRA.