[ 
https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449999#comment-13449999
 ] 

Kiyan Ahmadizadeh commented on CRUNCH-57:
-----------------------------------------

I'm up for taking on an implementation of PObject and incorporating it into 
this change.  I've created a ticket, CRUNCH-58, for this.  Josh, please check 
that ticket for some discussion on the implementation of PObject.  

+1 for using decorators to achieve the fluent pattern without crowding the 
methods in the PCollection interface.  This would work well in Java and Scala. 
I think Gabriel's geometry example highlights the issue that you may want 
special operations on PCollections holding objects of a specific type.  Another 
example would be PCollections of numeric data.  It would make sense for such 
collections to have special operations like average, sum, etc.  
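
A rough sketch of the decorator idea for numeric data, in Java.  Everything 
here is hypothetical and for illustration only: SimpleCollection stands in for 
PCollection and is not the real Crunch interface, and NumericCollection, 
InMemoryCollection, sum() and average() are made-up names.

```java
import java.util.Arrays;
import java.util.List;

class NumericDecoratorSketch {
    public static void main(String[] args) {
        NumericCollection<Integer> numbers =
            new NumericCollection<>(new InMemoryCollection<>(Arrays.asList(1, 2, 3, 4)));
        System.out.println(numbers.sum());
        System.out.println(numbers.average());
    }
}

// Hypothetical stand-in for PCollection; NOT the real Crunch interface.
interface SimpleCollection<T> {
    List<T> materialize();
}

// In-memory implementation, used only for this illustration.
final class InMemoryCollection<T> implements SimpleCollection<T> {
    private final List<T> elements;

    InMemoryCollection(List<T> elements) { this.elements = elements; }

    @Override public List<T> materialize() { return elements; }
}

// Decorator: adds numeric-only operations without touching the base interface.
final class NumericCollection<N extends Number> implements SimpleCollection<N> {
    private final SimpleCollection<N> delegate;

    NumericCollection(SimpleCollection<N> delegate) { this.delegate = delegate; }

    @Override public List<N> materialize() { return delegate.materialize(); }

    double sum() {
        double total = 0.0;
        for (N n : delegate.materialize()) total += n.doubleValue();
        return total;
    }

    double average() {
        List<N> values = delegate.materialize();
        if (values.isEmpty()) {
            throw new IllegalStateException("average of an empty collection");
        }
        return sum() / values.size();
    }
}
```

The base interface stays small, and sum() and average() are only reachable 
once a collection of numbers has been wrapped.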

-1 on not including length() in the base PCollection interface, however.  I 
think decorators are great for the case outlined above, where the functionality 
applies only to PCollections holding objects of a specific type.  Counting the 
number of elements in a PCollection, however, is applicable to all PCollections 
regardless of the type of object they contain.  I think operations that can 
apply to any and all PCollections belong in the PCollection interface, and 
operations applicable to a specific kind of PCollection belong in decorators.  
For this reason I argue that length() goes in the PCollection interface.  
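
To make the argument concrete, here is a minimal Java sketch of length() 
sitting on the base interface.  The names CountableCollection and 
ListBackedCollection are hypothetical stand-ins, not Crunch classes, and this 
version counts eagerly and returns a plain long; what the real method should 
return is the subject of the PObject discussion in CRUNCH-58.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class LengthSketch {
    public static void main(String[] args) {
        CountableCollection<String> words =
            new ListBackedCollection<>(Arrays.asList("apple", "kiwi", "banana"));
        CountableCollection<String> longWords = words.filter(w -> w.length() > 4);
        System.out.println(words.length() + " -> " + longWords.length());
    }
}

// Hypothetical stand-in for PCollection; the real Crunch interface differs.
interface CountableCollection<T> {
    Iterable<T> elements();

    CountableCollection<T> filter(Predicate<T> keep);

    // Needs nothing from T, so it applies to every collection
    // and belongs on the base interface rather than in a decorator.
    default long length() {
        long count = 0;
        for (T ignored : elements()) count++;
        return count;
    }
}

// In-memory implementation, used only for this illustration.
final class ListBackedCollection<T> implements CountableCollection<T> {
    private final List<T> items;

    ListBackedCollection(List<T> items) { this.items = items; }

    @Override public Iterable<T> elements() { return items; }

    @Override public CountableCollection<T> filter(Predicate<T> keep) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (keep.test(item)) kept.add(item);
        }
        return new ListBackedCollection<>(kept);
    }
}
```

This covers the filter case from the issue description: call length() on the 
original and on the filtered collection and compare, with no extra counting 
code and no dependence on the element type.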
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a 
> PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered 
> into another.  If I'm interested in how many elements of the original 
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the 
> number of elements in the PCollection and returns the result. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
