[
https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449507#comment-13449507
]
Gabriel Reid commented on CRUNCH-57:
------------------------------------
Clarification on my comment: I'm in favour of implementing PObject, but like
Matthias, I'm not crazy about the idea of adding more methods to PCollection (I
have the feeling there are a few too many already). On the other hand, having
lots of methods on PCollection does make it easy to use the Fluent interface
pattern [1].
On a tangent that's probably slightly outside the scope of this issue: we could
move towards grouping specialized functionality into PCollection decorators
whose methods return instances of the decorator's own type (an extension of
PCollection). That would allow us to do something like this:
PCollection<Geometry> geomCollection = ...;
PCollection<Geometry> manipulatedGeom = new GeometryOpPCollection(geomCollection)
    .someGeometricOperation()     // Treating this as a GeometryOpPCollection
    .anotherGeometricOperation()  // Again treating this as a GeometryOpPCollection
    .parallelDo(new StandardFunction()); // But here it's just like a normal PCollection
This way we can continue to support the Fluent interface use case without the
PCollection interface exploding with new methods.
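The decorator idea above can be sketched in plain Java as a toy, in-memory approximation. Everything here is an assumption for illustration: SimpleCollection stands in for PCollection (the real interface is distributed, and its parallelDo also takes a PType, which this toy omits), and Point, GeometryOpCollection, translate and scale are hypothetical names standing in for Geometry, GeometryOpPCollection and the geometric operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a geometry type; not part of Crunch.
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical in-memory stand-in for PCollection<T>; not Crunch's API.
class SimpleCollection<T> {
    private final List<T> elements;

    SimpleCollection(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Generic operation available on every collection: map each element.
    <U> SimpleCollection<U> parallelDo(Function<? super T, ? extends U> fn) {
        List<U> out = new ArrayList<>();
        for (T t : elements) {
            out.add(fn.apply(t));
        }
        return new SimpleCollection<>(out);
    }

    List<T> materialize() {
        return new ArrayList<>(elements);
    }
}

// Hypothetical decorator: its specialized operations return the decorator's
// own type, so they can be chained fluently without adding geometry methods
// to the base collection type.
class GeometryOpCollection extends SimpleCollection<Point> {
    GeometryOpCollection(SimpleCollection<Point> base) {
        super(base.materialize());
    }

    GeometryOpCollection translate(double dx, double dy) {
        // Re-wrap the generic result so the chain keeps the decorator type.
        return new GeometryOpCollection(parallelDo(p -> new Point(p.x + dx, p.y + dy)));
    }

    GeometryOpCollection scale(double factor) {
        return new GeometryOpCollection(parallelDo(p -> new Point(p.x * factor, p.y * factor)));
    }
}
```

For example, `new GeometryOpCollection(points).translate(1, 0).scale(2).parallelDo(p -> p.x)` chains two decorator calls and then falls back to the generic parallelDo, which returns a plain SimpleCollection<Double>. The specialized methods live only on the decorator, so the base type's method count stays flat; the cost is that each specialized call must re-wrap its result to keep the chain typed as the decorator.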
[1] http://en.wikipedia.org/wiki/Fluent_interface
> Add a length function to PCollection
> ------------------------------------
>
> Key: CRUNCH-57
> URL: https://issues.apache.org/jira/browse/CRUNCH-57
> Project: Crunch
> Issue Type: New Feature
> Components: Core
> Affects Versions: 0.3.0
> Reporter: Kiyan Ahmadizadeh
> Assignee: Josh Wills
> Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a
> PCollection.
>
> For example, suppose there was an initial PCollection that was then filtered
> into another. If I'm interested in how many elements of the original
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the
> number of elements in the PCollection and returns the result.
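A length method combined with the PObject idea from the comment above could be sketched roughly as follows, using the filter example from the description: the number of matching elements is the length of the filtered collection. This is a toy under stated assumptions: LazyValue and CountingCollection are hypothetical in-memory stand-ins, not Crunch classes, and a real implementation would run a counting job in the pipeline rather than calling List.size().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for the proposed PObject: a handle whose value is
// computed only when getValue() is first called, then cached.
final class LazyValue<V> {
    private final Supplier<V> compute;
    private V value;
    private boolean computed;

    LazyValue(Supplier<V> compute) {
        this.compute = compute;
    }

    V getValue() {
        if (!computed) {
            value = compute.get();
            computed = true;
        }
        return value;
    }
}

// Hypothetical in-memory stand-in for PCollection; not Crunch's API.
final class CountingCollection<T> {
    private final List<T> elements;

    CountingCollection(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Keep only the elements matching the predicate.
    CountingCollection<T> filter(Predicate<? super T> keep) {
        List<T> out = new ArrayList<>();
        for (T t : elements) {
            if (keep.test(t)) {
                out.add(t);
            }
        }
        return new CountingCollection<>(out);
    }

    // Sketch of the requested length(): returns a lazy handle instead of
    // counting eagerly, so the count is only computed when it is read.
    LazyValue<Long> length() {
        return new LazyValue<>(() -> (long) elements.size());
    }
}
```

Returning a deferred handle rather than a plain long is what would let a pipeline plan the counting work alongside its other jobs instead of forcing execution at the call site.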