[ https://issues.apache.org/jira/browse/CRUNCH-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Whiting updated CRUNCH-483:
---------------------------------
Attachment: 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch

Attached patch for the "second best" option, as making PTable a PCollection is indeed problematic.

> Scrunch .map does not allow mapping to a PCollection[(A,B)]
> -----------------------------------------------------------
>
> Key: CRUNCH-483
> URL: https://issues.apache.org/jira/browse/CRUNCH-483
> Project: Crunch
> Issue Type: Bug
> Components: Scrunch
> Affects Versions: 0.11.0
> Reporter: David Whiting
> Priority: Minor
> Attachments: 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch
>
>
> When using Scrunch PCollections and attempting to map to a pair of values,
> the keyvalue implicit function in CanParallelDo will "upgrade" the result to
> a PTable[K, V]. This is often the desired behaviour, but because Scrunch PTable
> is not an extension of Scrunch PCollection, there are cases where it is
> not what is wanted.
> Concrete example from music land: I am trying to count the number of plays
> for each track in each country. I want to do this:
>   trackPlayedMessage.map(tpm => (tpm.track, tpm.country)).count()
> However, because of the implicit CanParallelTransform that is substituted, I
> cannot call .count(): what I get is a PTable, not a PCollection.
> There are a number of possible remedies that I'm happy to have a go at, but
> I'd like some input as to which would be best:
> - Make PTable[K, V] a real extension of PCollection[(K, V)] (analogous to how
> it works in Crunch)
> - Add an "asPCollection" method to PTable which "downgrades" the PTable[K, V]
> to a PCollection[(K, V)].
> - Make mapToTable and flatMapToTable distinct from map and flatMap to make
> the choice explicit (warning: breaks existing API).
> - Expose an equivalent to LowPriorityParallelTransforms.single to be invoked
> explicitly to get a collection instead of a table, using .map(fn)(implicitly, single)
> - Something else

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
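The "upgrade" described in the report presumably comes from implicit prioritisation: a general `single` instance lives in a low-priority parent trait, and a pair-specific `keyvalue` instance wins whenever the mapped element is a tuple. Below is a minimal, self-contained Scala sketch of that pattern and of two of the proposed remedies (an `asPCollection` downgrade, and passing `single` explicitly). It does not use Scrunch: `Coll`, `Table`, `CanMap`, `keyValue`, `TrackPlayed` and `Demo` are hypothetical stand-ins, and `count()` here simply tallies equal elements in memory.

```scala
// Toy model, NOT the Scrunch API: Coll, Table, CanMap, keyValue and single are
// hypothetical stand-ins for PCollection, PTable and the implicit transforms.

// Stand-in for PCollection[S]: an in-memory sequence.
final case class Coll[S](elems: Seq[S]) {
  // The result type To is chosen by whichever CanMap instance the compiler finds.
  def map[T, To](f: S => T)(implicit cm: CanMap[T, To]): To = cm(elems.map(f))

  // Number of occurrences of each distinct element.
  def count(): Map[S, Long] =
    elems.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }
}

// Stand-in for PTable[K, V]; deliberately NOT a subtype of Coll[(K, V)].
final case class Table[K, V](pairs: Seq[(K, V)]) {
  // The "second best" remedy: downgrade the table to a collection of pairs.
  def asPCollection: Coll[(K, V)] = Coll(pairs)
}

trait LowPriorityCanMap {
  // Fallback instance: the result stays a plain collection.
  implicit def single[T]: CanMap[T, Coll[T]] = new CanMap[T, Coll[T]] {
    def apply(ts: Seq[T]): Coll[T] = Coll(ts)
  }
}

// Type class deciding what a map over elements of type T returns.
trait CanMap[T, To] { def apply(ts: Seq[T]): To }

object CanMap extends LowPriorityCanMap {
  // Defined in a subclass of LowPriorityCanMap, so it beats `single` whenever
  // the mapped element is a pair: the result is "upgraded" to a Table.
  implicit def keyValue[K, V]: CanMap[(K, V), Table[K, V]] =
    new CanMap[(K, V), Table[K, V]] {
      def apply(ts: Seq[(K, V)]): Table[K, V] = Table(ts)
    }
}

final case class TrackPlayed(track: String, country: String)

object Demo {
  val plays = Coll(Seq(
    TrackPlayed("song-a", "SE"),
    TrackPlayed("song-a", "SE"),
    TrackPlayed("song-b", "US")))

  // Mapping to a pair picks keyValue, so the inferred type is Table[String, String].
  val table = plays.map(p => (p.track, p.country))
  // table.count()  // would not compile: Table has no count()

  // Remedy: downgrade explicitly, then count.
  val counts: Map[(String, String), Long] = table.asPCollection.count()

  // Remedy: pass the low-priority instance explicitly instead of the implicit one.
  val viaSingle: Map[(String, String), Long] =
    plays.map(p => (p.track, p.country))(CanMap.single).count()

  def main(args: Array[String]): Unit = println(counts)
}
```

In this sketch the pair-producing `map` yields a `Table`, so `count()` is reachable only after `asPCollection` or by supplying `single` by hand. Making the table a real subtype of the collection (the first option) would remove the need for either, at the cost of the problems mentioned in the comment above.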