[ https://issues.apache.org/jira/browse/CRUNCH-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-483.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.12.0

Back from vacation and slowly putting myself back to work. Thanks for this one, 
David!

> Scrunch .map does not allow mapping to a PCollection[(A,B)]
> -----------------------------------------------------------
>
>                 Key: CRUNCH-483
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-483
>             Project: Crunch
>          Issue Type: Bug
>          Components: Scrunch
>    Affects Versions: 0.11.0
>            Reporter: David Whiting
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: 
> 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch
>
>
> When using Scrunch PCollections and attempting to map to a pair of values,
> the keyvalue implicit function in CanParallelTransform will "upgrade" the
> result to a PTable[K, V]. This is often the desired behaviour, but because
> Scrunch PTable is not an extension of Scrunch PCollection, there are cases
> where this is not what is wanted.
> Concrete example from music land: I am trying to count the number of plays 
> for each track in each country. I want to do this:
> trackPlayedMessage.map(tpm => (tpm.track, tpm.country)).count()
> However because of the implicit CanParallelTransform that is substituted, I 
> cannot call .count() because what I get is a PTable and not a PCollection.
> There are a number of possible remedies that I'm happy to have a go at, but 
> I'd like some input as to which would be best:
> - Make PTable[K,V] a real extension of PCollection[(K, V)] (analogous to how 
> it works in Crunch)
> - Add an "asPCollection" method to PTable which "downgrades" the PTable[K, V] 
> to a PCollection[(K, V)].
> - Make mapToTable and flatMapToTable distinct from map and flatMap to make 
> the choice explicit (warning: breaks existing API).
> - Expose an equivalent to LowPriorityParallelTransforms.single to be invoked 
> explicitly to get a collection instead of a table using .map(fn)(implicitly, 
> single)
> - Something else
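
The behaviour described above, and two of the listed remedies, can be illustrated with a small self-contained sketch. This is not Scrunch code: Coll, Table, CanTransform, LowPriorityTransforms, asColl and Play are hypothetical in-memory stand-ins chosen only to mirror the roles of PCollection, PTable, CanParallelTransform, LowPriorityParallelTransforms and the proposed asPCollection. It assumes the usual Scala rule that an implicit defined in a companion object takes priority over one inherited from a parent trait.

```scala
object ImplicitPrioritySketch {
  // Hypothetical stand-in for a Scrunch PCollection (in-memory only).
  final case class Coll[A](items: Seq[A]) {
    // The result type To is chosen by whichever implicit CanTransform is selected.
    def map[B, To](f: A => B)(implicit ct: CanTransform[B, To]): To = ct(items.map(f))

    // Count the occurrences of each distinct element.
    def count(): Map[A, Long] =
      items.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
  }

  // Hypothetical stand-in for a Scrunch PTable; deliberately NOT a subtype of Coll.
  final case class Table[K, V](items: Seq[(K, V)]) {
    // Sketch of the "downgrade" remedy: view the table as a collection of pairs.
    def asColl: Coll[(K, V)] = Coll(items)
  }

  trait CanTransform[El, To] { def apply(xs: Seq[El]): To }

  // Lower priority: any element type yields a plain collection.
  trait LowPriorityTransforms {
    implicit def single[B]: CanTransform[B, Coll[B]] =
      new CanTransform[B, Coll[B]] { def apply(xs: Seq[B]): Coll[B] = Coll(xs) }
  }

  // Higher priority: pairs are "upgraded" to a table.
  object CanTransform extends LowPriorityTransforms {
    implicit def keyvalue[K, V]: CanTransform[(K, V), Table[K, V]] =
      new CanTransform[(K, V), Table[K, V]] {
        def apply(xs: Seq[(K, V)]): Table[K, V] = Table(xs)
      }
  }

  final case class Play(track: String, country: String)

  def main(args: Array[String]): Unit = {
    val plays = Coll(Seq(Play("t1", "SE"), Play("t1", "SE"), Play("t2", "US")))

    // Mapping to a pair selects keyvalue, so this is a Table and has no count().
    val table = plays.map(p => (p.track, p.country))

    // Remedy sketch 1: downgrade the table, then count.
    val viaDowngrade = table.asColl.count()

    // Remedy sketch 2: pass the low-priority instance explicitly to stay a Coll.
    val viaExplicit =
      plays.map(p => (p.track, p.country))(CanTransform.single[(String, String)]).count()

    assert(viaDowngrade == viaExplicit)
    println(viaDowngrade)
  }
}
```

In this sketch the first remedy keeps call sites short, while the second leaves the table type untouched but requires the caller to know the name of the low-priority instance.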



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
