On Thu, Feb 20, 2014 at 4:06 PM, Jinal Shah <[email protected]> wrote:
> Thanks Gabriel, that works. Just curious: what's the benefit of using
> PCollection.by as opposed to PCollection.parallelDo? In which use case is
> either better than the other?

PCollection.by is just a convenience method that lets you create a table
by specifying only how to create the keys and the PType of the keys. If
you wanted to create a PTable using PCollection.parallelDo, you would need
to define the full table type and define a DoFn or MapFn that creates
pairs of the key and value.

>
> On Thu, Feb 20, 2014 at 6:01 AM, Gabriel Reid <[email protected]> wrote:
>
>> On Thu, Feb 20, 2014 at 11:59 AM, Jinal Shah <[email protected]>
>> wrote:
>> > Somewhat like that, as we are also using that same approach, but I was
>> > thinking of it more as
>> > PTables.asPTable(PCollection<V>, KeyFinder<V>, PType<K>) and return as
>> > PTable<K,V>
>> >
>> > Basically
>> > KeyFinder<V> is an interface which will have some kind of method like
>> > findKey(V) returning K from that V, or calculated, or any way it wants.
>> >
>>
>> This is pretty much exactly what PCollection#by does. Your proposed
>> method as you described it would be written as follows using
>> PCollection#by:
>>
>> PCollection<V> collection = ...;
>> PTable<K, V> table = collection.by(new KeyFinderMapFn(), ptypeForKey);
>>
>> The method is described at
>>
>> http://crunch.apache.org/apidocs/0.8.2/org/apache/crunch/PCollection.html#by(org.apache.crunch.MapFn,%20org.apache.crunch.types.PType)
>>
>> - Gabriel
>>
>> >
>> > On Thu, Feb 20, 2014 at 12:07 AM, Gabriel Reid <[email protected]>
>> > wrote:
>> >
>> >> > On 20 Feb 2014, at 05:11, Jinal Shah <[email protected]> wrote:
>> >> >
>> >> > I didn't know that, but I was talking more about something like this:
>> >> > PCollection<V> to PTable<K,V>, basically.
>> >> >
>> >>
>> >> I think what you want is the PCollection#by method. It takes a MapFn
>> >> that maps each value V to a key, and returns a PTable<K,V>
>> >>
>> >> - Gabriel
>> >>
>> >> >
>> >> >> On Wed, Feb 19, 2014 at 5:49 PM, Josh Wills <[email protected]>
>> >> >> wrote:
>> >> >>
>> >> >> org.apache.crunch.lib.PTables.asPTable is likely what you want.
>> >> >>
>> >> >>
>> >> >> On Wed, Feb 19, 2014 at 3:47 PM, Jinal Shah <[email protected]>
>> >> >> wrote:
>> >> >>
>> >> >>> Hi everyone,
>> >> >>>
>> >> >>> Is there a generic way of converting a PCollection to a PTable? If
>> >> >>> not, can we create a generic class? We have a lot of places where
>> >> >>> we want to perform a join on 2 PCollections, so we have to convert
>> >> >>> them into PTables, then do a join, and then convert the result
>> >> >>> back into a PCollection. So I was wondering whether there is a
>> >> >>> better way of doing this.
>> >> >>>
>> >> >>> Thanks
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Director of Data Science
>> >> >> Cloudera <http://www.cloudera.com>
>> >> >> Twitter: @josh_wills <http://twitter.com/josh_wills>
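
For anyone reading this thread later, the relationship between the two methods discussed at the top (by needs only a key function, while parallelDo needs a function that builds the whole key/value pair) can be modeled in plain Java. This is only a rough sketch of the idea: KeyByModel and its two static methods are hypothetical names invented for illustration, plain lists and Map.Entry stand in for PCollection, PTable and Pair, and none of it is the Crunch API or its implementation.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model only; these are NOT the Crunch classes or signatures.
class KeyByModel {

    // Models the parallelDo route: the caller's function must build
    // the complete (key, value) pair for every input element.
    static <V, K, W> List<Map.Entry<K, W>> parallelDo(
            List<V> input, Function<V, Map.Entry<K, W>> toPair) {
        List<Map.Entry<K, W>> out = new ArrayList<>();
        for (V v : input) {
            out.add(toPair.apply(v));
        }
        return out;
    }

    // Models the by route: the caller supplies only the key function,
    // and each input element is kept unchanged as the value.
    static <V, K> List<Map.Entry<K, V>> by(List<V> input, Function<V, K> keyFn) {
        return parallelDo(input, v -> new SimpleImmutableEntry<>(keyFn.apply(v), v));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "kiwi", "fig");

        // Key each word by its length, the short way.
        List<Map.Entry<Integer, String>> viaBy = by(words, String::length);

        // The same table, spelling out the pair construction by hand.
        List<Map.Entry<Integer, String>> viaParallelDo =
                parallelDo(words, w -> new SimpleImmutableEntry<>(w.length(), w));

        System.out.println(viaBy);
        System.out.println(viaBy.equals(viaParallelDo));
    }
}
```

In this model the two calls in main produce equal results, which matches the point made in the reply: by covers the common case where the value is the element itself, and parallelDo is the route to take when the value (or the number of outputs per input) needs to differ.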
