I think I lean towards the collections approach, but that's probably because of my Scrunch experience. Two questions:
1) Is mapToTable necessary? I would think map(SFunction, PTableType) would be distinguishable from map(SFunction, PType) by the compiler in the same way it is for parallelDo.
2) Does the collections approach need a separate maven target at all, or could it just be part of crunch-core as a replacement for the IFn stuff? Or is there Java 8-only stuff we'll want to add in to its API?

On Mon, Dec 14, 2015 at 3:13 PM, David Whiting <d...@apache.org> wrote:

> Ok, so I've implemented a few iterations of this. I went forward with the
> "wrap the functions" method, which seemed to work alright, but finding good
> names for functions which essentially just wrap functions but which aren't
> ambiguous in erasure and read nicely was a real challenge. I showed some
> sample code to some of my fellow data engineers and the consensus seemed to
> be that it was definitely better than anonymous inner classes, but it still
> felt kind of awkward and strange to use.
>
> So here's a 3rd option: wrap the collection types rather than the function
> types, and present an API which feels truly Java 8 native whilst still
> being able to dig back to the underlying PCollections (doing pretty much
> what Scrunch does, but with less implicit Scala magic).
>
> Here's a super-minimal proof-of-concept for that:
> https://gist.github.com/DavW/7efe484ea0c00cf6e66b
>
> and a comparison of the two approaches in usage:
> https://gist.github.com/DavW/997a92b31d55c5317fb7
>
>
> On 13 December 2015 at 16:14, Gabriel Reid <gabriel.r...@gmail.com> wrote:
>
> > This looks very cool. As long as we can keep things compatible with
> > Java 7 using whatever kind of maven voodoo that's necessary, I'm all
> > for it.
> >
> > I'd say no real reason to keep the IFn stuff if this goes in.
> >
> > - Gabriel
> >
> > On Fri, Dec 11, 2015 at 11:18 PM, Josh Wills <josh.wi...@gmail.com> wrote:
> > > It seems like a net positive over the IFn stuff, so I could make an
> > > argument for replacing it, but if there's anyone out there in love
> > > w/IFns, they should speak up now. :)
> > >
> > > J
> > >
> > > On Fri, Dec 11, 2015 at 2:17 PM, David Whiting <d...@apache.org> wrote:
> > >
> > >> I *think* you can set language level and target jdk on a per-module
> > >> basis, so it should be relatively easy. I'll experiment at some point
> > >> over the weekend. Would this complement or replace the I*Fn stuff do
> > >> you think? 14.0 is not yet released, so I guess it's not too late to
> > >> change if we want to.
> > >>
> > >> On 11 December 2015 at 22:57, Josh Wills <josh.wi...@gmail.com> wrote:
> > >>
> > >> > That's the sexiest thing I've seen in some time. +1 for a lambda
> > >> > module, but how does that work in Maven-fu? Is it like a conditional
> > >> > compile or something?
> > >> >
> > >> > On Fri, Dec 11, 2015 at 1:20 PM, David Whiting <d...@apache.org> wrote:
> > >> >
> > >> > > Oops, my bad. Here's a Gist:
> > >> > > https://gist.github.com/DavW/e2588e42c45ad8c06038
> > >> > >
> > >> > > On 11 December 2015 at 18:43, Josh Wills <josh.wi...@gmail.com> wrote:
> > >> > >
> > >> > > > I think it's kind of awesome, but the attachment didn't go
> > >> > > > through- PR or gist?
> > >> > > >
> > >> > > > On Fri, Dec 11, 2015 at 7:42 AM David Whiting <d...@apache.org> wrote:
> > >> > > >
> > >> > > > > While fixing the bug where the IFn version of mapValues on
> > >> > > > > PGroupedTable was missing, I got thinking that this is quite an
> > >> > > > > inefficient way of including support for lambdas and method
> > >> > > > > references, and it still didn't actually support quite a few of
> > >> > > > > the features that would make it easy to code against.
> > >> > > > >
> > >> > > > > Negative parts of existing lambda implementation:
> > >> > > > > 1) Explosion of already-crowded PCollection, PTable and
> > >> > > > > PGroupedTable interfaces, and having to implement those methods
> > >> > > > > in all implementations.
> > >> > > > > 2) Not supporting flatMap to Optional or Stream types.
> > >> > > > > 3) Not exposing convenient types for reduce-type operations
> > >> > > > > (Stream instead of Iterable, for example).
> > >> > > > >
> > >> > > > > Something that would solve all three of these is to build lambda
> > >> > > > > support as a separate artifact (so we can use all java8 types),
> > >> > > > > and instead of the API being directly on the PSomething
> > >> > > > > interfaces, we just have convenient ways to wrap up lambdas into
> > >> > > > > DoFns or MapFns via statically-imported methods.
> > >> > > > >
> > >> > > > > The usage then becomes
> > >> > > > > import static org.apache.crunch.Lambda.*;
> > >> > > > > ...
> > >> > > > > someCollection.parallelDo(flatMap(d -> someFnOf(d)), pt)
> > >> > > > > ...
> > >> > > > > otherGroupedTable.mapValue(reduce(seq -> seq.mapToInt(i -> i).sum()), ints())
> > >> > > > >
> > >> > > > > Where flatMap and reduce are static methods on Lambda, and Lambda
> > >> > > > > goes in its own artifact (to preserve compatibility with 6 and 7
> > >> > > > > for the rest of Crunch).
> > >> > > > > I've attached a basic proof-of-concept implementation which I've
> > >> > > > > tested a few things with, and I'm very happy to sketch out a more
> > >> > > > > substantial implementation if people here think it's a good idea
> > >> > > > > in general.
> > >> > > > >
> > >> > > > > Thoughts? Ideas? Suggestions? Please tell me if this is crazy.
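
For anyone reading the thread without the gists to hand, here is a rough, self-contained sketch of the "wrap the lambdas via static methods" idea from the original proposal. It deliberately does not use the real Crunch API: SimpleDoFn, SimpleMapFn, the run helper and this SFunction are hypothetical stand-ins invented for illustration, and the real Lambda class in the gists may well look different.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LambdaSketch {
    // Hypothetical stand-in for a DoFn: one input, zero or more emitted outputs.
    interface SimpleDoFn<S, T> { void process(S input, Consumer<T> emitter); }

    // Hypothetical stand-in for a MapFn: exactly one output per input.
    interface SimpleMapFn<S, T> { T map(S input); }

    // Serializable function shape, so lambdas could be shipped to workers.
    interface SFunction<S, T> extends Serializable { T apply(S input); }

    // Wraps a lambda returning a Stream into a DoFn-like object that emits each element.
    static <S, T> SimpleDoFn<S, T> flatMap(SFunction<S, Stream<T>> fn) {
        return (input, emitter) -> fn.apply(input).forEach(emitter);
    }

    // Wraps a lambda over a Stream into a MapFn-like object that accepts an Iterable.
    static <V, T> SimpleMapFn<Iterable<V>, T> reduce(SFunction<Stream<V>, T> fn) {
        return values -> fn.apply(StreamSupport.stream(values.spliterator(), false));
    }

    // Demo helper (not part of any real API): applies a SimpleDoFn to every input.
    static <S, T> List<T> run(SimpleDoFn<S, T> fn, Iterable<S> inputs) {
        List<T> out = new ArrayList<>();
        for (S input : inputs) {
            fn.process(input, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleDoFn<String, String> words = flatMap(line -> Arrays.stream(line.split(" ")));
        System.out.println(run(words, Arrays.asList("a b", "c")));

        SimpleMapFn<Iterable<Integer>, Integer> sum = reduce(seq -> seq.mapToInt(i -> i).sum());
        System.out.println(sum.map(Arrays.asList(1, 2, 3)));
    }
}
```

The point of this shape is that the collection interfaces stay untouched: the Java 8 types (Stream, lambdas) only appear in the static wrapper methods, which is what would let them live in a separate artifact.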