Inlined. On Sun, Oct 28, 2012 at 2:39 AM, Matthias Friedrich <[email protected]> wrote:
> Hi, > > On Saturday, 2012-10-27, Josh Wills wrote: > > On Sat, Oct 27, 2012 at 2:41 PM, Matthias Friedrich <[email protected]> > wrote: > > [...] > >> I'd like to avoid a CRUNCH-60 type of situation where I spend lots of > >> time playing through scenarios and planning changes that, as it turns > >> out, aren't welcome. That is *very* frustrating, to say the least, so > >> let's talk ;-) > > > Yeah, that sucked, let's not do that again. > > > I think my main concern is that I don't have visibility into the scope > > of API changes that come along with CRUNCH-60, so I don't really see > > what we're building towards. I want to know what the end state of the > > API looks like w/some amount of precision-- what will live where, > > etc., etc. > > As far as I'm concerned, CRUNCH-60 is dead because we didn't want to > create any more Maven modules, which is fine by me. My goal is to > provide a clean, stable API to clients which means we have to decide > which packages are client-facing (and thus, everything in there has to > remain stable) and which packages are internal to Crunch and may be > changed at any time (giving us the opportunity to improve things). > > With "stable" I mean that we can still add and remove stuff, but only > in a way that doesn't break compatibility. We can only remove things > that have been deprecated for at least two releases. What we can't do > anymore is moving types around. > > Re. your question on precision: I don't believe in completely freezing > the API. API design is hard and we don't have Josh Bloch on the team, > so it's unlikely that we get everything right. But I do believe that > we can adhere to the defintion of stable above. > > > I agree w/Gabriel that having PType and PTypeFamily in the base > > package feels right and that having stuff that is essentially > > implementation detail stuff like Converter and OutputHandler there > > feels wrong. It may be that the value of having PType and PTypeFamily > > in base outweighs the cost of having Converter and OutputHandler there > > as well, but since I don't see the big picture here, it feels like > > we're making a series of locally optimal decisions that don't > > necessarily lead to a globally optimal API, where I'm using "optimal" > > in the sense of one that is both cleanly separated as well as easy to > > understand and use. Making a series of locally optimal decisions is > > how I ended up coding into the cul-de-sac that necessitated the > > OutputHandler abstraction in the first place. > > OK, I see the problem. As usual we want the same thing, mostly ;-) > > > If it's possible to spell out exactly what the APIs look like when > > this is done, let's do that; if we're not there yet but could spell > > out a set of principles that we would like to be true of the APIs, > > then let's do that and then start creating some strawman proposals for > > organization of code that satisfies them as closely as we can. > > Implementing the changes required to get to the goals may take longer > > than a single release cycle, but I wouldn't want it to take more than > > two releases, just because I'm loathe to be in a process of > > continually moving things around. > > Good idea, let's first agree on a set of principles. In my opinion, > we should limit the scope for these prinicples to client-facing > packages, everything else can be changed in any way at any time. > > My proposal is based on [2], a very short and incomplete summary can > be found at [3]. For us, it boils down to this: > > * A package must have a clear purpose; it contains either mostly > abstractions or mostly implementations (this makes it easier > to explain) > * A package must not depend on a package that is less stable > than itself (meaning a package containing mostly abstractions > must not depend on one containing mostly implementations) > * There must be no dependencies from a client-facing package to > an internal package (that is, javadocs don't have dangling > references) > * There must be tight cohesion between classes in a package or > the package should be split (this doesn't apply for .util) > * There must be no dependency cycles between client-facing packages > I agree with these principles, although I think that the first one (clear purpose for a package) is often in conflict with the last one (dependency cycles between client facing packages). Is there an implicit priority scheme here? We're saying that having clear purpose for a package is more important than having dependency cycles, or are we saying that the two are equal? > > You can calculate metrics for all of this but it's really just common > sense. Crunch follows these rules in the vast majority of cases > already. Right now I see the following violations: > > * The .types package mixes abstractions and implementations and > is part of a dependency cycle with base. > * The base package references the .io implementation package > causing a dependency cycle. > * The base package references the .util package causing a > dependency cycle. > * There are lots of implementations in CombineFn and other Fns > that shouldn't be in base (which is for abstractions). We should > move them to .fn, perhaps to Guava style CombineFns, FilterFns. > We can even do this in a backwards compatible way. > So of these, I think that the CombineFn -> CombineFns change is the easiest fix, in that it solves the implementation issue for CombineFn and the dependency of the base API on the util package. I am 100% behind that one. Sorting out the cycle between io/types/base seems trickier to me and I think that is the core of the design problem, and it goes right into the tradeoff between clear purpose for a package and the dependency cycles between client facing packages. Do you agree? > > Given that Crunch is a framework this is not really ambitious. My > teams at work follow a lot more design rules even for applications. > One example: We make everything final by default; extension takes > place either by composition or by template method which makes it > possible to evolve classes without breaking client code. That's > what the DesignForExtension Checkstyle rule mandates (we disabled > it for Crunch). > > Note: Java 8 will be a game changer; as far as I understand it, our > abstract Fns are *not* going to work with Lambdas, we would need > interfaces with only a single method. Does anyone think we have to > address this before Java 8 is released? > > > I think that once we have a proposal we agree on, we should cut the > > 0.4.0 release and then devote the next release cycle to implementing > > the new API for 0.5.0. > > Agreed. BTW, I have verified our release process (with a local Nexus > and all), we're ready to release at any time. > > So, what do you think of the principles outlined above? > > Regards, > Matthias > > [1] > http://tmp.mafr.de/crunch/apidocs/0.4.0-incubating-SNAPSHOT/overview-summary.html > [2] Robert C. Martin, "Agile Principles, Patterns, and Practices", > chapter "Principles of Package and Component Design". > [3] http://en.wikipedia.org/wiki/Package_Principles > -- Director of Data Science Cloudera <http://www.cloudera.com> Twitter: @josh_wills <http://twitter.com/josh_wills>
