Hi Vinod,

First of all, welcome (back). I believe we haven't met :)

Splitting Crunch is on my agenda, too, but I haven't been able to come
up with a game plan yet (and I needed a break after all the dependency
cleanup work and the HBase split). I think it's a great idea and we should
definitely do it.

Unfortunately, it's a bit complicated because right now there are lots
of cyclic package dependencies (see [1]; the picture there shows Crunch's
dependency graph). Splitting stuff into modules is going to require quite
a bit of refactoring: modules cannot depend on each other in a cycle, so
we have to cut those dependencies first.
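
To make the cycle problem concrete, here is a rough sketch of how one
could find a cycle in a package dependency graph with a depth-first
search. The package names ("api", "io", "impl") and the edges between
them are made up for illustration; they are not measured from the Crunch
code base, and the class name PackageCycles is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PackageCycles {

    /**
     * Returns one dependency cycle as a list of package names (first and
     * last element are the same), or an empty list if there is no cycle.
     */
    public static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();      // packages fully explored
        Deque<String> path = new ArrayDeque<>(); // packages on the current DFS path
        for (String pkg : deps.keySet()) {
            List<String> cycle = visit(pkg, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Set<String> done, Deque<String> path) {
        if (done.contains(pkg)) {
            return List.of();
        }
        if (path.contains(pkg)) {
            // Back edge: the cycle is the part of the path starting at pkg.
            List<String> cycle = new ArrayList<>();
            boolean inCycle = false;
            for (String p : path) {
                if (p.equals(pkg)) {
                    inCycle = true;
                }
                if (inCycle) {
                    cycle.add(p);
                }
            }
            cycle.add(pkg); // close the loop
            return cycle;
        }
        path.addLast(pkg);
        for (String dep : deps.getOrDefault(pkg, Set.of())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.removeLast();
        done.add(pkg);
        return List.of();
    }

    public static void main(String[] args) {
        // Hypothetical edges, for illustration only.
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("api", Set.of("io"));
        deps.put("io", Set.of("impl"));
        deps.put("impl", Set.of("api"));
        System.out.println(findCycle(deps)); // prints [api, io, impl, api]
    }
}
```

Every cycle reported this way is a set of packages that could only move
into separate modules after at least one of its edges has been cut.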

I think we should first draw a high-level package diagram (just the top
packages) that shows which package depends on which. As per Robert C.
Martin's SOLID principles (the Dependency Inversion Principle in
particular), interface packages should not depend on implementation
packages. Then we can assign the existing classes to packages and
refactor if necessary.
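
Once we have such a diagram, the rule could in principle be checked
mechanically. The following is only a sketch under assumptions: the
class LayeringCheck, the package names, and the edges are hypothetical,
and every package not listed as an interface package is treated as an
implementation package.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class LayeringCheck {

    /**
     * Lists every edge that goes from an interface package to a package
     * that is not an interface package, formatted as "from -> to".
     */
    public static List<String> violations(Set<String> interfacePackages,
                                          Map<String, Set<String>> deps) {
        List<String> result = new ArrayList<>();
        // Sorted copies keep the report order stable between runs.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(deps).entrySet()) {
            if (!interfacePackages.contains(e.getKey())) {
                continue; // implementation packages may depend on anything
            }
            for (String target : new TreeSet<>(e.getValue())) {
                if (!interfacePackages.contains(target)) {
                    result.add(e.getKey() + " -> " + target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical classification and edges, for illustration only.
        Set<String> interfacePackages = Set.of("api");
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("api", Set.of("io"));
        deps.put("io", Set.of("api"));
        System.out.println(violations(interfacePackages, deps)); // prints [api -> io]
    }
}
```

An empty result would mean the interface packages only depend on each
other, which is what we want before cutting modules along those lines.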

As an example, the "io" package looks to me like it should be an
implementation package; I'd move the interfaces (PathTarget, OutputHandler
etc.) to the client API package ("org.apache.crunch" currently) to separate
them from implementations like From, To, and At.

Regards,
  Matthias

[1] http://blog.mafr.de/2012/08/26/visualizing-package-dependencies/

On Sunday, 2012-09-09, Vinod Kumar Vavilapalli wrote:
> Hi folks,
> 
> Getting up to speed after a long break, was off the grid.
> 
> Looking at the code, it looks to me that the api is interspersed with the 
> implementation details a bit. So, I opened 
> https://issues.apache.org/jira/browse/CRUNCH-60 and put in a proposal, please 
> let me know what you think.
> 
> This could be a little bit of intrusive change now, but I believe it would 
> help us a lot in the long run.
> 
> Thanks,
> +Vinod
