Hello,

To start with a disclaimer: I'm not a cTAKES committer, but I do use (and modify) cTAKES quite a bit. So, my $0.02:
> +1 for multiple jars. I've found it very problematic that there is a single cTAKES.jar with all code and dependencies kludged in - this makes it very hard to use a later version of a library cTAKES depends on, even if it is backwards compatible. Case in point: LVG. The HSqlDB jars from LVG 2008 are in the cTAKES jar, and this version of HSqlDB is not compatible with LVG 2011. I could put the LVG 2011 libraries in front of cTAKES.jar in the classpath, but that is messy and might cause other problems.

> -1 for separate top-level src/ test/ example/ and resource/ directories. The src/main/java and src/main/cpp structure is the default Maven structure, and I believe Apache is using Maven for the build. Yes, you can override this in your Maven config, but it is a (little) pain. I'm neither a big fan of this structure nor of Maven, but I would suck it up.

Best,
VJ

On Wed, Jul 18, 2012 at 6:01 PM, Mattmann, Chris A (388J) <[email protected]> wrote:

> Agree with Sean's assessment below on all points.
>
> Cheers,
> Chris
>
> On Jul 17, 2012, at 10:03 AM, Finan, Sean wrote:
>
> > +1 for a single trunk.
> >
> > In my experience, even if the app is oriented around services and/or modules, planned point releases of individual products in a single trunk do not pose a problem, as you can make a branch of the whole trunk, then let those products be developed on that branch where other product source etc. is static (or hopefully vice-versa). This was useful in one case where we had code for a database that evolved much more slowly than other dependent products. While it didn't much matter to developers, according to our CM keeping everything in one trunk made efforts easier on their side. I took them at their word. Please note that I am not saying that we should or will need to have separate product releases, just that I don't think a single trunk should prevent us from doing so.
> >
> > +1 for multiple jars.
> > The matter of single jar vs.
> > multiple jars is not necessarily connected to having a single or multiple trunks.
> > I think that separate projects should have separate jar files. This way developers who focus on a single project just need to check out their project's source and the jars for each dependency. Integration should build each project in a top-down fashion, and if a certain project doesn't test out or build properly then it doesn't get a (new) published jar. This keeps everybody dependent upon that project from being held up the next day with a broken build, as they can check out the published jar without really worrying about whether it is truly new or not; it is a working version. It goes along with the notion of "always shippable", one of those agility things.
> >
> > +1 for separate top-level src/ test/ example/ and resource/ directories.
> > This question was not explicitly mentioned in this topic, but it does have something to do with overall structure and jars (Pei does have src/ and resource/ in his post). I like the idea of having one root directory (under each project) for source, one for tests, and one for examples. All directories share the same package structure. I have a few reasons for doing this.
> > The test/ directory keeps my src/ directory from getting cluttered with files that are tests and not source, which makes browsing (in and out of IDE) faster. For that matter, it makes for a smaller and simpler source tree than having test/ subdirectories (which seems to be a common practice) all over the place.
> > The example/ directory also keeps source directories from becoming cluttered, and for anybody new to the code base it can make finding decent examples for what they want easier and faster. In addition, it keeps the source code from having long main() methods (which also seems to be a common practice) and other methods that are necessary for examples but not the purpose of the class.
> > Having examples in an example/ directory also makes it obvious to a new developer that they are examples and not old (non-JUnit) tests (which, btw, we need to extract). I also have a separate resource/ root directory (such as in the original post), which reduces clutter and makes browsing easier etc.
> > Another thing that these separate root directories make possible is lighter jar files. One can build and test everything, but publish a jar with just the src/. That makes dependency updates faster for people that don't need the code. cTakes isn't that big, but it is something to keep in mind.
> > A very minor point is that people should regularly be checking in tests (and a few examples). With all the code in one src/ root, it is difficult to notice whether or not somebody is being responsible in this regard. However, it is very simple to survey at a glance root directories with large checkins and see if anything is in test/. If there are a dozen new classes checked into src/ and nothing into test/, then the committer might need a friendly reminder to write tests for the new code. For that matter, if a project starts to look src/ heavy and test/ light (easy to see), then we can try to schedule a test-writing iteration. Once again I'll agree that writing tests can be a pain. However, it does make things easier in the long run, especially in projects with multiple developers who come and go.
> > One last note on this is that sometimes there is a structure such as src/main/ & src/test/. I don't like this because it adds an unnecessary level to the tree.
> >
> > +1 for top level separation of code in different languages.
> > I don't like structures like src/main/java/ & src/main/cpp/. If there is code in two languages, then that differentiation should be made at a higher level, such as java/src/ & cpp/src/ (plus java/test/ & cpp/test/, etc.).
> > That way if I work only on Java code I can still check out a src/ directory, and don't need to check out something silly like java/ without an src/ because the src/ is a level or two up and includes source in other languages that I don't want. If I do check out all of src/, and even if the cpp/ branch never changes, my sandbox is still muddied up with extra files that I don't need. The cpp/ (or whatever) should be a separately built resource that I don't need to build myself but can check out on a daily basis.
> >
> > +1 to separate roots for each major (sub)project under one trunk.
> > This goes somewhat hand-in-hand with single vs. multiple jars, so maybe I'm being redundant. I don't think that there is any controversy, but I want to put it here for posterity and just in case anybody has a better idea. Currently we've got major projects within cTakes like core, the gui, etc. It may be rare for any developer to work on more than one project at a time (or ever), so they probably don't want to check out mixed code for all projects - just code for their project and published jars for dependencies.
> >
> > +1 to a single common package structure.
> > I probably shouldn't need to say this, but our current code base has this problem so I will. Different projects (with separated locations) should have a common package structure. In other words, project A should not have package structure org.apache.ctakes.A.annotation.* while project B has org.apache.cTakes.B.frog.leg.annotation.*; I would prefer that whichever structure is formulated first wins. If project A made its structure first, then project B should endeavor to follow its lead with something like org.apache.cTakes.B.annotation.frog.leg.* instead. While this may seem completely unnecessary, it really does (imho) make keeping things straight in my mind a lot easier when I work in/on multiple projects.
> > Plus, if there are dependencies it looks crazy when include statements don't follow a single structure. It gets really bad (we've all seen this) when a single project or code base has multiple packages with the same name at different levels of a single tree. For instance, it makes sense to have A.frog.leg.* and A.toad.leg.*, but A.frog.leg and A.toad.appendage.leg is a strong candidate for refactoring, on one side or the other. Even worse is A.frog.leg and A.frog.appendage.leg. If packages in different projects or the same project have the same name but do not have anything to do with each other, then they probably should not have the same name, regardless of what level they occupy in the tree.
> >
> > Ok, I think that is all that I've got for the 30,000 ft. structure.
> >
> > Cheers,
> > Sean
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (388J) [mailto:[email protected]]
> > Sent: Monday, July 16, 2012 7:30 PM
> > To: <[email protected]>
> > Subject: Re: SVN source structure for Apache cTAKES?
> >
> > +1 to having a shared trunk. In Apache OODT, we tried to separate them (and prior to bringing the software to Apache did so at JPL); however, we found that folks want a fully compatible Apache release, including compatible versions of the subcomponents. See OODT-15 [1] for our discussion and decision to keep it as 1 trunk.
> >
> > Cheers,
> > Chris
> >
> > [1] https://issues.apache.org/jira/browse/OODT-15
> >
> > On Jul 16, 2012, at 3:29 PM, Chen, Pei wrote:
> >
> >> https://issues.apache.org/jira/browse/CTAKES-10?focusedCommentId=13415605#comment-13415605
> >> What should the new SVN structure look like for Apache cTAKES?
> >>
> >> Currently in SF, it looks like:
> >> {cTAKES-root}
> >>   /branches
> >>   /tags
> >>   /trunk
> >>     -/cTAKES
> >>       -/core
> >>         /src
> >>         /desc
> >>       -/chunker
> >>       -/coref-resolver
> >>       Etc..
> >> Which means that all of those projects are children of trunk and will share the same release cycle.
> >>
> >> One alternative option looks something like this (each component could have its own trunk/jar file?):
> >> {cTAKES-root}
> >>   -/ctakes-core
> >>     /trunk
> >>       /src
> >>         /java
> >>         /main
> >>         /resources
> >>     /branches
> >>     /tags
> >>   -/ctakes-chunker
> >>     /trunk
> >>       /src
> >>         /java
> >>         /main
> >>         /resources
> >>     /branches
> >>     /tags
> >>   -/ctakes-coreference
> >>     /trunk
> >>       /src
> >>         /java
> >>         /main
> >>         /resources
> >>     /branches
> >>     /tags
> >>
> >> There are pros and cons to both, but let's get the discussion started, as this will be required for the code migration.
> >>
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: [email protected]
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
