+1 for separation.

-----Original Message-----
From: Chen, Pei [mailto:[email protected]]
Sent: Tuesday, July 17, 2012 11:53 AM
To: [email protected]
Subject: RE: SVN source structure for Apache cTAKES?
One additional discussion item for the /resources folder: if there are reference data, models, etc. that are not tied to a specific release, should we treat them as their own component at the top level? The size of these models, lookup dictionaries, etc. could reach terabytes...

--Pei

-----Original Message-----
From: Chen, Pei [mailto:[email protected]]
Sent: Tuesday, July 17, 2012 11:30 AM
To: [email protected]
Subject: RE: SVN source structure for Apache cTAKES?

I think this is starting to look like opennlp's src structure:
https://svn.apache.org/repos/asf/opennlp/trunk/

If we do not need to keep separate release cycles, then we can most likely get away with a single trunk. In the future, if there are aux components that require their own release schedule, we can always manage them separately in their own project and SVN...

+1 on keeping separate jars similar to opennlp.

+1 for standard package structure and naming conventions and also lowercasing all directory names and removing spaces :).

-1 on src/main/java though: if we decide to use Maven as the build utility, then I would suggest keeping the extra level src/main/java. This is solely for ease of integration with and use of Maven. Maven's defaults and plugins usually expect the tree to be in that structure. I think it actually becomes more work to customize everything to be outside of that structure.

--Pei

-----Original Message-----
From: Finan, Sean [mailto:[email protected]]
Sent: Tuesday, July 17, 2012 11:04 AM
To: [email protected]
Subject: RE: SVN source structure for Apache cTAKES?

+1 for a single trunk. In my experience, even if the app is oriented around services and/or modules, planned point releases of individual products in a single trunk do not pose a problem, as you can make a branch of the whole trunk and then let those products be developed on that branch while other product source etc. is static (or hopefully vice-versa).
This was useful in one case where we had code for a database that evolved much more slowly than other dependent products. While it didn't much matter to developers, according to our CM, keeping everything in one trunk made efforts easier on their side. I took them at their word. Please note that I am not saying that we should or will need to have separate product releases, just that I don't think a single trunk should prevent us from doing so.

+1 for multiple jars. The matter of single jar vs. multiple jars is not necessarily connected to having a single or multiple trunks. I think that separate projects should have separate jar files. This way developers who focus on a single project just need to check out their project's source and the jars for each dependency. Integration should build each project in a top-down fashion, and if a certain project doesn't test out or build properly then it doesn't get a (new) published jar. This keeps everybody dependent upon that project from being held up the next day with a broken build, as they can check out the published jar without really worrying about whether it is truly new or not; it is a working version. It goes along with the notion of "always shippable", one of those agility things.

+1 for separate top-level src/ test/ example/ and resource/ directories. This question was not explicitly mentioned in this topic, but it does have something to do with overall structure and jars (Pei does have src/ and resource/ in his post). I like the idea of having one root directory (under each project) for source, one for tests, and one for examples. All directories share the same package structure. I have a few reasons for doing this. The test/ directory keeps my src/ directory from getting cluttered with files that are tests and not source, which makes browsing (in and out of IDE) faster.
For that matter, it makes for a smaller and simpler source tree than having test/ subdirectories (which seems to be a common practice) all over the place.

The example/ directory also keeps source directories from becoming cluttered, and for anybody new to the code base it can make finding decent examples for what they want easier and faster. In addition, it keeps the source code from having long main() methods (which also seems to be a common practice) and other methods that are necessary for examples but not the purpose of the class. Having examples in an example/ directory also makes it obvious to a new developer that they are examples and not old (non junit) tests (which, btw, we need to extract). I also have a separate resource/ root directory (such as in the original post), which reduces clutter and makes browsing easier etc.

Another thing that these separate root directories make possible is lighter jar files. One can build and test everything, but publish a jar with just the src/. That makes dependency updates faster for people who don't need the code. cTakes isn't that big, but it is something to keep in mind.

A very minor point is that people should regularly be checking in tests (and a few examples). With all the code in one src/ root, it is difficult to notice whether or not somebody is being responsible in this regard. However, it is very simple to survey at a glance root directories with large checkins and see if anything is in test/. If there are a dozen new classes checked into src/ and nothing into test/, then the committer might need a friendly reminder to write tests for the new code. For that matter, if a project starts to look src/ heavy and test/ light (easy to see), then we can try to schedule a test-writing iteration. Once again I'll agree that writing tests can be a pain. However, it does make things easier in the long run, especially in projects with multiple developers who come and go.
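The "src/ heavy and test/ light" survey described above could also be done mechanically, since src/ and test/ share the same package structure. What follows is only a minimal sketch under stated assumptions: the class `TestCoverageSurvey`, its method `untested`, and the convention that `Foo.java` is covered by `FooTest.java` are all hypothetical and are not taken from cTAKES or from this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of cTAKES): compares mirrored src/ and test/ roots. */
final class TestCoverageSurvey {

    private TestCoverageSurvey() {
    }

    /**
     * @param srcFiles  paths relative to src/, e.g. "org/apache/ctakes/core/Foo.java"
     * @param testFiles paths relative to test/, using the same package structure
     * @return the source files that have no matching test file, in input order
     */
    static List<String> untested(Collection<String> srcFiles, Collection<String> testFiles) {
        Set<String> tests = new HashSet<>(testFiles);
        List<String> missing = new ArrayList<>();
        for (String src : srcFiles) {
            if (!src.endsWith(".java")) {
                continue; // only Java sources are surveyed
            }
            // Assumed naming convention: Foo.java is covered by FooTest.java
            // at the same relative path under test/.
            String stem = src.substring(0, src.length() - ".java".length());
            if (!tests.contains(stem + "Test.java")) {
                missing.add(src);
            }
        }
        return missing;
    }
}
```

Feeding it the relative paths from a checkin gives the list of new classes that arrived without a test, which is the same information one would get by eyeballing the two roots.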
One last note on this is that sometimes there is a structure such as src/main/ & src/test/. I don't like this because it adds an unnecessary level to the tree.

+1 for top level separation of code in different languages. I don't like structures like src/main/java/ & src/main/cpp/. If there is code in two languages, then that differentiation should be made at a higher level, such as java/src/ & cpp/src/ (plus java/test/ & cpp/test/, etc.). That way if I work only on Java code I can still check out a src/ directory, and don't need to check out something silly like java/ without an src/ because the src/ is a level or two up and includes source in other languages that I don't want. If I do check out all of src/, and even if the cpp/ branch never changes, my sandbox is still muddied up with extra files that I don't need. The cpp/ (or whatever) should be a separately built resource that I don't need to build myself but can check out on a daily basis.

+1 to separate roots for each major (sub)project under one trunk. This goes somewhat hand-in-hand with single vs. multiple jars, so maybe I'm being redundant. I don't think that there is any controversy, but I want to put it here for posterity and just in case anybody has a better idea. Currently we've got major projects within cTakes like core, the gui, etc. It may be rare for any developer to work on more than one project at a time (or ever), so they probably don't want to check out mixed code for all projects - just code for their project and published jars for dependencies.

+1 to a single common package structure. I probably shouldn't need to say this, but our current code base has this problem so I will. Different projects (with separated locations) should have a common package structure. In other words, project A should not have package structure org.apache.ctakes.A.annotation.* while project B has org.apache.cTakes.B.frog.leg.annotation.* I would prefer that whichever structure is formulated first wins.
If project A made its structure first, then project B should endeavor to follow its lead with something like org.apache.cTakes.B.annotation.frog.leg.* While this may seem completely unnecessary, it really does (imho) make keeping things straight in my mind a lot easier when I work in/on multiple projects. Plus, if there are dependencies it looks crazy when import statements don't follow a single structure.

It gets really bad (we've all seen this) when a single project or code base has multiple packages with the same name at different levels of a single tree. For instance, it makes sense to have A.frog.leg.* and A.toad.leg.*, but A.frog.leg and A.toad.appendage.leg is a strong candidate for refactoring, on one side or the other. Even worse is A.frog.leg and A.frog.appendage.leg. If packages in different projects or the same project have the same name but do not have anything to do with each other, then they probably should not have the same name, regardless of what level they occupy in the tree.

Ok, I think that is all that I've got for the 30,000 ft. structure.

Cheers,
Sean

-----Original Message-----
From: Mattmann, Chris A (388J) [mailto:[email protected]]
Sent: Monday, July 16, 2012 7:30 PM
To: <[email protected]>
Subject: Re: SVN source structure for Apache cTAKES?

+1 to having a shared trunk. In Apache OODT, we tried to separate them (and prior to bringing the software to Apache did so at JPL); however, we found that folks want a fully compatible Apache release, including compatible versions of the sub components. See OODT-15 [1] for our discussion and decision to keep it as 1 trunk.

Cheers,
Chris

[1] https://issues.apache.org/jira/browse/OODT-15

On Jul 16, 2012, at 3:29 PM, Chen, Pei wrote:

> https://issues.apache.org/jira/browse/CTAKES-10?focusedCommentId=13415605#comment-13415605
> What should the new SVN structure look like for Apache cTAKES?
>
> Currently in SF, it looks like:
> {cTAKES-root}
> /branches
> /tags
> /trunk
> -/cTAKES
> -/core
> /src
> /desc
> -/chunker
> -/coref-resolver
> Etc..
>
> Which means that all of those projects are children of trunk and will share the same release cycle.
>
> One alternative option looks something like (each component could have its own trunk/jar file?):
> {cTAKES-root}
> -/ctakes-core
> /trunk
> /src
> /java
> /main
> /resources
> /branches
> /tags
> -/ctakes-chunker
> /trunk
> /src
> /java
> /main
> /resources
> /branches
> /tags
> -/ctakes-coreference
> /trunk
> /src
> /java
> /main
> /resources
> /branches
> /tags
>
> There are pros and cons to both, but let's get the discussion started as this will be required for the code migration.
>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
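Sean's rule of thumb in this thread, that a package name such as leg should not show up at different levels of the tree (A.frog.leg next to A.toad.appendage.leg), lends itself to an automated check. The following is a minimal sketch, not anything proposed in the thread: `PackageDepthCheck` and `mixedDepths` are hypothetical names, and the check only looks at how deep each name segment sits, not at whether the packages are actually related.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper (not part of cTAKES): flags package names used at more than one depth. */
final class PackageDepthCheck {

    private PackageDepthCheck() {
    }

    /**
     * @param packages fully qualified package names, e.g. "A.frog.leg"
     * @return each name segment that occurs at two or more depths, with those depths (0-based)
     */
    static Map<String, Set<Integer>> mixedDepths(List<String> packages) {
        Map<String, Set<Integer>> depths = new TreeMap<>();
        for (String pkg : packages) {
            String[] segments = pkg.split("\\.");
            for (int depth = 0; depth < segments.length; depth++) {
                // Comparison is case-sensitive, so "cTakes" and "ctakes" are different names.
                depths.computeIfAbsent(segments[depth], k -> new TreeSet<>()).add(depth);
            }
        }
        // Keep only the segments seen at more than one depth.
        depths.values().removeIf(seen -> seen.size() < 2);
        return depths;
    }
}
```

With the packages A.frog.leg and A.toad.appendage.leg the result is `{leg=[2, 3]}`, while A.frog.leg and A.toad.leg give an empty map. A name that is flagged is only a candidate for refactoring; whether to move one side or the other is still a judgment call.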
