Hi James,

I can't answer conclusively, but I believe those are all models trained using the ClearTK framework. There may be a way of packaging them as separate files rather than jars, but I'm not sure that would have any benefit, since:

1) You almost never will (or should) modify models trained using machine learning methods.
2) It may clutter up the models directory. If you have a 10-class classifier, it will build 10 one-vs-all classifiers and package them together (along with some metadata). I think it is valuable to encapsulate this, even for many of the core cTAKES developers.
As for whether these models change or evolve: they will certainly change as features or more data are added, but I wouldn't really say they evolve, at least not in a sense that SVN can take advantage of. A small change in the training input will change these models almost completely, so it wouldn't really be valuable to, e.g., view diffs between component model files, even if they are ASCII. Dima or Steve Bethard may be able to answer better, but I'm not sure if they are checking email today.

Tim

On Jan 21, 2013, at 2:56 PM, Masanz, James J. wrote:

>
> Tim and Dima,
>
> This question came up on [email protected] regarding models
> included within the source release:
>
>> As for the models, I don't think there is any issue in keeping them in
>> jars, but the question is why? Are they never going to evolve or change?
>> Wouldn't you keep them as source and just package them as jars for use at
>> runtime? As I said, this is not an issue, I am just curious.
>
> If you have any input I should include in a response to Matt F, let me know.
>
> Otherwise my response will probably be to open a JIRA issue, with this as
> something to be changed in a future release (including building the jars at
> build time).
>
> -- James
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]]
>> On Behalf Of Matt Franklin
>> Sent: Monday, January 21, 2013 12:47 PM
>> To: [email protected]
>> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>>
>> On Mon, Jan 21, 2013 at 12:16 PM, Masanz, James J.
>> <[email protected]> wrote:
>>>
>>> Regarding the comment about compiled jars in the source tree:
>>>
>>> The following jars, even though they are under src directories, contain
>>> resources (models), not Java classes.
>>>
>>> conll-2009-dev-shift-pop.jar
>>> dummy.dep.mod.jar
>>> mayo-dep.jar
>>> wordnet-3.0-lemma-data.jar
>>> dummy.srl.mod.jar
>>> en_srl_ontonotes.jar
>>> mayo-srl.jar
>>> clearparser_models.jar
>>>
>>> degree_of/model.jar
>>> em_pair/model.jar
>>> modifier_extractor/model.jar
>>>
>>> Are there other jars you were referring to?
>>
>> ./ctakes-assertion/lib/jcarafe-core_2.9.1-0.9.8.3.RC4.jar
>> ./ctakes-assertion/lib/jcarafe-ext_2.9.1-0.9.8.3.RC4.jar
>> ./ctakes-assertion/lib/med-facts-i2b2-1.2-SNAPSHOT.jar
>> ./ctakes-assertion/lib/med-facts-zoner-1.1.jar
>> ./ctakes-constituency-parser/lib/libsvm-2.91.jar
>> ./ctakes-coreference/lib/commons-io-2.1.jar
>> ./ctakes-coreference/lib/commons-lang3-3.0.1.jar
>> ./ctakes-coreference/lib/Jama-1.0.2.jar
>> ./ctakes-coreference/lib/libsvm-2.91.jar
>> ./ctakes-dependency-parser/lib/args4j-2.0.16.jar
>> ./ctakes-dependency-parser/lib/clearparser-0.33.jar
>> ./ctakes-dependency-parser/lib/cleartk-util-0.8.1.jar
>> ./ctakes-dependency-parser/lib/commons-io-2.0.1.jar
>> ./ctakes-dependency-parser/lib/commons-lang-2.4.jar
>> ./ctakes-dependency-parser/lib/commons-logging-1.1.1.jar
>> ./ctakes-dependency-parser/lib/hppc-0.3.1.jar
>> ./ctakes-dependency-parser/lib/uimafit-1.2.0.jar
>>
>> These are just a few that exist in SVN. All dependencies that are compiled
>> code need to be externally referenced.
>>
>> As for the models, I don't think there is any issue in keeping them in
>> jars, but the question is why? Are they never going to evolve or change?
>> Wouldn't you keep them as source and just package them as jars for use at
>> runtime? As I said, this is not an issue, I am just curious.
>>
>>>
>>> I will look at the NOTICE file this afternoon.
>>>
>>> Regards,
>>> James Masanz
>>>
>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]]
>>>> On Behalf Of Masanz, James J.
>>>> Sent: Monday, January 21, 2013 9:51 AM
>>>> To: '[email protected]'
>>>> Subject: RE: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>>>>
>>>> The result of the VOTE for 3.0.0-incubating on the dev list is at
>>>> http://mail-archives.apache.org/mod_mbox/incubator-ctakes-dev/201301.mbox/browser
>>>>
>>>> The source artifact can be found in
>>>> http://people.apache.org/~chenpei/ctakes-3.0.0-incubating/rc5/target/
>>>>
>>>> I'll look at the jars and the root NOTICE file.
>>>>
>>>> Thanks for your review!
>>>> -- James Masanz
>>>>
>>>>> -----Original Message-----
>>>>> From:
>>>>> [email protected]
>>>>> [mailto:[email protected]]
>>>>> On Behalf Of Matt Franklin
>>>>> Sent: Monday, January 21, 2013 7:42 AM
>>>>> To: [email protected]
>>>>> Cc: [email protected]
>>>>> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>>>>>
>>>>> I have some issues with this release as it currently stands:
>>>>>
>>>>> * Where is the result of the VOTE thread on the dev list?
>>>>> * Where is the source artifact? The artifact linked in the vote
>>>>> thread appears to be your convenience binary release.
>>>>> * There are compiled jars in the source tree. These need to be
>>>>> externalized in some fashion.
>>>>> * There are LICENSE & NOTICE files in individual project
>>>>> directories that contain entries that don't appear in the root
>>>>> NOTICE file. If you intend on releasing the subcomponents
>>>>> individually, this makes some sense; but I think that the entries
>>>>> should be merged into the root NOTICE file.
>>>>>
>>>>>
>>>>> On Fri, Jan 18, 2013 at 9:39 AM, Coarr, Matt <[email protected]> wrote:
>>>>>> Hi, we just need one more Incubator PMC vote for cTAKES version 3.0.
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> From: <Chen>, Pei <[email protected]>
>>>>>> Subject: Collecting IPMC votes
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is a call for a vote on releasing the following candidate as
>>>>>> Apache cTAKES 3.0.0-incubating.
>>>>>> This will be our first release.
>>>>>>
>>>>>> A vote is also held on the developer mailing list:
>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-ctakes-dev/201301.mbox/browser
>>>>>>
>>>>>> For more detailed information on the changes/release notes, please visit:
>>>>>>
>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12322969
>>>>>>
>>>>>> The release was made using the cTAKES release process documented here:
>>>>>> http://incubator.apache.org/ctakes/ctakes-release-guide.html
>>>>>>
>>>>>> The candidate is available at:
>>>>>>
>>>>>> http://people.apache.org/~chenpei/ctakes-3.0.0-incubating/rc5/target/apache-ctakes-3.0.0-incubating-bin.tar.gz
>>>>>> /.zip
>>>>>>
>>>>>> The tag to be voted on:
>>>>>>
>>>>>> http://svn.apache.org/repos/asf/incubator/ctakes/tags/ctakes-3.0.0-incubating-rc5/
>>>>>>
>>>>>> The MD5 checksum of the tarball can be found at:
>>>>>>
>>>>>> http://people.apache.org/~chenpei/ctakes-3.0.0-incubating/rc5/target/apache-ctakes-3.0.0-incubating-bin.tar.gz.md5
>>>>>> /.zip.md5
>>>>>>
>>>>>> The signature of the tarball can be found at:
>>>>>>
>>>>>> http://people.apache.org/~chenpei/ctakes-3.0.0-incubating/rc5/target/apache-ctakes-3.0.0-incubating-bin.tar.gz.asc
>>>>>> /.zip.asc
>>>>>>
>>>>>> Apache cTAKES' KEYS file, containing the PGP keys used to sign the release:
>>>>>>
>>>>>> http://svn.apache.org/repos/asf/incubator/ctakes/tags/ctakes-3.0.0-incubating-rc5/KEYS
>>>>>>
>>>>>> Please vote on releasing these packages as Apache cTAKES 3.0.0-incubating.
>>>>>> The vote is open for at least the next 72 hours.
>>>>>>
>>>>>> Only votes from Incubator PMC are binding, but folks are welcome
>>>>>> to check the release candidate and voice their approval or disapproval.
>>>>>> The vote passes if at least three binding +1 votes are cast.
>>>>>>
>>>>>> [ ] +1 Release the packages as Apache cTAKES 3.0.0-incubating
>>>>>> [ ] -1 Do not release the packages because...
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Pei
>>>>>>
>>>>>> P.S. Here is my +1.
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
