Tim, this is extremely informative. Thank you, Sean, for updating the .xml files. I will play a little more with them to understand the process.
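When exploring the anafora_annotated examples, a quick sanity check is to confirm that every document directory holds both pieces Tim describes below: the plaintext (needed for feature extraction) and at least one annotation .xml file. This is a minimal shell sketch that assumes one directory per document, with annotation files ending in `.xml` and the plaintext being any other file in that directory. The function name `check_anafora_corpus` is hypothetical, and the default path is the one Sean gave, relative to the top of a trunk checkout.

```shell
#!/usr/bin/env bash
# Sketch: report, for each Anafora document directory, whether it has both a
# plaintext file and at least one annotation .xml file.
# Assumed layout (adjust as needed): <root>/<doc-id>/ holds the plaintext
# (any non-.xml file) plus one or more *.xml annotation files.
set -eu

check_anafora_corpus() {
  local root="$1" dir name n_xml n_txt
  for dir in "$root"/*/; do
    # Skip the unexpanded glob when <root> has no subdirectories.
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    n_xml=$(find "$dir" -maxdepth 1 -type f -name '*.xml' | wc -l)
    n_txt=$(find "$dir" -maxdepth 1 -type f ! -name '*.xml' | wc -l)
    if [ "$n_txt" -eq 0 ]; then
      echo "$name missing-text"   # features cannot be extracted without plaintext
    elif [ "$n_xml" -eq 0 ]; then
      echo "$name missing-xml"    # no gold annotations for this document
    else
      echo "$name ok"
    fi
  done
}

check_anafora_corpus "${1:-ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/annotation/anafora_annotated}"
```

Any line other than `ok` points at a document that could not contribute to rebuilding a model.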
Until then, I can only speculate (please excuse my lack of understanding) that the *model.jar files are produced from these .txt (raw data) + .xml (annotation) files. For example, I presume the anafora_annotated .xml files were produced using the Anafora tool [1]. This is a great step forward. Another step would be to understand the metadata relevant to these models. To give some examples from software delivery: groupId (owner), artifactId (product), version, classifier, and packaging (e.g. .jar). I also see other associated metadata (language, ontology, NLP techniques, etc.) that would allow comparison and measurement of these models, similar to how Docker images are shared and distributed. I can only emphasize what Sean said at the beginning of this thread: "With a project like ctakes there are a lot of things that can be done, there are great opportunities (...)" [2] Alex [1] - https://www.semanticscholar.org/paper/Anafora-A-Web-based-General-Purpose-Annotation-Too-Chen-Styler/66ccd53060a018cadb804bcff266cfc202a4c5dd [2] - http://mail-archives.apache.org/mod_mbox/ctakes-dev/201711.mbox/%3Cc9144a6bfcd74c5fbd352791080ffdf1%40CHEXMAIL1A.CHBOSTON.ORG%3E On Tue, Nov 21, 2017 at 11:32 AM, Finan, Sean < sean.fi...@childrens.harvard.edu> wrote: > I just checked the files into trunk an hour ago, so you'll need to update. > > ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/annotation/anafora_annotated > > -----Original Message----- > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] > Sent: Tuesday, November 21, 2017 11:20 AM > To: dev@ctakes.apache.org > Subject: Re: Contribute to ctakes: it is in your best interests! RE: > unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] > [SUSPICIOUS] [SUSPICIOUS] > > Yeah, it's definitely hard to do it the most efficient way because of the > sensitive nature of our source data.
You can see roughly what the source > data looks like in our ctakes-examples-res project > (/home/tmill/Projects/ctakes-git/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/annotation/anafora_annotated) > Each document has a directory with the plaintext document and an xml file > indicating spans of entities and relations between entities. The xml files > contain no identifying information, but the plaintext is required for > feature extraction, so we cannot rebuild models without it. > > However, another possibility, as Alex mentioned, is to keep models out of > the git repo and treat them as resources. We already intended something like that > by having them in *-res modules, but if there are other ideas for > structures that would keep models completely out of the repo (or in another > repo that wouldn't be required), I would be happy to hear about them. > > One final thing we (myself and others) need to be better about: large > models shouldn't be checked in until they are used in default modules, and > shouldn't be used as default models unless they offer large performance > benefits (in terms of accuracy). It might be worth a dev discussion if there is > some indecision (for example, is a 1GB model that offers a 2% improvement on > relation extraction worth it?). Sometimes I've checked things in > that run in experimental projects where they may or may not make it into > default models. > > Tim > > > > > On Tue, 2017-11-21 at 14:21 +0000, Finan, Sean wrote: > > Hi Alex, > > > > > > > > I know about the importance of these models. > > My apologies if I offended. > > > > > > > > I would like to know if there is a way to generate them. > > There is a little bit of documentation on models expertly written by > > Tim. Right now it is in a pamphlet that we distributed at a hackathon > > a couple of years ago, and the contents should definitely be copied > > into the wiki. I think that there is a Jira for it, but I'm not > > certain.
> > On the main ctakes wiki page for 4.0 > > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0 > > it is the second entry in the "Documentation" list. > > Again, it needs to be moved into the wiki, and updated if necessary. > > > > > > > > The same principle (I presume) applies. > > You need a bit of machine learning awareness and annotated data. > > > > > > > > If we are able to generate them, then we can version the source and > > > the process to generate them and not the binaries themselves. > > Some of the models are created using 'proprietary' data that cannot be > > distributed. > > Some of the models are created with data that is actually larger in > > footprint than the models. > > > > > > > > What is the lifecycle of a model? > > It depends on what you mean by lifecycle. In terms of the SDLC it is a very > > long waterfall. First, the aims are set. This often (around us, > > anyway) involves brainstorming among a number of people on aims for > > the model, like what types and attributes can and should be produced. > > An appropriate data source needs to be found and the data acquired > > ... and a grant obtained to cover the cost of doing it. Then the data > > needs to be annotated, then experts fiddle with the various features > > and methods for a while, running things a gazillion times to fine-tune. For > > example, I think that the temporal models have been under development > > for over five years by several developers, and the training data was > > annotated by another half dozen or so experts. If new data is > > acquired from another project, the model is improved and updated. > > If you are asking about the lifetime of a model, that is highly > > variable.
New data, new researchers, available time, interest and of > > course the accuracy of an existing model all play a part. A model may > > go years without any changes, or it might be updated monthly or weekly > > or even daily depending upon how a person is working and using VCS. > > > > > > > > Can it be integrated with other Deep Learning frameworks from ASF? > > Are you asking about other frameworks using ctakes models or ctakes > > using other models? I think that some of the models used by ctakes do > > originally come from other sources. Besides that, if those other > > frameworks are willing to use libraries like ClearTK, then there > > shouldn't be much of a problem. There are currently some initiatives > > trying to incorporate deep learning frameworks. If anybody out > > there working on one is reading this, they can give you more > > information. > > > > > > > > I also come from a background of Continuous Delivery, > > I appreciate that in every sense of the word! > > > > I hope that this information helps. The pamphlet section on models > > that Tim wrote is the best starting point. ML experts (which I am > > not) out there can contribute a lot more information, probably even a > > correction or two. > > > > Sean > > > > -----Original Message----- > > From: Alexandru Zbarcea [mailto:zbarce...@gmail.com] > > Sent: Tuesday, November 21, 2017 8:35 AM > > To: Apache cTAKES Dev > > Subject: Re: Contribute to ctakes: it is in your best interests! RE: > > unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] > > > > Hi Sean, > > > > I know about the importance of these models. Tim was also kind enough > > to explain to me, in a previous email on the mailing list, their > > importance and the fact that these models were created > > by experts. > > > > However, I'm not proposing to remove them, but to better document > > their importance. Also, I would like to know if there is a way to > > generate them.
I appreciate the way pipeline aggregation was solved in > > cTAKES, by creating a new DSL [1] (Piper) that is easy to read and > > also brings a lot of automation and flexibility. The same principle (I > > presume) applies. > > If we are able to generate them, then we can version the source and > > the process to generate them and not the binaries themselves. > > > > If we can use the cTAKES CLIs to generate some of these models, and > > simulate what the expert would do using the UI, we would have a > > reproducible process that can also be perfected over time by other > > experts. > > It is like the Lucene viewer vs. the Lucene Java API. I don't know how > > feasible this is, though. Just my $0.02. > > > > I'm looking to understand not only the cTAKES Java code, but how the > > entire process works. One of the pieces missing for me is what > > expertise you actually need to build these models and how > > context-dependent that is. I also come from a background of Continuous > > Delivery, so a few questions popped > > out: What is the lifecycle of a model? Can it be integrated with other > > Deep Learning frameworks from ASF? > > > > What do you think? > > > > Alex > > > > [1] - https://en.wikipedia.org/wiki/Domain-specific_language > > > > On Tue, Nov 21, 2017 at 7:30 AM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote: > > > > > > > > Hi Alex, > > > > > > The model.jar files are needed and cannot be removed. You may have > > > noticed that a lot of those hard-coded paths point to these > > > model.jar files.
> > > > > > Sean > > > > > > > > > -----Original Message----- > > > From: Alexandru Zbarcea [mailto:al...@apache.org] > > > Sent: Monday, November 20, 2017 7:33 PM > > > To: Apache cTAKES Dev > > > Subject: Re: Contribute to ctakes: it is in your best interests! > > > RE: > > > unknown dependencies [EXTERNAL] [SUSPICIOUS] > > > [SUSPICIOUS] > > > > > > Thanks Tim, > > > > > > I am in favor of moving to git too. If there is a desire from the > > > community to move entirely over to git, I can work with Apache Infra > > > to make the migration. > > > > > > I wonder if we can reduce the repository size during this transition. > > > Under Apache rules, history is not allowed to be rewritten, but > > > migrations like these are used to clean up some of the big > > > (space-consuming) resources (e.g. models "*.jar"):
> > > $ find . -name "*.jar" | xargs du -hsc
> > > 2.3M ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/eventevent/model.jar
> > > 348K ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/contextualmodality/model.jar
> > > 4.0K ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/salience/model.jar
> > > 1.0M ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/eventannotator/model.jar
> > > 568K ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/doctimerel/model.jar
> > > 2.2M ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/eventtime/model.jar
> > > 1.3M ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/timeannotator/model.jar
> > > 7.8M ./ctakes-pos-tagger-res/src/main/resources/org/apache/ctakes/postagger/models/clearnlp/mayo-en-pos-1.3.0.jar
> > > 4.0K ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/mention-cluster/model.jar
> > > 1.5M ./ctakes-core-res/src/main/resources/org/apache/ctakes/core/sentdetect/model.jar
> > > 504K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/subject/model.jar
> > > 588K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/historyOf/model.jar
> > > 332K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/uncertainty/model.jar
> > > 740K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/conditional/model.jar
> > > 592K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/polarity/sharpi2b2mipacqnegex/model.jar
> > > 572K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/generic/model.jar
> > > 1.5M ./ctakes-assertion-res/resources/model/sharpi2b2mipacqnegex/polarity/model.jar
> > > 312K ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/lemmatizer/dictionary-1.3.1.jar
> > > 228M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar
> > > 5.8M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/srl/mayo-en-srl-1.3.0.jar
> > > 452K ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/pred/mayo-en-pred-1.3.0.jar
> > > 1.2M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/role/mayo-en-role-1.3.0.jar
> > > 25M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/dependency/mayo-en-dep-1.3.0.jar
> > > 688K ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/location_of/model.jar
> > > 488K ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/degree_of/model.jar
> > > 300K ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/modifier_extractor/model.jar
> > > 282M total
> > >
> > > or
> > >
> > > $ find ./ -type f -size +5M | grep -v "\.jar" | grep -v "\.svn" | grep -v "\.git" | xargs du -hsc
> > > 9.2M ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k/_3.prx
> > > 20M ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k/_3.tvf
> > > 6.9M ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/pref_probs.txt
> > > 13M ./ctakes-chunker-res/src/main/resources/org/apache/ctakes/chunker/models/chunker-model.zip
> > > 6.4M ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/thyme.bin
> > > 15M ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/sharpacq-3.1.bin
> > > 12M ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/sharpacq-1.5.bin
> > > 84M ./resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
> > > 11M ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/pos.model
> > > 38M ./ctakes-assertion-res/resources/model/sharpi2b2mipacqnegex/polarity/training-data.liblinear
> > > 9.6M ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/thyme_word2vec_mapped_50.vec
> > > 91M ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/gloveresult_3
> > > 67M ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/mimic_vectors.txt
> > > 378M total
> > >
> > > Are all these resources still relevant? Is there a way to generate them? > > > > > > I do not wish to open Pandora's box, though. > > > Alex > > > > > > > > > On Mon, Nov 20, 2017 at 9:29 AM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote: > > > > > > > > > > > Thanks Tim! > > > > > > > > -----Original Message----- > > > > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] > > > > Sent: Monday, November 20, 2017 6:33 AM > > > > To: dev@ctakes.apache.org > > > > Subject: Re: Contribute to ctakes: it is in your best interests! > > > > RE: > > > > unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > > > [SUSPICIOUS] > > > > > > > > Git is available to Apache projects, and many projects have moved > > > > over (see here: https://git-wip-us.apache.org/repos/asf). > > > > Here is the general info on what that looks like: > > > > https://www.apache.org/dev/writable-git > > > > > > > > A few points from that link: > > > > > > > > > > Projects can request moving to Git as their main code > > > > > repository by creating an INFRA issue. See also the infra-contact page. > > > > > Projects can request new, blank repositories by using > > > > reporeq.apache.org. > > > > > > > > > > The current system has basic git support only. We are working on > > > > extending this service in the near future.
> > > > > > > > > > Custom commit or other hooks will not be supported, all projects > > > > > get the same hooks. Setting up gitpubsub should provide sufficient > > > > flexibility without impacting the core Git setup, volunteers are > > > > welcome to make that happen. > > > > > > > > (Not sure what "basic support only" means.) > > > > > > > > There are also read-only git repos available by default for every > > > > project, updated in near-real-time: > > > > https://www.apache.org/dev/git.html > > > > > > > > With those, I guess the suggested workflow is to work off that > > > > repo and then submit patches to someone who commits with svn, > > > > rather than committing directly. > > > > > > > > I've been using the git-svn connector myself recently, since I > > > > vastly prefer git's lightweight branching for focused > > > > development; it helps me keep a cleaner working directory. But > > > > that adds some annoying extra steps. > > > > > > > > Tim > > > > > > > > ________________________________________ > > > > From: Finan, Sean <sean.fi...@childrens.harvard.edu> > > > > Sent: Saturday, November 18, 2017 1:23 PM > > > > To: dev@ctakes.apache.org > > > > Subject: RE: Contribute to ctakes: it is in your best interests! > > > > RE: > > > > unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > > > > > > > Hi Dave, > > > > > > > > Those are some great thoughts. Since we are an Apache project, I am not > > > > sure how far we can move from svn, but there may be a way. You > > > > are not the first to voice this desire for an active GitHub repo, > > > > and I'm sure that you won't be the last. > > > > > > > > I completely agree with your discussion board preference.
Do you > > > > have any recommendations? > > > > > > > > You make a great point regarding documentation. In terms of > > > > things that anybody can quickly contribute ... that would be a big > > > > one. > > > > Volunteers?!? > > > > > > > > I am really happy to hear that you want to contribute - more than > > > > you already have, which is actually quite a bit! > > > > > > > > Cheers, > > > > Sean > > > > > > > > -----Original Message----- > > > > From: David Kincaid [mailto:kincaid.d...@gmail.com] > > > > Sent: Saturday, November 18, 2017 1:10 PM > > > > To: dev@ctakes.apache.org > > > > Subject: Re: Contribute to ctakes: it is in your best interests! > > > > RE: > > > > unknown dependencies [EXTERNAL] [SUSPICIOUS] > > > > > > > > Sean, I can share a couple of things that have been obstacles for > > > > me. > > > > It may seem a minor point to some, but I left Subversion behind > > > > years ago and really have no desire to go back. If the project > > > > were moved over to Git/GitHub, it would really smooth the way, for > > > > me at least. I would be happy to help out with this. One of the > > > > other things I would really like to see is the mailing list moved > > > > onto a discussion board platform. It seems to me that a discussion > > > > board style of tool tends to create a more active community than a > > > > mailing list does. > > > > > > > > The other thing that might help get new people involved is making > > > > it easier to find information about the development environment. > > > > Things like branching strategies, coding conventions, etc. are really hard > > > > to find from the main cTAKES web site. I saw some references to > > > > Jenkins builds recently on the list. I had no idea there was a > > > > Jenkins CI server for the project somewhere. It also takes some > > > > digging to find a link to Jira. Maybe we could create a wiki page > > > > that describes where all these tools are and how they are used.
> > > > > > > > You guys have really done some great work over the last couple of > > > > years cleaning up the code base and improving the documentation by > > > > a ton. Things like the fast dictionary annotator and the dictionary > > > > creator GUI are great additions and make it a lot easier for > > > > other people to get up and running quickly. As I'm ramping up > > > > my research as well as some proof-of-concept stuff at work, I'll be > > > > working more and more with cTAKES and would love to contribute > > > > more to the project. > > > > > > > > Just my thoughts. > > > > > > > > - Dave > > > > > > > > > > > > On Sat, Nov 18, 2017 at 11:10 AM, Finan, Sean < > > > > sean.fi...@childrens.harvard.edu> wrote: > > > > > > > > > > > > > > Hi Tim, Alex, > > > > > > > > > > Great ideas. I like your (Tim's) idea to 1. start with removing > > > > > commented-out code. > > > > > Then maybe move on to > > > > > 2. sanity-test type unit tests - little two- or three-line "does > > > > > this method crack" tests. > > > > > And another that is simply > > > > > 3. "populate a test cas with type(s) X" and a factory with > > > > > "getSectionTestCas" "getSentenceTestCas" "getPosTestCas" > > > > > "getChunkTestCas" ... just really simple reusables for tests. > > > > > Then > > > > > 4. refactor to extract and consolidate duplicate code - it is > > > > > all over the place ... > > > > > > > > > > These are just my initial thoughts and suggestions, but I think > > > > > that those 4 tasks can be performed by anybody at any experience level. > > > > > They build upon each other and should help the implementers better > > > > > understand ctakes. > > > > > > > > > > After that the sky is the limit. > > > > > > > > > > A couple of years ago I sat on a panel at a workshop for open > > > > > source scientific software. For the half dozen or so > > > > > highlighted projects (ctakes was one!)
the common thread was > > > > > that getting people to contribute is extremely difficult. > > > > > I have a tendency to assume that people always act in their best > > > > > interests. Any student thinking of going towards industry > > > > > should be jumping at the opportunity to contribute to a large, > > > > > production-quality project. They should also realize that > > > > > contribution means potential recommendation (and possibly hiring > > > > > interest) by established developers, physicians and researchers > > > > > that use ctakes. Even just answering questions on a user or dev > > > > > list creates credibility and can build a network. > > > > > > > > > > Active researchers could discover common thoughts and directions > > > > > that could lead to collaboration outside ctakes. Researchers > > > > > and companies trying to build upon open source should realize > > > > > that direct contribution is easier than custom substitution. > > > > > Plus, it is in their best interests that code does what they > > > > > need it to do in the fastest, lightest, most stable way > > > > > possible. > > > > > With a project like ctakes there are a lot of things that can be > > > > > done, there are great opportunities to really shine. "I wrote > > > > > this tool for my thesis that performs some nlp task" sounds > > > > > good. > > > > > Appending "in an Apache product and it has been taken up by > > > > > thousands across the globe" makes it sound a lot better. > > > > > At my previous job in industry the company actively contributed > > > > > to several open source projects. We had a few people for whom > > > > > that was 50% of their job. Why? Because we made a commitment > > > > > to use that open source software.
> > > > > > > > > > It was a better use of our resources to contribute to it, > > > > > improve it, keep its momentum going, and prevent it from > > > > > becoming stale (or > > > > > abandoned) while our software continued to move forward. > > > > > > > > > > Hmm, that was a touch more than I had planned to write. A whole > > > > > cup of coffee in that one. > > > > > > > > > > Sean > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Miller, Timothy > > > > > [mailto:timothy.mil...@childrens.harvard.edu] > > > > > Sent: Saturday, November 18, 2017 8:13 AM > > > > > To: dev@ctakes.apache.org > > > > > Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS] > > > > > > > > > > Thanks Alex, it looks like that was probably a fat-fingered > > > > > auto-import on my part. > > > > > > > > > > I like your idea, and I don't know the best way to start > > > > > either, but maybe one suggestion is to start with one or two > > > > > focused things to clean up, and then ask for volunteers to take > > > > > on specific modules? > > > > > Then people can contribute an hour here and there to do cleanup > > > > > on their task/module and try to fix that thing in a one- to two-month > > > > > sprint. I am happy to contribute to cleanup (I am > > > > > responsible for my fair share of unclean code), but since I don't > > > > > have strong software engineering chops, it would be good to have > > > > > people with that background propose the tasks and describe > > > > > exactly what needs to be done. My idea of cleaning is just to > > > > > delete commented-out sections of evaluation code.
> > > > > > > > > > > > > > > > > > > Tim > > > > > > > > > > ________________________________________ > > > > > From: Alexandru Zbarcea <al...@apache.org> > > > > > Sent: Friday, November 17, 2017 4:46 PM > > > > > To: Apache cTAKES Dev > > > > > Subject: unknown dependencies [EXTERNAL] > > > > > > > > > > Hi, > > > > > > > > > > I noticed that a stray dependency has slipped into the code: > > > > > jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter; > > > > > > > > > > Now that the Jenkins build is successful, I think it is easier > > > > > to clean up the code. I would like this to be a common effort. I > > > > > don't know the best way to approach this. > > > > > > > > > > Looking forward to your advice, > > > > > Alex > > > > > >
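Alex's two `find ... | xargs du -hsc` listings from Nov 20 can be folded into a single pass. This version skips version-control metadata and sorts by size, so the largest candidates for moving out of the repository (such as clearparser_models.jar) come first. It is only a sketch. The function name `largest_files`, the 1 MiB threshold, and the default limit of 20 are arbitrary choices. It also relies on `find -size` accepting the `M` suffix, which GNU find does; check your platform's find otherwise.

```shell
#!/usr/bin/env bash
# Sketch: list files larger than 1 MiB under a checkout, largest first,
# skipping .svn and .git metadata directories. Sizes come from `du -k`
# (disk usage in KiB), the same measure Alex's `du -hsc` reported.
set -eu

largest_files() {
  local root="$1" limit="${2:-20}"
  # Prune VCS directories; batch the remaining large files into du calls.
  find "$root" \( -name .svn -o -name .git \) -prune \
    -o -type f -size +1M -exec du -k {} + \
    | sort -rn \
    | sed -n "1,${limit}p"   # sed reads all input, so nothing upstream gets SIGPIPE
}
```

For example, running `largest_files . 10` from the top of a checkout prints the ten largest non-VCS files, with sizes in KiB.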