Hi Vj,

Sorry if I misunderstood you earlier about how we ship the non-compatible jars/libs. Agreed, they'll probably have to reside somewhere else, maybe in their existing locations, and just be pulled together by the installer/ant script(s) as optional libs, as you suggested?

For the UMLS-derived works, we can mimic the current UMLS bundled dictionaries: they get downloaded via Maven Central and/or SourceForge as ctakes-resources.zip, as a separate download.

Open to ideas though...

--Pei

> -----Original Message-----
> From: vijay garla [mailto:[email protected]]
> Sent: Monday, October 21, 2013 7:03 PM
> To: [email protected]
> Subject: Re: move ytex annotators to ctakes.apache.org?
>
> More questions:
>
>
> On Mon, Oct 21, 2013 at 4:26 PM, Pei Chen <[email protected]> wrote:
>
> > Hi Vijay,
> > This is awesome. Some ideas inline below:
> >
> > > I'm not sure how you collect all the dependencies for shipment, but
> > > how do I tell maven not to include these?
> > Take a look at the distribution project [1]. It defines what gets put
> > in and out of the distro.
> >
> > > Is it OK to check weka & jdbc into source control?
> > Please do not commit the non-compatible license jars. We will have to
> > remove them before it gets distributed anyway, so best to avoid it.
> > However, if you would like to include it in the Jira as an
> > attachment/Sandbox initially to leverage the community's help, I can
> > also take a look at it and lend a helping hand if needed- and perhaps
> > others in the community may also be interested in helping out.
> >
>
> These libraries can be included via maven (not checked in) and excluded
> when creating the distro - that gets around having non-compatible jars in
> source control/distro. The main question is, how to ship these? As part of
> the resources jar?
>
>
> >
> > > * desc vs <project>-res
> > The -res projects were originally designed for the models/resources,
> > so that downstream consumers do not necessarily have to include huge
> > resource files if they only need the code. So, I would suggest any
> > plain text config source files go directly into the project and its
> > corresponding -res project.
> > [1]
> > https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/assembly/
> >
> > > * distribution of umls concept graphs
> > Are the contents of those concept graphs ASL 2.0 compatible? Probably
> > will need to double check to see if it's modified/considered derived works?
> >
>
> These should be handled identically to the UMLS hsql dictionary shipped in
> resources - the concept graphs are derived from UMLS level 0 sources +
> SNOMED CT
>
>
> >
> > > * patches to other ctakes projects
> > I think these would be really good! Perhaps we can even open Jiras
> > and commit those in parallel..
> >
> > > * post download setup
> > I think this is actually a good idea- to have some kind of "installer"
> > that guides the user through all the different download processes.
> > Would it be possible to clone that to see how it would look for
> > ctakes? I was originally thinking of groovy or some other scripts,
> > but would be curious to see especially if ytex already did something
> > like that.
> >
>
> I am just using plain vanilla ant, but am open to doing this 'the' ctakes
> way, but it doesn't sound like there is an established mechanism. I am open
> to using any scripting language that is already included in ctakes.
>
>
> >
> > > * +1 for the ytex projects for the time-being. We can also refactor them
> > into the existing projects as appropriate (once everyone has a better
> > understanding of the functionality?)
> >
> > Also, just curious- how big of a code base was this originally? I'm
> > just thinking about IP Clearance here (if it's required).
> >
>
> ~250 java source files, ~1mb source (didn't do a line count); the original
> code is ASF 2.0 license.
>
>
> >
> > On Mon, Oct 21, 2013 at 8:57 AM, vijay garla <[email protected]> wrote:
> >
> > > Hello All,
> > >
> > > I've started on the ytex-ctakes port, and have some packaging questions.
> > >
> > > * Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
> > > I understand that we will not ship these jars as part of the ctakes
> > > download. Can we bundle the jars and ship them as part of an
> > > additional download, available via sourceforge? Hibernate is
> > > available via maven central, weka and jdbc not.
> > > I have added weka & jdbc drivers as system dependencies. I'm not sure
> > > how you collect all the dependencies for shipment, but how do I tell
> > > maven not to include these? Is it OK to check weka & jdbc into source
> > > control?
> > >
> > > * desc vs <project>-res
> > > What are the guidelines for what goes where? Configuration files are
> > > found in both places, whereas data/models are in the -res directory.
> > > Ytex has many non-uima config files (hibernate, spring) which should be
> > > user-modifiable, and I would put them in the desc directory. However,
> > > desc is not in the project classpath (but it is in the classpath for
> > > the ctakes distro, e.g. in runctakesCPE.bat). Any reason for this
> > > dissonance? I would add desc as a resources directory in the pom.
> > >
> > > * distribution of umls concept graphs
> > > For semantic similarity and word sense disambiguation, ytex provides
> > > concept graphs derived from the UMLS. We have a download site that
> > > requires UTS login to get these concept graphs
> > > (http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip). I take
> > > it I would just create a -res directory and add the concept graphs
> > > here, and they would automagically appear in the ctakes-resources zip?
> > >
> > > * patches to other ctakes projects
> > > ytex has some patches to other ctakes annotators for handling edge
> > > cases where they throw an exception; I will check to see if these
> > > changes have already been made. If not, I will file separate Jira
> > > tickets for these patches. Also, the
> > > CharacterOffsetToLineTokenConverterCtakesImpl
> > > needs to be modified to properly handle cases where newlines are in
> > > sentences; I will add a patch for that as well.
> > >
> > > * post download setup
> > > ytex provides an ant script to simplify the post download setup
> > > (database schema, setup, configuration file generation).
> > > Would it be possible to ship ant with the ctakes distro, so that users
> > > can execute these scripts?
> > > If not, how best to automate setup? I know from experience with
> > > earlier versions of ytex that setting up the database schema is
> > > error prone, and that this needs to be automated.
> > >
> > >
> > > I was planning on creating the following projects:
> > > * ctakes-ytex:
> > > Base ytex, includes semantic similarity tools. This has no
> > > dependencies on ctakes, and I would create a separate distribution of
> > > just this package for a semantic similarity distro.
> > > * ctakes-ytex-res
> > > Includes concept graphs for semantic similarity.
> > > * ctakes-ytex-web
> > > Provides User Interface, RESTful, and WebServices interface to
> > > semantic similarity service. This has no dependencies on ctakes,
> > > and this would be included in the semantic similarity distro.
> > > * ctakes-ytex-uima
> > > Includes ytex analysis engines
> > > * ctakes-ytex-uima-res
> > > resources for ytex analysis engines
> > >
> > > Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res
> > > to existing projects (don't know where they would fit).
> > >
> > > Best,
> > >
> > > Vijay
> > >
> > >
> > > On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <[email protected]> wrote:
> > >
> > > > Hi Pei,
> > > >
> > > > The WSD annotator relies on the semantic similarity component,
> > > > which is a general purpose tool not strictly limited to ctakes or
> > > > NLP. I would like to keep the semantic similarity component
> > > > 'standalone', i.e. with no dependencies on ctakes, and make it
> > > > redistributable on its own. If that is possible as part of ctakes,
> > > > I'd love to move it.
> > > > If not, I'd leave the semantic similarity and the associated WSD
> > > > annotator on google code.
> > > >
> > > > For those of you who want the back story:
> > > > http://www.biomedcentral.com/1471-2105/13/261
> > > > http://jamia.bmj.com/content/20/5/882.long
> > > >
> > > >
> > > > -vj
> > > >
> > > > On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei <[email protected]> wrote:
> > > > > vj,
> > > > > Were you thinking of contributing the new ytex Word Sense
> > > > > Disambiguation component as well? I think that will be really cool.
> > > > > --Pei
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: [email protected] [mailto:[email protected]] On Behalf Of Karthik Sarma
> > > > >> Sent: Thursday, October 03, 2013 1:05 PM
> > > > >> To: [email protected]
> > > > >> Subject: Re: move ytex annotators to ctakes.apache.org?
> > > > >>
> > > > >> This would be quite valuable -- in particular, ytex's annotation
> > > > >> database connection is much easier to use than what ships with
> > > > >> cTAKES. There are a fair number of other advantages, and I think
> > > > >> they'd all be very valuable!
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Karthik Sarma
> > > > >> UCLA Medical Scientist Training Program Class of 20??
> > > > >> Member, UCLA Medical Imaging & Informatics Lab
> > > > >> Member, CA Delegation to the House of Delegates of the American
> > > > >> Medical Association
> > > > >> [email protected]
> > > > >> gchat: [email protected]
> > > > >> linkedin: www.linkedin.com/in/ksarma
> > > > >>
> > > > >>
> > > > >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <[email protected]> wrote:
> > > > >>
> > > > >> > Hello All,
> > > > >> >
> > > > >> > I'd like to contribute ytex to ctakes. YTEX's main feature
> > > > >> > is the ability to store *any* ctakes (or uima) annotation in
> > > > >> > a relational database (in a relational format), and the
> > > > >> > ability to export these annotations to ML packages (weka,
> > > > >> > libsvm, matlab, R).
> > > > >> > All of this is purely declarative/via configuration.
> > > > >> >
> > > > >> > In addition, Ytex provides the following:
> > > > >> > * Negation Detection with Negex
> > > > >> > * SegmentRegexAnnotator - section detection with regular
> > > > >> >   expressions
> > > > >> > * NamedEntityRegexAnnotator - named entity detection with
> > > > >> >   regular expressions
> > > > >> > * Sentence Splitter - modified ctakes sentence splitter
> > > > >> >   making sentence split patterns configurable (not hardcoded
> > > > >> >   to \n)
> > > > >> >
> > > > >> > YTEX currently works with ctakes 2.5; I would like to upgrade
> > > > >> > it to the latest ctakes, and if the community is interested,
> > > > >> > contribute to ctakes.apache.org.
> > > > >> >
> > > > >> > A licensing question: YTEX uses Spring (apache 2.0 license),
> > > > >> > Hibernate (lgpl 2.1), & weka (gpl). Are there any issues with
> > > > >> > including these?
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> > vj
