More questions:
On Mon, Oct 21, 2013 at 4:26 PM, Pei Chen <[email protected]> wrote:

> Hi Vijay,
> This is awesome. Some ideas inline below:
>
> > I'm not sure how you collect all the dependencies for shipment, but how do
> > I tell maven not to include these?
> Take a look at the distribution project [1]. It defines what gets put in
> and out of the distro.
>
> > Is it OK to check weka & jdbc into source control?
> Please do not commit the non-compatible license jars. We will have to
> remove them before it gets distributed anyway, so best to avoid it. However,
> if you would like to include it in the Jira as an attachment/Sandbox
> initially to leverage the community's help, I can also take a look at it
> and lend a helping hand if needed, and perhaps others in the community may
> also be interested in helping out.

These libraries can be included via maven (not checked in) and excluded when
creating the distro; that gets around having non-compatible jars in source
control/distro. The main question is, how to ship these? As part of the
resources jar?

> > * desc vs <project>-res
> The -res projects were originally designed for the models/resources, so
> that downstream consumers do not necessarily have to include huge resource
> files if they only need the code. So, I would suggest any plain text
> config source files go directly into the project and its corresponding
> -res project.
> [1]
> https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/assembly/
>
> > * distribution of umls concept graphs
> Are the contents of those concept graphs ASL 2.0 compatible? Probably will
> need to double check to see if it's modified/considered derived works?

These should be handled identically to the UMLS hsql dictionary shipped in
resources; the concept graphs are derived from UMLS level 0 sources + SNOMED CT.

> > * patches to other ctakes projects
> I think these would be really good! Perhaps we can even open Jiras and
> commit those in parallel.
>
> > * post download setup
> I think this is actually a good idea: to have some kind of "installer" that
> guides the user through all the different download processes. Would it be
> possible to clone that to see how it would look for ctakes? I was
> originally thinking of groovy or some other scripts, but would be curious
> to see, especially if ytex already did something like that.

I am just using plain vanilla ant, but am open to doing this 'the' ctakes way,
though it doesn't sound like there is an established mechanism. I am open to
using any scripting language that is already included in ctakes.

> * +1 for the ytex projects for the time being. We can also refactor them
> into the existing projects as appropriate (once everyone has a better
> understanding of the functionality?)
>
> Also, just curious: how big of a code base was this originally? I'm just
> thinking about IP Clearance here (if it's required).

~250 java source files, ~1 MB of source (didn't do a line count). The original
code is under the ASF 2.0 license.

> On Mon, Oct 21, 2013 at 8:57 AM, vijay garla <[email protected]> wrote:
>
> > Hello All,
> >
> > I've started on the ytex-ctakes port, and have some packaging questions.
> >
> > * Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
> > I understand that we will not ship these jars as part of the ctakes
> > download. Can we bundle the jars and ship them as part of an additional
> > download, available via sourceforge? Hibernate is available via maven
> > central, weka and jdbc not. I have added weka & jdbc drivers as system
> > dependencies. I'm not sure how you collect all the dependencies for
> > shipment, but how do I tell maven not to include these? Is it OK to check
> > weka & jdbc into source control?
> >
> > * desc vs <project>-res
> > What are the guidelines for what goes where? Configuration files are found
> > in both places, whereas data/models are in the -res directory.
> > Ytex has many non-uima config files (hibernate, spring) which should be
> > user-modifiable, and I would put them in the desc directory. However, desc
> > is not in the project classpath (but it is in the classpath for the ctakes
> > distro, e.g. in runctakesCPE.bat). Any reason for this dissonance? I
> > would add desc as a resources directory in the pom.
> >
> > * distribution of umls concept graphs
> > For semantic similarity and word sense disambiguation, ytex provides
> > concept graphs derived from the UMLS. We have a download site that
> > requires UTS login to get these concept graphs
> > (http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip). I take it I
> > would just create a -res directory and add the concept graphs here, and
> > they would automagically appear in the ctakes-resources zip?
> >
> > * patches to other ctakes projects
> > ytex has some patches to other ctakes annotators for handling edge cases
> > where they throw an exception; I will check to see if these changes
> > have already been made. If not, I will file separate Jira tickets for
> > these patches. Also, the CharacterOffsetToLineTokenConverterCtakesImpl
> > needs to be modified to properly handle cases where newlines are in
> > sentences; I will add a patch for that as well.
> >
> > * post download setup
> > ytex provides an ant script to simplify the post download setup (database
> > schema, setup, configuration file generation). Would it be possible to
> > ship ant with the ctakes distro, so that users can execute these scripts?
> > If not, how best to automate setup? I know from experience with earlier
> > versions of ytex that setting up the database schema is error prone, and
> > that this needs to be automated.
> >
> > I was planning on creating the following projects:
> > * ctakes-ytex:
> > Base ytex, includes semantic similarity tools.
> > This has no dependencies on ctakes, and I would create a separate
> > distribution of just this package for a semantic similarity distro.
> > * ctakes-ytex-res
> > Includes concept graphs for semantic similarity.
> > * ctakes-ytex-web
> > Provides User Interface, RESTful, and WebServices interface to semantic
> > similarity service. This has no dependencies on ctakes, and this would be
> > included in the semantic similarity distro.
> > * ctakes-ytex-uima
> > Includes ytex analysis engines
> > * ctakes-ytex-uima-res
> > resources for ytex analysis engines
> >
> > Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to
> > existing projects (don't know where they would fit).
> >
> > Best,
> >
> > Vijay
> >
> > On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <[email protected]> wrote:
> >
> > > Hi Pei,
> > >
> > > The WSD annotator relies on the semantic similarity component, which
> > > is a general purpose tool not strictly limited to ctakes or NLP. I
> > > would like to keep the semantic similarity component 'standalone',
> > > i.e. with no dependencies on ctakes, and make it redistributable on
> > > its own. If that is possible as part of ctakes, I'd love to move it.
> > > If not, I'd leave the semantic similarity and the associated WSD
> > > annotator on google code.
> > >
> > > For those of you who want the back story:
> > > http://www.biomedcentral.com/1471-2105/13/261
> > > http://jamia.bmj.com/content/20/5/882.long
> > >
> > > -vj
> > >
> > > On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
> > > <[email protected]> wrote:
> > > > vj,
> > > > Were you thinking of contributing the new ytex Word Sense
> > > > Disambiguation component as well? I think that will be really cool.
> > > > --Pei
> > > >
> > > >> -----Original Message-----
> > > >> From: [email protected] [mailto:[email protected]] On Behalf Of Karthik
> > > >> Sarma
> > > >> Sent: Thursday, October 03, 2013 1:05 PM
> > > >> To: [email protected]
> > > >> Subject: Re: move ytex annotators to ctakes.apache.org?
> > > >>
> > > >> This would be quite valuable -- in particular, ytex's annotation database
> > > >> connection is much easier to use than what ships with cTAKES. There are a
> > > >> fair number of other advantages, and I think they'd all be very valuable!
> > > >>
> > > >> --
> > > >> Karthik Sarma
> > > >> UCLA Medical Scientist Training Program Class of 20??
> > > >> Member, UCLA Medical Imaging & Informatics Lab Member, CA Delegation
> > > >> to the House of Delegates of the American Medical Association
> > > >> [email protected]
> > > >> gchat: [email protected]
> > > >> linkedin: www.linkedin.com/in/ksarma
> > > >>
> > > >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <[email protected]> wrote:
> > > >>
> > > >> > Hello All,
> > > >> >
> > > >> > I'd like to contribute ytex to ctakes. YTEX's main feature is the
> > > >> > ability to store *any* ctakes (or uima) annotation in a relational
> > > >> > database (in a relational format), and the ability to export these
> > > >> > annotations to ML packages (weka, libsvm, matlab, R). All of this is
> > > >> > purely declarative/via configuration.
> > > >> >
> > > >> > In addition, Ytex provides the following:
> > > >> > * Negation Detection with Negex
> > > >> > * SegmentRegexAnnotator - section detection with regular expressions
> > > >> > * NamedEntityRegexAnnotator - named entity detection with regular
> > > >> > expressions
> > > >> > * Sentence Splitter - modified ctakes sentence splitter making
> > > >> > sentence split patterns configurable (not hardcoded to \n)
> > > >> >
> > > >> > YTEX currently works with ctakes 2.5; I would like to upgrade it to
> > > >> > the latest ctakes, and if the community is interested, contribute to
> > > >> > ctakes.apache.org.
> > > >> >
> > > >> > A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate
> > > >> > (lgpl 2.1), & weka (gpl). Are there any issues with including these?
> > > >> >
> > > >> > Cheers
> > > >> >
> > > >> > vj
