Hi Vijay,

This is awesome. Some ideas inline below:

> I'm not sure how you collect all the dependencies for shipment, but how do I tell maven not to include these?

Take a look at the distribution project [1]. It defines what goes into the distro and what is left out.

> Is it OK to check weka & jdbc into source control?

Please do not commit the jars with incompatible licenses. We would have to remove them before anything gets distributed anyway, so it is best to avoid it. However, if you would like to include them in the Jira as an attachment/Sandbox initially to leverage the community's help, I can also take a look and lend a helping hand if needed, and perhaps others in the community may also be interested in helping out.

> * desc vs <project>-res

The -res projects were originally designed for the models/resources, so that downstream consumers do not necessarily have to include huge resource files if they only need the code. So, I would suggest any plain text config source files go directly into the project and its corresponding -res project.

> * distribution of umls concept graphs

Are the contents of those concept graphs ASL 2.0 compatible? We will probably need to double check whether they are modified or considered derived works.

> * patches to other ctakes projects

I think these would be really good! Perhaps we can even open Jiras and commit those in parallel.

> * post download setup

I think this is actually a good idea: to have some kind of "installer" that guides the user through all the different download processes. Would it be possible to clone that to see what it would look like for ctakes? I was originally thinking of groovy or some other scripts, but would be curious to see, especially if ytex already did something like that.

* +1 for the ytex projects for the time being. We can also refactor them into the existing projects as appropriate (once everyone has a better understanding of the functionality?)

Also, just curious: how big of a code base was this originally? I'm just thinking about IP Clearance here (if it's required).

[1] https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/assembly/

On Mon, Oct 21, 2013 at 8:57 AM, vijay garla <[email protected]> wrote:

> Hello All,
>
> I've started on the ytex-ctakes port, and have some packaging questions.
>
> * Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
> I understand that we will not ship these jars as part of the ctakes
> download. Can we bundle the jars and ship them as part of an additional
> download, available via sourceforge? Hibernate is available via maven
> central, weka and jdbc not. I have added weka & jdbc drivers as system
> dependencies. I'm not sure how you collect all the dependencies for
> shipment, but how do I tell maven not to include these? Is it OK to check
> weka & jdbc into source control?
>
> * desc vs <project>-res
> What are the guidelines for what goes where? Configuration files are found
> in both places, whereas data/models are in the -res directory. Ytex has
> many non-uima config files (hibernate, spring) which should be
> user-modifiable, and I would put them in the desc directory. However, desc
> is not in the project classpath (but it is in the classpath for the ctakes
> distro, e.g. in runctakesCPE.bat). Any reason for this dissonance? I
> would add desc as a resources directory in the pom.
>
> * distribution of umls concept graphs
> for semantic similarity and word sense disambiguation, ytex provides
> concept graphs derived from the UMLS. We have a download site that
> requires UTS login to get these concept graphs (
> http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip). I take it I
> would just create a -res directory and add the concept graphs here, and
> they would automagically appear in the ctakes-resources zip?
>
> * patches to other ctakes projects
> ytex has some patches to other ctakes annotators for handling edge cases
> where they throw up with an exception; I will check to see if these changes
> have already been made. If not, I will file separate Jira tickets for
> these patches. Also, the CharacterOffsetToLineTokenConverterCtakesImpl
> needs to be modified to properly handle cases where newlines are in
> sentences; I will add a patch for that as well.
>
> * post download setup
> ytex provides an ant script to simplify the post download setup (database
> schema, setup, configuration file generation). Would it be possible to
> ship ant with the ctakes distro, so that users can execute these scripts?
> If not, how best to automate setup? I know from experience with earlier
> versions of ytex that setting up the database schema is error prone, and
> that this needs to be automated.
>
>
> I was planning on creating the following projects:
> * ctakes-ytex:
> Base ytex, includes semantic similarity tools. This has no dependencies on
> ctakes, and I would create a separate distribution of just this package for
> a semantic similarity distro.
> * ctakes-ytex-res
> Includes concept graphs for semantic similarity.
> * ctakes-ytex-web
> Provides User Interface, RESTful, and WebServices interface to semantic
> similarity service. This has no dependencies on ctakes, and this would be
> included in the semantic similarity distro.
> * ctakes-ytex-uima
> Includes ytex analysis engines
> * ctakes-ytex-uima-res
> resources for ytex analysis engines
>
> Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to
> existing projects (don't know where they would fit).
>
> Best,
>
> Vijay
>
>
> On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <[email protected]> wrote:
>
> > Hi Pei,
> >
> > The WSD annotator relies on the semantic similarity component, which
> > is a general purpose tool not strictly limited to ctakes or NLP. I
> > would like to keep the semantic similarity component 'standalone',
> > i.e. with no dependencies on ctakes, and make it redistributable on
> > its own. If that is possible as part of ctakes, I'd love to move it.
> > If not, I'd leave the semantic similarity and the associated WSD
> > annotator on google code.
> >
> > For those of you who want the back story:
> > http://www.biomedcentral.com/1471-2105/13/261
> > http://jamia.bmj.com/content/20/5/882.long
> >
> >
> > -vj
> >
> > On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
> > <[email protected]> wrote:
> > > vj,
> > > Were you thinking of contributing the new ytex Word Sense
> > > Disambiguation component as well- I think that will be really cool.
> > > --Pei
> > >
> > >> -----Original Message-----
> > >> From: [email protected] [mailto:[email protected]] On Behalf Of Karthik
> > >> Sarma
> > >> Sent: Thursday, October 03, 2013 1:05 PM
> > >> To: [email protected]
> > >> Subject: Re: move ytex annotators to ctakes.apache.org?
> > >>
> > >> This would be quite valuable -- in particular, ytex's annotation database
> > >> connection is much easier to use than what ships with cTAKES. There are a
> > >> fair number of other advantages, and I think they'd all be very valuable!
> > >>
> > >> --
> > >> Karthik Sarma
> > >> UCLA Medical Scientist Training Program Class of 20??
> > >> Member, UCLA Medical Imaging & Informatics Lab Member, CA Delegation
> > >> to the House of Delegates of the American Medical Association
> > >> [email protected]
> > >> gchat: [email protected]
> > >> linkedin: www.linkedin.com/in/ksarma
> > >>
> > >>
> > >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <[email protected]> wrote:
> > >>
> > >> > Hello All,
> > >> >
> > >> > I'd like to contribute ytex to ctakes. YTEX's main feature is the
> > >> > ability to store *any* ctakes (or uima) annotation in a relational
> > >> > database (in a relational format), and the ability to export these
> > >> > annotations to ML packages (weka, libsvm, matlab, R). All of this is
> > >> > purely declarative/via configuration.
> > >> >
> > >> > In addition, Ytex provides the following:
> > >> > * Negation Detection with Negex
> > >> > * SegmentRegexAnnotator - section detection with regular expressions
> > >> > * NamedEntityRegexAnnotator - named entity detection with regular
> > >> > expressions
> > >> > * Sentence Splitter - modified ctakes sentence splitter making
> > >> > sentence split patterns configurable (not hardcoded to \n)
> > >> >
> > >> > YTEX currently works with ctakes 2.5; I would like to upgrade it to
> > >> > the latest ctakes, and if the community is interested, contribute to
> > >> > ctakes.apache.org.
> > >> >
> > >> > A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate
> > >> > (lgpl 2.1), & weka (gpl). Are there any issues with including these?
> > >> >
> > >> > Cheers
> > >> >
> > >> > vj
