Hello All,

I've started on the ytex-ctakes port and have some packaging questions.

* Hibernate, Weka & JDBC driver (SQL Server, Oracle) dependencies
I understand that we will not ship these jars as part of the ctakes download. Can we bundle the jars and ship them as a separate download, available via SourceForge? Hibernate is available via Maven Central; Weka and the JDBC drivers are not, so I have added them as system dependencies. I'm not sure how the dependencies are collected for the distribution: how do I tell Maven not to include these? Is it OK to check the Weka and JDBC driver jars into source control?

* desc vs <project>-res
What are the guidelines for what goes where? Configuration files are found in both places, whereas data and models are in the -res directory. Ytex has many non-UIMA configuration files (Hibernate, Spring) that should be user-modifiable, and I would put them in the desc directory. However, desc is not on the project classpath (although it is on the classpath for the ctakes distro, e.g. in runctakesCPE.bat). Is there a reason for this inconsistency? I would add desc as a resources directory in the pom.

* Distribution of UMLS concept graphs
For semantic similarity and word sense disambiguation, ytex provides concept graphs derived from the UMLS. We have a download site that requires a UTS login to get these concept graphs (http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip). I take it I would just create a -res directory, add the concept graphs there, and they would automagically appear in the ctakes-resources zip?

* Patches to other ctakes projects
ytex has some patches to other ctakes annotators for handling edge cases in which they throw an exception; I will check whether these changes have already been made. If not, I will file separate Jira tickets for these patches. Also, CharacterOffsetToLineTokenConverterCtakesImpl needs to be modified to properly handle sentences that contain newlines; I will add a patch for that as well.
* Post-download setup
ytex provides an ant script to simplify the post-download setup (database schema setup, configuration file generation). Would it be possible to ship ant with the ctakes distro, so that users can execute these scripts? If not, how best to automate setup? I know from experience with earlier versions of ytex that setting up the database schema is error prone and needs to be automated.

I was planning on creating the following projects:
* ctakes-ytex: Base ytex, including the semantic similarity tools. This has no dependencies on ctakes, and I would create a separate distribution of just this package as a semantic similarity distro.
* ctakes-ytex-res: Includes concept graphs for semantic similarity.
* ctakes-ytex-web: Provides user interface, RESTful, and web services interfaces to the semantic similarity service. This has no dependencies on ctakes and would be included in the semantic similarity distro.
* ctakes-ytex-uima: Includes the ytex analysis engines.
* ctakes-ytex-uima-res: Resources for the ytex analysis engines.

Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to existing projects (though I don't know where they would fit).

Best,
Vijay

On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <[email protected]> wrote:
> Hi Pei,
>
> The WSD annotator relies on the semantic similarity component, which
> is a general purpose tool not strictly limited to ctakes or NLP. I
> would like to keep the semantic similarity component 'standalone',
> i.e. with no dependencies on ctakes, and make it redistributable on
> its own. If that is possible as part of ctakes, I'd love to move it.
> If not, I'd leave the semantic similarity component and the associated
> WSD annotator on Google Code.
>
> For those of you who want the back story:
> http://www.biomedcentral.com/1471-2105/13/261
> http://jamia.bmj.com/content/20/5/882.long
>
> -vj
>
> On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei <[email protected]> wrote:
> > vj,
> > Were you thinking of contributing the new ytex Word Sense
> > Disambiguation component as well? I think that would be really cool.
> > --Pei
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]] On Behalf Of Karthik Sarma
> >> Sent: Thursday, October 03, 2013 1:05 PM
> >> To: [email protected]
> >> Subject: Re: move ytex annotators to ctakes.apache.org?
> >>
> >> This would be quite valuable -- in particular, ytex's annotation database
> >> connection is much easier to use than what ships with cTAKES. There are a
> >> fair number of other advantages, and I think they'd all be very valuable!
> >>
> >> --
> >> Karthik Sarma
> >> UCLA Medical Scientist Training Program Class of 20??
> >> Member, UCLA Medical Imaging & Informatics Lab
> >> Member, CA Delegation to the House of Delegates of the American Medical Association
> >> [email protected]
> >> gchat: [email protected]
> >> linkedin: www.linkedin.com/in/ksarma
> >>
> >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <[email protected]> wrote:
> >>
> >> > Hello All,
> >> >
> >> > I'd like to contribute ytex to ctakes. YTEX's main feature is the
> >> > ability to store *any* ctakes (or uima) annotation in a relational
> >> > database (in a relational format), and the ability to export these
> >> > annotations to ML packages (weka, libsvm, matlab, R). All of this is
> >> > purely declarative/via configuration.
> >> >
> >> > In addition, Ytex provides the following:
> >> > * Negation Detection with Negex
> >> > * SegmentRegexAnnotator - section detection with regular expressions
> >> > * NamedEntityRegexAnnotator - named entity detection with regular
> >> >   expressions
> >> > * Sentence Splitter - modified ctakes sentence splitter making
> >> >   sentence split patterns configurable (not hardcoded to \n)
> >> >
> >> > YTEX currently works with ctakes 2.5; I would like to upgrade it to
> >> > the latest ctakes and, if the community is interested, contribute it to
> >> > ctakes.apache.org.
> >> >
> >> > A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate
> >> > (lgpl 2.1), & weka (gpl). Are there any issues with including these?
> >> >
> >> > Cheers
> >> >
> >> > vj
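The configurable sentence splitting mentioned in the original proposal (split patterns not hardcoded to \n) can be illustrated with a minimal Java sketch. This is not the ytex or cTAKES sentence splitter API: the class name ConfigurableSentenceSplitter, its methods, and the example split pattern are all hypothetical, chosen only to show how a regex supplied through configuration changes where sentence boundaries fall.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the ytex/cTAKES API): splits text into sentence
 * spans at every match of a regular expression supplied by configuration,
 * instead of at a hard-coded newline.
 */
class ConfigurableSentenceSplitter {
    private final Pattern splitPattern;

    ConfigurableSentenceSplitter(String splitRegex) {
        this.splitPattern = Pattern.compile(splitRegex);
    }

    /** Returns the [begin, end) character offsets of each non-blank sentence. */
    List<int[]> split(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = splitPattern.matcher(text);
        int start = 0;
        while (m.find()) {
            addTrimmedSpan(text, start, m.start(), spans);
            start = m.end();
        }
        addTrimmedSpan(text, start, text.length(), spans); // text after the last boundary
        return spans;
    }

    // Drops leading/trailing whitespace and ignores spans that end up empty.
    private static void addTrimmedSpan(String text, int begin, int end, List<int[]> spans) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (begin < end) spans.add(new int[] {begin, end});
    }

    public static void main(String[] args) {
        String note = "Pt denies chest pain.\nHistory of\nhypertension.\n\nPlan: follow up.";

        // Example configuration (an assumption): split after ., ! or ? followed by
        // whitespace, or at a blank line, but not at a single line wrap.
        String configured = "(?<=[.!?])\\s+|\\n\\s*\\n";

        // Compare newline-only splitting with the configured pattern.
        for (String regex : new String[] {"\\n", configured}) {
            System.out.println("pattern " + regex + ":");
            for (int[] s : new ConfigurableSentenceSplitter(regex).split(note)) {
                System.out.println("  [" + note.substring(s[0], s[1]).replace('\n', ' ') + "]");
            }
        }
    }
}
```

With the newline-only pattern, the wrapped sentence "History of / hypertension." is cut into two spans; the configured pattern keeps it whole while still splitting after sentence-final punctuation and at blank lines. Returning character offsets rather than substrings mirrors how UIMA annotations record begin/end positions.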
