Hi Vijay,

> I have yet to run across clinical text from a real EMR where newlines
> represent the end of a sentence

Since James pointed out this possibility a couple of weeks ago, I have kept my eyes open. The problem is pretty ubiquitous in a corpus that I'm working with right now. I just opened the first note and counted: 95 lines in total, 9 of which end a sentence or phrase without any punctuation. That count does not include lists, which make up about half of the note. One possible conjoinment was "Will consider [...] biopsy\nGiven [...]". Depending on how cTAKES handles it, the meaning could change drastically.

> I believe cTAKES absolutely has to support sentences with newlines within them

Yes, cTAKES should do so, but I hope that you aren't suggesting that it support only that structure.

Where is that easy button?

-----Original Message-----
From: vijay garla [mailto:vnga...@gmail.com]
Sent: Thursday, February 06, 2014 10:31 AM
To: dev@ctakes.apache.org
Cc: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org; vlad.valtchi...@gmail.com
Subject: Re: YTEX cTAKES 3.1.1 ready

I believe it is worth migrating to trunk. Note that the sentence detector is also complementary - the existing ctakes sentence detector is unchanged - users can choose which sentence detector to use. There are changes to assertion & dependency parsing to support sentences without newlines, and that works with both sentence detectors.

I believe cTAKES absolutely has to support sentences with newlines within them - I have yet to run across clinical text from a real EMR where newlines represent the end of a sentence - the changes to assertion & dependency parsing will have to be done at some point.

-vj

On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:

> VJ,
> Aside from the changes to the existing cTAKES code (sentence detector,
> etc.) [which we could leave out if it's still being debated], do you
> think it's worth migrating the ytex code to trunk at this point?
> As you mentioned earlier, it's largely complementary.
> [I was just thinking of saving the effort of maintaining the separate branch
> and of keeping things simple for dev...]
>
> --Pei
>
> > -----Original Message-----
> > From: vijay garla [mailto:vnga...@gmail.com]
> > Sent: Wednesday, February 05, 2014 9:30 PM
> > To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
> > vlad.valtchi...@gmail.com
> > Subject: Re: YTEX cTAKES 3.1.1 ready
> >
> > Hi Vlad,
> >
> > I updated the UMLS install guide; see
> > https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1
> >
> > I would prefer to add the docs to the ctakes Confluence, but as far as
> > I can tell, I don't have write access there - can somebody give me
> > write privileges on the ctakes Confluence site?
> >
> > There was a bug in the umls install; copy
> > https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml
> > over the corresponding file in your ctakes-3.1.2 install
> > (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.
> > The import is currently running on the UMLS 2013AA (I assume it will
> > complete without issues as long as the umls schema hasn't changed
> > since 2012).
> >
> > What trial and error did you have to go through to build the distro?
> >
> > -vj
> >
> >
> > On Wed, Feb 5, 2014 at 5:33 PM, vijay garla <vnga...@gmail.com> wrote:
> >
> > > Hi Vlad,
> > >
> > > Sorry that the instructions aren't clear.
> > >
> > > re 1) What I am trying to say is: install
> > > apache-ctakes-3.2.0-snapshot as usual (this is unchanged from
> > > 3.1.1). After that you still have to apply the lib and resources
> > > (these are things that cannot be distributed via apache).
> > >
> > > re 2) Yes, I need to update those docs. Hopefully I will get to
> > > that at some point. However, I assume you already have a UMLS DB
> > > (I also assume SQL Server). If you can't/don't want to use your
> > > existing umls DB, please tell me.
> > > Then I'll prioritize updating
> > > the doc on importing the umls tables (the scripts are there).
> > >
> > > best,
> > >
> > > VJ
> > >
> > >
> > > On Wed, Feb 5, 2014 at 4:44 PM, <vlad.valtchi...@gmail.com> wrote:
> > >
> > >> Hi VJ-
> > >>
> > >> So, with trial and error, we were able to make the distribution and
> > >> now have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive.
> > >>
> > >> Here's what's unclear.
> > >>
> > >> 1. Is this now the only (combined) thing that you need for ctakes
> > >> 3.1.1 + YTEX?
> > >> The current documentation
> > >> (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1),
> > >> which is most probably outdated, talks about installing cTAKES
> > >> 3.1.1 first and then applying 2 downloadable SNAPSHOT archives,
> > >> lib and resources.
> > >> This is a confusion point.
> > >>
> > >> 2. The directions to import the UMLS subset are then outdated as well.
> > >> Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to
> > >> import the RRF files for the UMLS subset and then just use the
> > >> resulting db. Thoughts?
> > >>
> > >> Thanks,
> > >> Vlad Valtchinov
> > >> Brigham Rad
> > >>
> > >>
> > >> On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote:
> > >>
> > >>> Hi Vlad,
> > >>>
> > >>> All of ytex has been moved into ctakes; it is currently in a
> > >>> branch (https://svn.apache.org/repos/asf/ctakes/branches/ytex). You
> > >>> don't have to install ytex-0.8 - instead you will have to build
> > >>> and install from the ytex branch to create your own
> > >>> distribution. Steps 2 & 3 are correct.
> > >>>
> > >>> Although it is a pain, if you have the jdk, maven, and svn, you
> > >>> can easily build your own distro:
> > >>> * open a command prompt
> > >>> * make sure jdk, maven, and svn are in your path
> > >>> * cd to some directory where you want to check stuff out (I like
> > >>>   c:\temp)
> > >>> * run the following commands:
> > >>>
> > >>>     rmdir /s /q ctakes
> > >>>     svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
> > >>>     cd ctakes
> > >>>     mvn clean install -DskipTests
> > >>>
> > >>> And you will have the ctakes (with ytex) distro in
> > >>> ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip
> > >>>
> > >>> What is the process for getting the ytex branch merged into trunk?
> > >>> As I mentioned, there are very few changes to other ctakes
> > >>> classes/types - this should be completely complementary and not
> > >>> affect any existing ctakes functionality.
> > >>>
> > >>> -vj
> > >>>
> > >>>
> > >>> On Thu, Jan 30, 2014 at 4:56 PM, <vlad.va...@gmail.com> wrote:
> > >>>
> > >>>> Hi VJ--
> > >>>>
> > >>>> This is great!! Thanks for all the hard work on it!
> > >>>>
> > >>>> We're starting to look into the new install. For now we're
> > >>>> trying the binaries out.
> > >>>>
> > >>>> We have these questions about the proper install steps:
> > >>>>
> > >>>> 1. Do we first install ytex-0.8?
> > >>>> 2. Then install the new cTAKES 3.1.1 instance and also apply the
> > >>>>    SNAPSHOT lib and resources zips?
> > >>>> 3. Then work our way to installing the UMLS ontologies in the db?
> > >>>>
> > >>>> It is not entirely clear from the new document
> > >>>> (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
> > >>>> whether we still need to install ytex-0.8, or whether YTEX has been
> > >>>> entirely merged into cTAKES.
> > >>>>
> > >>>> If the latter is correct, there are missing parts, e.g. in
> > >>>> the UMLS install steps that are linked from the new ctakes
> > >>>> 3.1.1 document.
> > >>>>
> > >>>> Thanks,
> > >>>> vlad
> > >>>>
> > >>>>
> > >>>> On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote:
> > >>>>>
> > >>>>> Hello All,
> > >>>>>
> > >>>>> I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.
> > >>>>> Most of the YTEX functionality has been ported and integrated
> > >>>>> with cTAKES, and I've tested with MySQL and MS SQL Server
> > >>>>> (Oracle tests pending).
> > >>>>>
> > >>>>> Most of the changes were made in new projects - very little
> > >>>>> existing cTAKES code has been modified. The only non-trivial
> > >>>>> changes are in
> > >>>>> /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
> > >>>>> - here I modified
> > >>>>> CharacterOffsetToLineTokenConverterCtakesImpl &
> > >>>>> SingleDocumentProcessorCtakes to deal with newlines within
> > >>>>> sentences correctly. Can somebody take a look at the changes in
> > >>>>> the ytex branch?
> > >>>>>
> > >>>>> I believe that the branch
> > >>>>> https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready
> > >>>>> to be merged into ctakes trunk, but would like other users to
> > >>>>> test it as well.
> > >>>>> Questions:
> > >>>>>
> > >>>>> * How can I distribute the ctakes binary distribution to ytex
> > >>>>>   users before the merge? Can we make the branch build available
> > >>>>>   somewhere? The binary distribution is too large to host on
> > >>>>>   the ytex google code site (max 200 MB).
> > >>>>> * Non-ASF libraries - I have segregated these out into their
> > >>>>>   own zip file that can be distributed via sourceforge. As a
> > >>>>>   stopgap, I can upload this to the ytex google code site, but
> > >>>>>   would prefer to upload to sourceforge.
> > >>>>> * UMLS Derivatives - Ditto for these - would like to move to
> > >>>>>   sourceforge.
> > >>>>> * Documentation - How can I update the Confluence docs? I
> > >>>>>   would migrate the documentation from the google code website.
> > >>>>>
> > >>>>> Here are the installation instructions (putting the wagon in front
> > >>>>> of the horse ...):
> > >>>>>
> > >>>>> https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> VJ
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>> You received this message because you are subscribed to the
> > >>>> Google Groups "ytex-users" group.
> > >>>> To unsubscribe from this group and stop receiving emails from
> > >>>> it, send an email to ytex-users+...@googlegroups.com.
> > >>>> To post to this group, send email to ytex-...@googlegroups.com.
> > >>>> To view this discussion on the web visit
> > >>>> https://groups.google.com/d/msgid/ytex-users/70f03a80-ce1a-4c0e-b35d-5116d1c93ea0%40googlegroups.com.
> > >>>>
> > >>>> For more options, visit https://groups.google.com/groups/opt_out.