+1 to have a pointer back to the BaseToken(s) rather than a |-delimited String (so we could get back the spans and other info if needed). I think the atom will be slightly different. Perhaps with an example:

Sentence/LookupWindow: "alcoholic liver disease was acute."
originalText: "disease acute" [New feature to store the Tokens that were matched due to the permutations?]
UmlsConcept.cui: C0001314
UmlsConcept.preferredText: "Acute Disease" [New feature to store the atom/text returned by the UMLS CUI]
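To make the example above concrete, here is a rough sketch in plain Python (not UIMA or cTAKES code). Every name in it (Token, Concept, matched_tokens, original_text) is hypothetical and only stands in for the proposed feature. The concept span running from the first to the last matched token is also an assumption. The point is that keeping references to the matched tokens preserves their offsets, and a delimited string can still be derived from them on demand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a BaseToken: just begin/end offsets."""
    begin: int
    end: int

    def covered_text(self, document: str) -> str:
        return document[self.begin:self.end]


@dataclass
class Concept:
    """Hypothetical stand-in for a named entity with a UMLS concept attached."""
    cui: str
    preferred_text: str
    begin: int
    end: int
    matched_tokens: List[Token]  # assumed new feature: the tokens that produced the hit

    def covered_text(self, document: str) -> str:
        # Everything between begin and end, including words that did not match.
        return document[self.begin:self.end]

    def original_text(self, document: str, delimiter: str = " ") -> str:
        # Derived from the token references, so no escaping is needed in storage.
        return delimiter.join(t.covered_text(document) for t in self.matched_tokens)


def token_for(document: str, word: str) -> Token:
    """Locate the first occurrence of a word and wrap it as a Token."""
    start = document.index(word)
    return Token(start, start + len(word))


sentence = "alcoholic liver disease was acute."
disease = token_for(sentence, "disease")
acute = token_for(sentence, "acute")

concept = Concept(
    cui="C0001314",
    preferred_text="Acute Disease",
    begin=disease.begin,  # assumption: span runs from first to last matched token
    end=acute.end,
    matched_tokens=[disease, acute],
)

print(concept.covered_text(sentence))
print(concept.original_text(sentence))
print(concept.original_text(sentence, "|"))
```

With token references stored, the delimiter becomes a presentation choice made at read time rather than something baked into the stored value.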
I also ran into a similar case where I wish IdentifiedAnnotation.segmentID/SentenceID was the actual Segment type and not a String.
This is just my 2 cents... open to ideas though.
--Pei

> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:[email protected]]
> Sent: Wednesday, October 02, 2013 3:19 AM
> To: [email protected]
> Subject: Re: CTAKES-248- include original covered text of NEs which can't be
> recovered post if NE is from a disjoint span
>
> What benefit would it have to store a string with some separation character
> (which may mean that the separation character in the elements may need to
> be escaped), over using a feature of type FSArray<Token> pointing to the
> original segments?
>
> Not sure if that is what Karthik meant when referring to fetching the
> matched atom.
>
> -- Richard
>
> On 02.10.2013, at 01:46, Karthik Sarma <[email protected]> wrote:
>
> > Hmm, couldn't you just fetch the matched atom and use that? Should be
> > the same information (without, I suppose, the original ordering and split).
> >
> > --
> > Karthik Sarma
> > UCLA Medical Scientist Training Program Class of 20??
> > Member, UCLA Medical Imaging & Informatics Lab
> > Member, CA Delegation to the House of Delegates of the American Medical Association
> > [email protected]
> > gchat: [email protected]
> > linkedin: www.linkedin.com/in/ksarma
> >
> >
> > On Tue, Oct 1, 2013 at 12:37 PM, Masanz, James J. <[email protected]> wrote:
> >
> >> Yes, this would help address that multiple permutations example. The
> >> new getOriginalText method would return something like
> >> "Acute|Disease". Right now I'm thinking of just using vertical bar
> >> as delimiter, to start with at least, but think it should be configurable.
> >>
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]] On Behalf Of Chen, Pei
> >> Sent: Tuesday, October 01, 2013 9:38 AM
> >> To: [email protected]
> >> Subject: CTAKES-248- include original covered text of NEs which can't
> >> be recovered post if NE is from a disjoint span
> >>
> >> This sounds pretty cool.
> >> James, will this address the multiple permutations lookup example:
> >> "Acute alcoholic liver disease." There is a cui: C0001314: Acute
> >> Disease, but if you getCoveredText(), on the UMLSConcept, you would
> >> actually get the same "Acute alcoholic liver disease" instead of "Acute Disease".
> >> So, there is a new field called getOriginalText() that matched the hit?
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:[email protected]]
> >>> Sent: Monday, September 30, 2013 5:49 PM
> >>> To: [email protected]
> >>> Subject: svn commit: r1527792 - /ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
> >>>
> >>> Author: james-masanz
> >>> Date: Mon Sep 30 21:48:01 2013
> >>> New Revision: 1527792
> >>>
> >>> URL: http://svn.apache.org/r1527792
> >>> Log:
> >>> CTAKES-248 - for named entities, since the annotation just has the begin and
> >>> end offset, it is requested to have a way to get the original covered text
> >>> (especially for disjoint spans) so it is possible to know which words in the
> >>> covered text were actually used in the matching to the dictionary entry
> >>>
> >>> Modified:
> >>> ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
> >>>
> >>> Modified: ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
> >>> URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml?rev=1527792&r1=1527791&r2=1527792&view=diff
> >>> ==============================================================================
> >>> Binary files - no diff available.
