I think text formatting is a natural candidate for being turned into annotations. Just one example: some people use formatting to indicate section headings, so a sectionizer could use the RTF tags as-is to determine sections, or at least use them as features.
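To make the sectionizer idea a bit more concrete, here is a rough sketch in Python rather than an actual UIMA annotator. It assumes some upstream RTF parser (not shown, and purely hypothetical here) has already reduced the note to lines tagged with a bold flag. The names `Section` and `sectionize` are made up for illustration, and treating every bold line as a heading is a simplifying assumption, not something cTAKES does today.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Section:
    heading: Optional[str]  # None for text that appears before the first heading
    lines: List[str] = field(default_factory=list)


def sectionize(lines: List[Tuple[str, bool]]) -> List[Section]:
    """Group (text, is_bold) lines into sections, treating bold lines as headings.

    Assumption: the (text, is_bold) pairs come from a hypothetical RTF parser.
    """
    sections: List[Section] = []
    current = Section(heading=None)
    for text, is_bold in lines:
        text = text.strip()
        if not text:
            continue  # skip blank lines
        if is_bold:
            # A bold line starts a new section; keep the previous one
            # if it has a heading or any body text.
            if current.heading is not None or current.lines:
                sections.append(current)
            current = Section(heading=text)
        else:
            current.lines.append(text)
    if current.heading is not None or current.lines:
        sections.append(current)
    return sections
```

In a real pipeline the bold flag could instead be kept as a feature on each line, so that a trained sectionizer weighs it alongside other cues rather than relying on it alone.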
-- James

> -----Original Message-----
> From: dev-return-1935-Masanz.James=mayo....@ctakes.apache.org [mailto:dev-return-1935-Masanz.James=mayo....@ctakes.apache.org] On Behalf Of Pei Chen
> Sent: Tuesday, September 03, 2013 9:10 AM
> To: u...@ctakes.apache.org; dev@ctakes.apache.org
> Subject: Re: RTF Annotator?
>
> Hi David,
> There is work being done on Tika/OCR integration, but I am not aware of
> any cTAKES RTF Annotators.
> What do others think? Having such additional metadata does sound very
> interesting, especially with mark-ups (bold/italics) and semi-structured
> data such as tables...
>
> --Pei
>
>
> On Sun, Sep 1, 2013 at 5:41 PM, David Kincaid
> <kincaid.d...@gmail.com> wrote:
>
> > Before I embark on building an RTF annotator I thought I'd ask around
> > a bit to see if anyone had built such a thing. Most of the medical
> > notes I have to handle are in RTF format. I can pretty easily extract
> > the text only using something like Apache Tika, but there is important
> > information in the formatting as well (bold, italic, font sizes,
> > centering, tables, etc.) that I'd like to use. Is anyone aware of a UIMA
> > annotator that does this already?
> >
> > Thanks,
> >
> > Dave Kincaid