I think text formatting is a natural candidate for being turned into
annotations. As just one example, some people use formatting to indicate section
headings, so there could be a sectionizer that uses the RTF tags as-is to
determine sections, or at least uses them as features.
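
To make the sectionizer idea a bit more concrete, here is a rough sketch in
Python (not cTAKES or UIMA code) that treats any paragraph whose text is
entirely bold as a candidate section heading. The function name
bold_paragraphs and the sample note are made up for illustration, and the
parsing is deliberately simplified: it only understands \b, \b0, \par and
group braces, and does not skip font tables or other RTF destination groups.

```python
import re

# One token is a control word (e.g. \b, \b0, \par), a group brace,
# or a run of plain text. Simplified: control symbols are not handled.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|([{}])|([^\\{}]+)")


def bold_paragraphs(rtf):
    """Return (text, all_bold) for each non-empty paragraph.

    Hypothetical helper: assumes a simplified RTF string with no
    font table or other destination groups.
    """
    paragraphs = []
    stack = []            # bold state saved at each open group
    bold = False
    pieces, flags = [], []  # text runs of the current paragraph, bold flag per run

    def flush():
        joined = "".join(pieces).strip()
        if joined:
            paragraphs.append((joined, all(flags)))
        pieces.clear()
        flags.clear()

    for match in TOKEN.finditer(rtf):
        word, arg, brace, chunk = match.groups()
        if word == "b":
            bold = arg != "0"      # \b turns bold on, \b0 turns it off
        elif word == "par":
            flush()                # \par ends the current paragraph
        elif brace == "{":
            stack.append(bold)     # formatting is scoped to the group
        elif brace == "}":
            if stack:
                bold = stack.pop()
        elif chunk:
            chunk = chunk.replace("\r", "").replace("\n", "")  # raw newlines carry no meaning in RTF
            if chunk.strip():
                pieces.append(chunk)
                flags.append(bold)
    flush()
    return paragraphs


# Made-up sample note, not real data.
sample = r"{\rtf1\ansi \b HISTORY\b0\par Patient reports cough.\par \b PLAN\b0\par Rest.\par}"
headings = [text for text, is_bold in bold_paragraphs(sample) if is_bold]
print(headings)
```

In a real pipeline the all-bold flag would more likely become one feature
among several (font size, centering, capitalization) than a hard rule.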

-- James

> -----Original Message-----
> From: dev-return-1935-Masanz.James=mayo....@ctakes.apache.org [mailto:dev-
> return-1935-Masanz.James=mayo....@ctakes.apache.org] On Behalf Of Pei Chen
> Sent: Tuesday, September 03, 2013 9:10 AM
> To: u...@ctakes.apache.org; dev@ctakes.apache.org
> Subject: Re: RTF Annotator?
> 
> Hi David,
> There is work being done on Tika/OCR integration, but I am not aware of
> any cTAKES RTF Annotators.
> What do others think? Having additional metadata such as this does sound
> very interesting, especially with mark-up (bold/italics) and semi-structured
> data such as tables...
> 
> --Pei
> 
> 
> On Sun, Sep 1, 2013 at 5:41 PM, David Kincaid
> <kincaid.d...@gmail.com> wrote:
> 
> > Before I embark on building an RTF annotator I thought I'd ask around
> > a bit to see if anyone had built such a thing. Most of the medical
> > notes I have to handle are in RTF format. I can pretty easily extract
> > the text only using something like Apache Tika, but there is important
> > information in the formatting as well (bold, italic, font sizes,
> > centering, tables, etc) that I'd like to use. Is anyone aware of a UIMA
> > annotator that does this already?
> >
> > Thanks,
> >
> > Dave Kincaid
> >
