Hey Paul,

In my data-mining $dayjob I do a fair amount of annotation of text
with attributes, similar to what you're talking about.  People in this
field tend to call it "stand-off annotation", which means it's stored
out-of-band, as opposed to "in-line markup" like vanilla XML.
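
In case the distinction isn't obvious, here's a rough sketch in Python
(purely for illustration; the Annotation class and the helper names are
made up for this mail and are not UIMA's, GATE's, or your module's API).
The text stays untouched and each annotation is just a (start, end, kind)
record kept alongside it, so spans can overlap without nesting, which
in-line XML elements can't express directly:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record type, invented for this illustration.
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered
    kind: str   # attribute name, e.g. "bold"


text = "Hello big world"

# Stand-off: the text is never modified; annotations live in a separate list.
# These two spans overlap without one containing the other.
annotations = [
    Annotation(0, 9, "bold"),     # covers "Hello big"
    Annotation(6, 15, "italic"),  # covers "big world"
]


def covered(text, ann):
    """Return the substring an annotation refers to."""
    return text[ann.start:ann.end]


def kinds_at(annotations, pos):
    """Return the sorted attribute names in effect at character offset pos."""
    return sorted(a.kind for a in annotations if a.start <= pos < a.end)
```

Asking kinds_at(annotations, 7) gives both attributes for the "i" in
"big", while the text itself never gets any markup mixed into it.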

The precedence rules and types of annotation you've defined seem a bit
arbitrary, though. You might want to have a look at UIMA and GATE,
both of which define structures like yours but with a few differences:

http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.conceptual.representing_results_in_cas

http://www.gate.ac.uk/releases/gate-4.0-build2752-ALL/doc/javadoc/gate/TextualDocument.html

 -Ken

On Fri, Jan 30, 2009 at 4:25 PM, Paul LeoNerd Evans
<leon...@leonerd.org.uk> wrote:
> On Fri, Jan 30, 2009 at 02:08:24PM -0800, Bill Ward wrote:
>> Or String::Substrate?  The meaning of "substrate" doesn't really fit
>> here but it's so close to SubStrAttr that I bet you could get away
>> with it, with a suitable comment explaining the name :)
>
> I can't help thinking we're getting a bit side-tracked by the name here.
>
> There's a lot of interesting API in the code, I feel the name is
> somewhat overshadowing any other discussion on the API design or other
> details...
>
> --
> Paul "LeoNerd" Evans
>
> leon...@leonerd.org.uk     |    CPAN ID: PEVANS
> srand($,=" ");print sort{rand>0.5}grep{0.8>rand}qw(another Just hacker of Perl)
>
