I think Pair works, technically, because it inherits from TOP and processing
is assumed to happen for each document separately.  However, because it
inherits from TOP, there's no way to ensure that the stuff you're putting
there is all about the document -- you might have Pairs of lots of other
stuff you don't care about, and have to iterate through that to get the
document class.
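To make the cost concrete, here is a minimal plain-Java sketch of the lookup this implies. It assumes, hypothetically, that a Pair holds an attribute/value pair of strings and that everyone agrees on the attribute name "documentClass"; the `Pair` record and `PairLookup` helper below are illustrative stand-ins, not the generated JCas class for org.apache.ctakes.typesystem.type.util.Pair.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a CAS-level Pair; NOT the cTAKES JCas class.
record Pair(String attribute, String value) {}

final class PairLookup {
    private PairLookup() {}

    // Nothing ties a Pair to the document, so finding the document class
    // means scanning every Pair and filtering on an agreed-upon
    // (hypothetical) attribute name.
    static Optional<String> documentClass(List<Pair> allPairs) {
        return allPairs.stream()
                .filter(p -> "documentClass".equals(p.attribute()))
                .map(Pair::value)
                .findFirst();
    }
}
```

Any unrelated Pairs in the CAS are walked over on every lookup, and a typo in the attribute name fails silently as an empty result.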

Good point, although the Pair type doesn't seem to be heavily used now.


Here's what I propose:
Add a type (inherits from TOP): DocumentClass - String
Add a feature to Document (inherits from Annotation): documentId - of type
DocumentID
Add a feature to Document: documentClass - of type DocumentClass
Add a feature to Document: metadata - of type Metadata
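A rough plain-Java mirror of the proposed shape, for illustration only. In cTAKES these would be JCas classes generated from the type system descriptor; the records below, the `label` field on DocumentClass, the `fields` map on Metadata, and the `DocumentLabels` helper are all hypothetical. The point is that the label hangs directly off each Document, so a CAS holding several documents (e.g. one patient's notes) can still report a label per document without scanning unrelated structures.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the proposed types.
record DocumentClass(String label) {
    DocumentClass { Objects.requireNonNull(label, "label"); }
}
record DocumentID(String id) {}
record Metadata(Map<String, String> fields) {}

// Document is an Annotation in the proposal, hence begin/end offsets.
record Document(int begin, int end, DocumentID documentId,
                DocumentClass documentClass, Metadata metadata) {}

final class DocumentLabels {
    private DocumentLabels() {}

    // Direct feature access: one label per document, keyed by its id.
    static Map<String, String> labelsById(List<Document> docs) {
        return docs.stream().collect(Collectors.toMap(
                d -> d.documentId().id(),
                d -> d.documentClass().label()));
    }
}
```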

This will be important if we ever run patient-centric pipelines, rather than
document-centric ones.

Looks good.

Dima

stephen


On 11/15/12 9:22 AM, "Masanz, James J." <[email protected]> wrote:

Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for such
document-level properties.
Would that suit your need?

-- James

-----Original Message-----
From: [email protected]
[mailto:ctakes-dev-return-854-[email protected]] On Behalf Of Dmitriy Dligach
Sent: Thursday, November 15, 2012 9:16 AM
To: cTAKES Dev list @ ASF
Subject: new type: document label?

We've recently been using cTAKES more and more for document-level
classification (e.g. phenotyping). Would it make sense to add a new type
(that would derive from TOP) to store the label for a document? I know
we currently have a doc id for each document, but having the label type
would simplify a lot of things (e.g. debugging).

Thanks,

Dima
