I've been trying to do document-level classification too, and had been resorting to ClearTK's types... I hadn't even thought of Pair, for some reason!
I think Pair works, technically, because it inherits from TOP and processing is assumed to happen for each document separately. However, because it inherits from TOP, there's no way to ensure that everything you put there is about the document. You might have Pairs for lots of other things you don't care about, and you'd have to iterate through all of them to find the document class.

Here's what I propose:

Add a type (inherits from TOP):
  DocumentClass - String
Add a feature to Document (inherits from Annotation):
  documentId - of type DocumentID
Add a feature to Document:
  documentClass - of type DocumentClass
Add a feature to Document:
  metadata - of type Metadata

This will be important if we ever run patient-centric pipelines rather than document-centric ones.

stephen

On 11/15/12 9:22 AM, "Masanz, James J." <[email protected]> wrote:

> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for such
> document-level properties.
> Would that suit your need?
>
> -- James
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:ctakes-dev-return-854-
>> [email protected]] On Behalf Of Dmitriy Dligach
>> Sent: Thursday, November 15, 2012 9:16 AM
>> To: cTAKES Dev list @ ASF
>> Subject: new type: document label?
>>
>> We've recently been using cTAKES more and more for document-level
>> classification (e.g. phenotyping). Would it make sense to add a new type
>> (that would derive from TOP) to store the label for a document? I know
>> we currently have a doc id for each document, but having the label type
>> would simplify a lot of things (e.g. debugging).
>>
>> Thanks,
>>
>> Dima
>
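To make the trade-off in stephen's reply concrete, here is a minimal, UIMA-free sketch contrasting the two lookups: scanning generic Pairs for a document label versus reading a documentClass feature directly off Document. Everything here is hypothetical. The records are plain Java stand-ins (Java 16+), not the generated cTAKES JCas classes, and the Pair field names (attribute, value) and the "documentClass" key string are assumptions for illustration only.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, UIMA-free stand-ins for the types discussed in this thread.
// These are plain Java records, NOT the generated cTAKES/UIMA JCas classes.
public class DocLabelSketch {

    // Stand-in for a generic key/value structure such as Pair.
    // The field names "attribute" and "value" are assumptions.
    record Pair(String attribute, String value) {}

    // Stand-in for the proposed DocumentClass type (a single String value).
    record DocumentClass(String value) {}

    // Stand-in for Document with the proposed documentId and documentClass
    // features (the proposed metadata feature is omitted for brevity).
    record Document(String documentId, DocumentClass documentClass) {}

    // Pair-based lookup: every Pair must be scanned, because nothing in the
    // type system marks which Pairs are document-level properties.
    static Optional<String> classFromPairs(List<Pair> allPairs, String key) {
        return allPairs.stream()
                .filter(p -> p.attribute().equals(key))
                .map(Pair::value)
                .findFirst();
    }

    // Feature-based lookup: the class hangs directly off the Document,
    // so no search and no agreed-upon key string are needed.
    static Optional<String> classFromDocument(Document doc) {
        return Optional.ofNullable(doc.documentClass()).map(DocumentClass::value);
    }

    public static void main(String[] args) {
        List<Pair> pairs = List.of(
                new Pair("sectionCount", "12"),          // unrelated Pair
                new Pair("negationScope", "sentence"),   // unrelated Pair
                new Pair("documentClass", "positive"));  // the label we want
        Document doc = new Document("doc-001", new DocumentClass("positive"));

        // Both print "positive"; only the first had to search.
        System.out.println(classFromPairs(pairs, "documentClass").orElse("none"));
        System.out.println(classFromDocument(doc).orElse("none"));
    }
}
```

The Pair version also depends on every component agreeing on the same key string, which is the kind of convention a dedicated feature on Document would make unnecessary.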
