Sure.

Let's settle this issue. Is there anybody who does *not* like the idea of creating a new type "DocumentClass" (derives from Top) with a single string attribute called "label"?

Dima

On 11/20/2012 11:00 AM, Chen, Pei wrote:
If we do decide to create a new type, could we call it something like DocumentClass or 
DocumentClassification and have some attribute(s) called "label"?  Otherwise we may need 
a WSD component to disambiguate "Label" from general doc metadata... :)



On Nov 19, 2012, at 11:25 AM, "Dmitriy Dligach" 
<[email protected]> wrote:

This is a very good point. Also it seems like documents with multiple labels are 
not a scenario that we face every day, so why don't we just create a new type 
(e.g. DocumentLabel) that derives from TOP and use it for a while to see if it 
satisfies our document classification needs?

Thanks,

Dima

On 11/16/2012 09:21 PM, Wu, Stephen T., Ph.D. wrote:
Well, I think the downside to using Pair is that it is not self-documenting.
In other words, everyone who did not see this thread will be faced with the
same problem Dima and I were faced with (and have dealt with in different
ways).  Everyone trying to use the common type system outside of UIMA would
probably be completely lost.

Multiple labels... DocumentClass = List<Pair>?  Or you could just
instantiate multiple DocumentClass types, and not necessarily have the
Document refer to them...
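
A rough sketch of that second option, in plain Java rather than real UIMA
code. DocumentClass here is a hypothetical stand-in class (the actual type
would be generated from the type system and derive from TOP), and the list
stands in for the CAS index:

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentClassSketch {
    // Hypothetical stand-in for the proposed type; not the cTAKES/UIMA class.
    static final class DocumentClass {
        private final String label;
        DocumentClass(String label) { this.label = label; }
        String getLabel() { return label; }
    }

    // Collect the labels of every DocumentClass instance that was added.
    static List<String> labels(List<DocumentClass> indexed) {
        List<String> out = new ArrayList<>();
        for (DocumentClass dc : indexed) {
            out.add(dc.getLabel());
        }
        return out;
    }

    public static void main(String[] args) {
        // The list plays the role of the CAS: several instances, one per label.
        List<DocumentClass> cas = new ArrayList<>();
        cas.add(new DocumentClass("positive"));
        cas.add(new DocumentClass("moderate"));
        System.out.println(labels(cas)); // prints [positive, moderate]
    }
}
```

Note that with only a label attribute, nothing in an instance says which
classification task it belongs to.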

stephen



I suggest using the current types.

I think if we add a new one, we would still want to handle multiple
classifications, and would still have the downside of having to iterate
through the classifications to find the one of interest.  So I'm not sure how
much we gain by adding a new type.

But you are closer to this than I am so I would go with whatever you recommend
or others doing classification recommend.

-- James


-----Original Message-----
From: [email protected]
[mailto:ctakes-dev-return-874-
[email protected]] On Behalf Of Dmitriy Dligach
Sent: Thursday, November 15, 2012 3:49 PM
To: [email protected]
Subject: Re: new type: document label?

James, thanks. This makes perfect sense.

So what's the conclusion? Can we do with the current types, or do we
still need to create a new one?

Dima

On 11/15/2012 03:43 PM, Masanz, James J. wrote:
Yes, you can put multiple Pair annotations in the CAS.
There is a Pairs (plural) annotation type which is a list (FSArray) of
Pair annotations.
You could have two Pair annotations with
attribute="at_risk_for_early_brca"
value="T"

attribute="alchohol_use"
value="heavy_drinker"

The downside:
You have to iterate through the Pair annotations to find the one
with the attribute name you want.
The upside: we don't have to create new Annotation types for
everything that might be imagined.
As Stephen points out, not everything in Pairs needs to be a document
class or related to the text within the document. It can be used for
example to keep version information about a pipeline or anything any
annotator wants. A totally made-up example could be
attribute="dictionary_lookup_version"
value="3.2.1"
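
The lookup cost mentioned above looks roughly like this. It is a plain-Java
sketch only: Pair below is a hypothetical stand-in with attribute and value
fields, not the real org.apache.ctakes.typesystem.type.util.Pair, and the
list stands in for whatever the CAS returns:

```java
import java.util.List;
import java.util.Optional;

public class PairLookupSketch {
    // Hypothetical stand-in for the cTAKES Pair type (attribute + value).
    static final class Pair {
        private final String attribute;
        private final String value;
        Pair(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }
        String getAttribute() { return attribute; }
        String getValue() { return value; }
    }

    // Linear scan: return the value of the first Pair with the given attribute name.
    static Optional<String> findValue(List<Pair> pairs, String attribute) {
        for (Pair p : pairs) {
            if (p.getAttribute().equals(attribute)) {
                return Optional.of(p.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Pair> pairs = List.of(
            new Pair("at_risk_for_early_brca", "T"),
            new Pair("dictionary_lookup_version", "3.2.1"));
        System.out.println(findValue(pairs, "dictionary_lookup_version").orElse("<none>"));
    }
}
```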

-- James


-----Original Message-----
From: [email protected]
[mailto:ctakes-dev-return-869-
[email protected]] On Behalf Of Dmitriy Dligach
Sent: Thursday, November 15, 2012 1:03 PM
To: [email protected]
Subject: Re: new type: document label?

Chen brings up a good point. But can't we solve this problem by
creating multiple Pair annotations in the CAS?

Dima

On 11/15/2012 01:52 PM, Lin, Chen wrote:
I am curious to know if Pair allows multiple document level labels
for a single doc. It is possible that a single set of documents be used
in multiple classification tasks.
For example, in one task a document may be labeled as "positive" or
"negative", in another task this same doc may be labeled as "high",
"moderate" or "low".  Many thanks!
Best,
Chen

-----Original Message-----
From: Dmitriy Dligach [mailto:[email protected]]
Sent: Thursday, November 15, 2012 1:46 PM
To: [email protected]
Subject: Re: new type: document label?

Thank you, James.

So, in general did you envision this type of use for Pair:

Pair.attribute -> "document_label"
Pair.value -> "positive"

I think this may work.

Dima

On 11/15/2012 10:22 AM, Masanz, James J. wrote:
Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
such document-level properties.
Would that suit your need?

-- James

-----Original Message-----
From: [email protected]
[mailto:ctakes-dev-return-854-
[email protected]] On Behalf Of Dmitriy Dligach
Sent: Thursday, November 15, 2012 9:16 AM
To: cTAKES Dev list @ ASF
Subject: new type: document label?

We've recently been using cTAKES more and more for document-level
classification (e.g. phenotyping). Would it make sense to add a
new type (that would derive from TOP) to store the label for a
document?
I know we currently have a doc id for each document, but having
the label type would simplify a lot of things (e.g. debugging).

Thanks,

Dima
