Well, I think the downside to using Pair is that it is not self-documenting.
In other words, everyone who did not see this thread will be faced with the
same problem Dima and I were faced with (and have dealt with in different
ways).  Everyone trying to use the common type system outside of UIMA would
probably be completely lost.
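To make the self-documentation concern concrete, here is a minimal plain-Java sketch of the lookup James describes in the quoted messages below. Several generic attribute/value pairs sit on one document, and a linear scan finds the one with the wanted attribute name. The `Pair` record here is a hypothetical stand-in, not the JCasGen-generated `org.apache.ctakes.typesystem.type.util.Pair` class, and `findValue` is a hypothetical helper. In a real pipeline the pairs would come from the CAS. The attribute names are taken from the thread.

```java
import java.util.List;
import java.util.Optional;

public class PairLookupSketch {
    // Hypothetical stand-in for the cTAKES Pair type: a generic attribute/value string pair.
    record Pair(String attribute, String value) {}

    // Linear scan for the first pair whose attribute matches the requested name.
    // Nothing in the type says which attribute names exist; callers must already know the strings.
    static Optional<String> findValue(List<Pair> pairs, String attribute) {
        for (Pair p : pairs) {
            if (p.attribute().equals(attribute)) {
                return Optional.of(p.value());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Pair> pairs = List.of(
            new Pair("document_label", "positive"),
            new Pair("at_risk_for_early_brca", "T"),
            new Pair("dictionary_lookup_version", "3.2.1"));
        System.out.println(findValue(pairs, "at_risk_for_early_brca").orElse("(none)")); // T
        System.out.println(findValue(pairs, "document_class").orElse("(none)")); // (none)
    }
}
```

A newcomer reading only the type sees two strings; the meaning lives entirely in naming conventions shared by whoever wrote the annotators.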

Multiple labels... DocumentClass = List<Pair>?  Or you could just
instantiate multiple DocumentClass types, and not necessarily have the
Document refer to them...
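A rough sketch of the "multiple DocumentClass instances" alternative, again in plain Java rather than the UIMA type system. `DocumentClass` is the proposed type from this thread, not an existing cTAKES class; its `task` and `label` fields, the `labelFor` helper, and the task names `polarity` and `risk_level` are all hypothetical. It models Chen's case from the quoted thread, where the same document carries labels from two classification tasks.

```java
import java.util.List;
import java.util.Optional;

public class DocumentClassSketch {
    // Hypothetical self-documenting label type: one instance per classification task.
    // The field names "task" and "label" are assumptions, not an existing cTAKES type.
    record DocumentClass(String task, String label) {}

    // Finding the label for a given task is still a scan over the instances,
    // but the field names state what each value means.
    static Optional<String> labelFor(List<DocumentClass> classes, String task) {
        return classes.stream()
            .filter(c -> c.task().equals(task))
            .map(DocumentClass::label)
            .findFirst();
    }

    public static void main(String[] args) {
        // The same document labeled in two different tasks.
        List<DocumentClass> classes = List.of(
            new DocumentClass("polarity", "positive"),
            new DocumentClass("risk_level", "moderate"));
        System.out.println(labelFor(classes, "risk_level").orElse("(none)")); // moderate
    }
}
```

As James notes below, a dedicated type does not remove the need to search when a document has several labels; what it adds is that the fields document themselves.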

stephen



> I suggest using the current types.
>
> I think if we add a new one, we would still want to handle multiple
> classifications, and would still have the downside of having to iterate
> through the classifications to find the one of interest.  So I'm not sure how
> much we gain by adding a new type.
> 
> But you are closer to this than I am so I would go with whatever you recommend
> or others doing classification recommend.
> 
> -- James
> 
> 
>> -----Original Message-----
>> From: [email protected]
>> [mailto:ctakes-dev-return-874-
>> [email protected]] On Behalf Of Dmitriy Dligach
>> Sent: Thursday, November 15, 2012 3:49 PM
>> To: [email protected]
>> Subject: Re: new type: document label?
>> 
>> James, thanks. This makes perfect sense.
>> 
>> So what's the conclusion? Can we do with the current types, or do we
>> still need to create a new one?
>> 
>> Dima
>> 
>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>> Yes, you can put multiple Pair annotations in the CAS.
>>> There is a Pairs (plural) annotation type which is a list (FSArray) of
>>> Pair annotations.
>>> 
>>> You could have two Pair annotations with
>>> attribute="at_risk_for_early_brca"
>>> value="T"
>>> 
>>> attribute="alchohol_use"
>>> value="heavy_drinker"
>>> 
>>> The downside:
>>> You have to iterate through the Pair annotations to find the one
>>> with the attribute name you want.
>>> The upside: we don't have to create new Annotation types for
>>> everything that might be imagined.
>>> 
>>> As Stephen points out, not everything in Pairs needs to be a document
>>> class or related to the text within the document. It can be used, for
>>> example, to keep version information about a pipeline, or anything else
>>> an annotator wants. A totally made-up example could be
>>> attribute="dictionary_lookup_version"
>>> value="3.2.1"
>>> 
>>> -- James
>>> 
>>> 
>>>> -----Original Message-----
>>>> From:
>>>> [email protected]
>>>> [mailto:ctakes-dev-return-869-
>>>> [email protected]] On Behalf Of Dmitriy
>>>> Dligach
>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>> To: [email protected]
>>>> Subject: Re: new type: document label?
>>>> 
>>>> Chen brings up a good point. But can't we solve this problem by
>>>> creating multiple Pair annotations in the CAS?
>>>> 
>>>> Dima
>>>> 
>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>> I am curious to know if Pair allows multiple document-level labels
>>>>> for a single doc. It is possible that a single set of documents could
>>>>> be used in multiple classification tasks.
>>>>> For example, in one task a document may be labeled as "positive" or
>>>>> "negative", while in another task the same doc may be labeled as "high",
>>>>> "moderate" or "low".  Many thanks!
>>>>> Best,
>>>>> Chen
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Dmitriy Dligach [mailto:[email protected]]
>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>> To: [email protected]
>>>>> Subject: Re: new type: document label?
>>>>> 
>>>>> Thank you, James.
>>>>> 
>>>>> So, in general did you envision this type of use for Pair:
>>>>> 
>>>>> Pair.attribute -> "document_label"
>>>>> Pair.value -> "positive"
>>>>> 
>>>>> I think this may work.
>>>>> 
>>>>> Dima
>>>>> 
>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
>>>>>> such document-level properties.
>>>>>> Would that suit your need?
>>>>>> 
>>>>>> -- James
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From:
>>>>>>> [email protected]
>>>>>>> [mailto:ctakes-dev-return-854-
>>>>>>> [email protected]] On Behalf Of Dmitriy
>>>>>>> Dligach
>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>> Subject: new type: document label?
>>>>>>> 
>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>> new type (that would derive from TOP) to store the label for a
>>>>>>> document?
>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Dima
> 
