If we do decide to create a new type, could we call it something like 
DocumentClass or DocumentClassification and have some attribute(s) called 
"label"?  Otherwise we may need a WSD component to disambiguate "Label" from 
general doc metadata... :)
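
A minimal sketch of what such a type might carry, written as plain Java rather than as a UIMA type descriptor. Only the "label" attribute comes from the suggestion above; the class below is a hypothetical stand-in (it is not part of the cTAKES type system), and the "task" field is an assumption added to cover the multiple-classification-task case raised further down the thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed type; NOT a real cTAKES/UIMA class.
final class DocumentClassification {
    final String task;   // assumed field: which classification task produced the label
    final String label;  // the "label" attribute suggested in the message above

    DocumentClassification(String task, String label) {
        this.task = task;
        this.label = label;
    }
}

final class DocumentClassificationSketch {
    // Index labels by task name so a consumer can look one up directly.
    // If a task appears more than once, the later label overwrites the earlier one.
    static Map<String, String> labelsByTask(List<DocumentClassification> classifications) {
        Map<String, String> byTask = new LinkedHashMap<>();
        for (DocumentClassification c : classifications) {
            byTask.put(c.task, c.label);
        }
        return byTask;
    }

    public static void main(String[] args) {
        // One document, two independent classification tasks.
        List<DocumentClassification> doc = Arrays.asList(
            new DocumentClassification("sentiment", "positive"),
            new DocumentClassification("severity", "moderate"));
        System.out.println(labelsByTask(doc).get("severity")); // prints moderate
    }
}
```

Instantiating one object per task keeps each label self-describing, at the cost of one extra field compared with a bare label.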



On Nov 19, 2012, at 11:25 AM, "Dmitriy Dligach" 
<[email protected]> wrote:

> This is a very good point. Also it seems like documents with multiple labels 
> is not a scenario that we face every day, so why don't we just create a new 
> type (e.g. DocumentLabel) that derives from TOP and use it for a while to see 
> if it satisfies our document classification needs?
> 
> Thanks,
> 
> Dima
> 
> On 11/16/2012 09:21 PM, Wu, Stephen T., Ph.D. wrote:
>> Well, I think the downside to using Pair is that it is not self-documenting.
>> In other words, everyone who did not see this thread will be faced with the
>> same problem Dima and I were faced with (and have dealt with in different
>> ways).  Everyone trying to use the common type system outside of UIMA would
>> probably be completely lost.
>> 
>> Multiple labels... DocumentClass = List<Pair>?  Or you could just
>> instantiate multiple DocumentClass types, and not necessarily have the
>> Document refer to them...
>> 
>> stephen
>> 
>> 
>> 
>>> I suggest using the current types.
>>> 
>>> I think if we add a new one, we would still want to handle multiple
>>> classifications, and would still have the downside of having to iterate
>>> through the classifications to find the one of interest.  So I'm not sure
>>> how much we gain by adding a new type.
>>> 
>>> But you are closer to this than I am so I would go with whatever you
>>> recommend or others doing classification recommend.
>>> 
>>> -- James
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:ctakes-dev-return-874-
>>>> [email protected]] On Behalf Of Dmitriy Dligach
>>>> Sent: Thursday, November 15, 2012 3:49 PM
>>>> To: [email protected]
>>>> Subject: Re: new type: document label?
>>>> 
>>>> James, thanks. This makes perfect sense.
>>>> 
>>>> So what's the conclusion? Can we do with the current types, or do we
>>>> still need to create a new one?
>>>> 
>>>> Dima
>>>> 
>>>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>>>> Yes, you can put multiple Pair annotations in the CAS.
>>>>> There is a Pairs (plural) annotation type which is a list (FSArray) of
>>>>> Pair annotations.
>>>>> You could have two Pair annotations with
>>>>> attribute="at_risk_for_early_brca"
>>>>> value="T"
>>>>> 
>>>>> attribute="alcohol_use"
>>>>> value="heavy_drinker"
>>>>> 
>>>>> The downside:
>>>>> You have to iterate through the Pair annotations to find the one
>>>>> with the attribute name you want.
>>>>> The upside: we don't have to create new Annotation types for
>>>>> everything that might be imagined.
>>>>> As Stephen points out, not everything in Pairs needs to be a document
>>>>> class or related to the text within the document. It can be used for
>>>>> example to keep version information about a pipeline or anything any
>>>>> annotator wants. A totally made-up example could be
>>>>> attribute="dictionary_lookup_version"
>>>>> value="3.2.1"
>>>>> 
>>>>> -- James
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From:
>>>>>> [email protected]
>>>>>> [mailto:ctakes-dev-return-869-
>>>>>> [email protected]] On Behalf Of Dmitriy
>>>>>> Dligach
>>>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: new type: document label?
>>>>>> 
>>>>>> Chen brings up a good point. But can't we solve this problem by
>>>>>> creating multiple Pair annotations in the CAS?
>>>>>> 
>>>>>> Dima
>>>>>> 
>>>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>>>> I am curious to know if Pair allows multiple document level labels
>>>>>>> for a single doc. It is possible that a single set of documents be used
>>>>>>> in multiple classification tasks.
>>>>>>> For example, in one task a document may be labeled as "positive" or
>>>>>>> "negative", in another task this same doc may be labeled as "high",
>>>>>>> "moderate" or "low".  Many thanks!
>>>>>>> Best,
>>>>>>> Chen
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Dmitriy Dligach [mailto:[email protected]]
>>>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: new type: document label?
>>>>>>> 
>>>>>>> Thank you, James.
>>>>>>> 
>>>>>>> So, in general did you envision this type of use for Pair:
>>>>>>> 
>>>>>>> Pair.attribute -> "document_label"
>>>>>>> Pair.value -> "positive"
>>>>>>> 
>>>>>>> I think this may work.
>>>>>>> 
>>>>>>> Dima
>>>>>>> 
>>>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
>>>>>>>> such document-level properties.
>>>>>>>> Would that suit your need?
>>>>>>>> 
>>>>>>>> -- James
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From:
>>>>>>>>> [email protected]
>>>>>>>>> [mailto:ctakes-dev-return-854-
>>>>>>>>> [email protected]] On Behalf Of Dmitriy
>>>>>>>>> Dligach
>>>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>>>> Subject: new type: document label?
>>>>>>>>> 
>>>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>>>> new type (that would derive from TOP) to store the label for a
>>>>>>>>> document?
>>>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Dima
> 
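
The Pair-based approach discussed in this thread (several attribute/value pairs per document, with a scan to find the one of interest) can be sketched in plain Java. This is only an illustration under stated assumptions: SimplePair and findValue are hypothetical stand-ins, not the real org.apache.ctakes.typesystem.type.util.Pair API, and a real annotator would read Pair annotations from the CAS rather than from a List.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Plain stand-in for an attribute/value annotation; NOT the real cTAKES Pair class.
final class SimplePair {
    final String attribute;
    final String value;

    SimplePair(String attribute, String value) {
        this.attribute = attribute;
        this.value = value;
    }
}

final class PairLookupSketch {
    // Linear scan (the "downside" named in the thread): returns the value of the
    // first pair whose attribute matches, or an empty Optional if none does.
    static Optional<String> findValue(List<SimplePair> pairs, String attribute) {
        for (SimplePair pair : pairs) {
            if (pair.attribute.equals(attribute)) {
                return Optional.of(pair.value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example attribute/value pairs taken from the thread.
        List<SimplePair> pairs = Arrays.asList(
            new SimplePair("at_risk_for_early_brca", "T"),
            new SimplePair("alcohol_use", "heavy_drinker"),
            new SimplePair("dictionary_lookup_version", "3.2.1"));
        System.out.println(findValue(pairs, "alcohol_use").orElse("<none>")); // prints heavy_drinker
    }
}
```

The scan is cheap for the handful of document-level properties discussed here; the trade-off is that callers must know the attribute string by convention, which is the self-documentation concern raised above.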
