If we do decide to create a new type, could we call it something like DocumentClass or DocumentClassification and have some attribute(s) called "label"? Otherwise we may need a WSD component to disambiguate "Label" from general doc metadata... :)
On Nov 19, 2012, at 11:25 AM, "Dmitriy Dligach" <[email protected]> wrote:

> This is a very good point. Also it seems like documents with multiple labels
> is not a scenario that we face every day, so why don't we just create a new
> type (e.g. DocumentLabel) that derives from TOP and use it for a while to see
> if it satisfies our document classification needs?
>
> Thanks,
>
> Dima
>
> On 11/16/2012 09:21 PM, Wu, Stephen T., Ph.D. wrote:
>> Well, I think the downside to using Pair is that it is not self-documenting.
>> In other words, everyone who did not see this thread will be faced with the
>> same problem Dima and I were faced with (and have dealt with in different
>> ways). Everyone trying to use the common type system outside of UIMA would
>> probably be completely lost.
>>
>> Multiple labels... DocumentClass = List<Pair>? Or you could just
>> instantiate multiple DocumentClass types, and not necessarily have the
>> Document refer to them...
>>
>> stephen
>>
>>
>>
>>> I suggest using the current types.
>>>
>>> I think if we add a new one, we would still want to handle multiple
>>> classifications, and would still have the downside of having to iterate
>>> through the classifications to find the one of interest. So I'm not sure
>>> how much we gain by adding a new type.
>>>
>>> But you are closer to this than I am so I would go with whatever you
>>> recommend or others doing classification recommend.
>>>
>>> -- James
>>>
>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:ctakes-dev-return-874-[email protected]] On Behalf Of Dmitriy Dligach
>>>> Sent: Thursday, November 15, 2012 3:49 PM
>>>> To: [email protected]
>>>> Subject: Re: new type: document label?
>>>>
>>>> James, thanks. This makes perfect sense.
>>>>
>>>> So what's the conclusion? Can we do with the current types, or do we
>>>> still need to create a new one?
>>>>
>>>> Dima
>>>>
>>>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>>>> Yes, you can put multiple Pair annotations in the CAS.
>>>>> There is a Pairs (plural) annotation type which is a list (FSArray) of
>>>>> Pair annotations.
>>>>> You could have two Pair annotations with
>>>>> attribute="at_risk_for_early_brca"
>>>>> value="T"
>>>>>
>>>>> attribute="alchohol_use"
>>>>> value="heavy_drinker"
>>>>>
>>>>> The downside:
>>>>> You have to iterate through the Pair annotations to find the one
>>>>> with the attribute name you want.
>>>>> The upside: we don't have to create new Annotation types for
>>>>> everything that might be imagined.
>>>>> As Stephen points out, not everything in Pairs needs to be a document
>>>>> class or related to the text within the document. It can be used, for
>>>>> example, to keep version information about a pipeline or anything any
>>>>> annotator wants. A totally made-up example could be
>>>>> attribute="dictionary_lookup_version"
>>>>> value="3.2.1"
>>>>>
>>>>> -- James
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:ctakes-dev-return-869-[email protected]] On Behalf Of Dmitriy Dligach
>>>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: new type: document label?
>>>>>>
>>>>>> Chen brings up a good point. But can't we solve this problem by
>>>>>> creating multiple Pair annotations in the CAS?
>>>>>>
>>>>>> Dima
>>>>>>
>>>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>>>> I am curious to know if Pair allows multiple document level labels
>>>>>>> for a single doc. It is possible that a single set of documents be used
>>>>>>> in multiple classification tasks.
>>>>>>> For example, in one task a document may be labeled as "positive" or
>>>>>>> "negative", in another task this same doc may be labeled as "high",
>>>>>>> "moderate" or "low". Many thanks!
>>>>>>> Best,
>>>>>>> Chen
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Dmitriy Dligach [mailto:[email protected]]
>>>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: new type: document label?
>>>>>>>
>>>>>>> Thank you, James.
>>>>>>>
>>>>>>> So, in general did you envision this type of use for Pair:
>>>>>>>
>>>>>>> Pair.attribute -> "document_label"
>>>>>>> Pair.value -> "positive"
>>>>>>>
>>>>>>> I think this may work.
>>>>>>>
>>>>>>> Dima
>>>>>>>
>>>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
>>>>>>>> such document-level properties.
>>>>>>>> Would that suit your need?
>>>>>>>>
>>>>>>>> -- James
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: [email protected]
>>>>>>>>> [mailto:ctakes-dev-return-854-[email protected]] On Behalf Of Dmitriy Dligach
>>>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>>>> Subject: new type: document label?
>>>>>>>>>
>>>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>>>> new type (that would derive from TOP) to store the label for a
>>>>>>>>> document?
>>>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dima
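For anyone reading this thread later, here is a rough sketch of the two options being weighed: generic attribute/value Pairs versus a dedicated document-class type. It is only an illustration in plain Java. The classes below are hypothetical stand-ins, not the real org.apache.ctakes.typesystem.type.util.Pair or any UIMA API, and the "task" field on DocumentClass is my own assumption about how several labels per document might be told apart.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustration only: plain Java stand-ins, NOT the real cTAKES/UIMA classes.
public class DocumentLabelSketch {

    // Hypothetical stand-in for a Pair annotation: one attribute name, one value.
    public static final class Pair {
        final String attribute;
        final String value;

        public Pair(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Hypothetical dedicated type with a "label" attribute.
    // "task" is an assumed extra field so one document can carry several labels.
    public static final class DocumentClass {
        final String task;
        final String label;

        public DocumentClass(String task, String label) {
            this.task = task;
            this.label = label;
        }
    }

    // Pair approach: scan the Pairs until the attribute name matches.
    public static Optional<String> findValue(List<Pair> pairs, String attribute) {
        for (Pair p : pairs) {
            if (p.attribute.equals(attribute)) {
                return Optional.of(p.value);
            }
        }
        return Optional.empty();
    }

    // Dedicated-type approach: with multiple labels this is still a scan.
    public static Optional<String> findLabel(List<DocumentClass> classes, String task) {
        for (DocumentClass c : classes) {
            if (c.task.equals(task)) {
                return Optional.of(c.label);
            }
        }
        return Optional.empty();
    }

    // Example Pairs taken from the attribute/value examples in this thread.
    public static List<Pair> examplePairs() {
        return Arrays.asList(
                new Pair("document_label", "positive"),
                new Pair("at_risk_for_early_brca", "T"),
                new Pair("dictionary_lookup_version", "3.2.1"));
    }

    public static void main(String[] args) {
        List<Pair> pairs = examplePairs();
        System.out.println(findValue(pairs, "document_label").orElse("<none>"));
        System.out.println(findValue(pairs, "dictionary_lookup_version").orElse("<none>"));

        // Two labels for the same document, one per (hypothetical) task name.
        List<DocumentClass> classes = Arrays.asList(
                new DocumentClass("task_a", "positive"),
                new DocumentClass("task_b", "moderate"));
        System.out.println(findLabel(classes, "task_b").orElse("<none>"));
    }
}
```

Both lookups end up iterating, which matches James's point that a new type does not remove the scan once multiple classifications are allowed. What the dedicated type does buy is Stephen's point: the type name itself says the value is a document classification, whereas a Pair only says so by the convention chosen for its attribute string.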
