How about DocumentLabel or DocumentClassLabel?
On Nov 20, 2012, at 11:01 AM, "Chen, Pei" <[email protected]> wrote:

> If we do decide to create a new type, could we call it something like
> DocumentClass or DocumentClassification and have some attribute(s) called
> "label"? Otherwise we may need a WSD component to disambiguate "Label" from
> general doc metadata... :)
>
> On Nov 19, 2012, at 11:25 AM, "Dmitriy Dligach"
> <[email protected]> wrote:
>
>> This is a very good point. Also it seems like documents with multiple labels
>> is not a scenario that we face every day, so why don't we just create a new
>> type (e.g. DocumentLabel) that derives from TOP and use it for a while to
>> see if it satisfies our document classification needs?
>>
>> Thanks,
>>
>> Dima
>>
>> On 11/16/2012 09:21 PM, Wu, Stephen T., Ph.D. wrote:
>>> Well, I think the downside to using Pair is that it is not self-documenting.
>>> In other words, everyone who did not see this thread will be faced with the
>>> same problem Dima and I were faced with (and have dealt with in different
>>> ways). Everyone trying to use the common type system outside of UIMA would
>>> probably be completely lost.
>>>
>>> Multiple labels... DocumentClass = List<Pair>? Or you could just
>>> instantiate multiple DocumentClass types, and not necessarily have the
>>> Document refer to them...
>>>
>>> stephen
>>>
>>>> I suggest using the current types.
>>>>
>>>> I think if we add a new one, we would still want to handle multiple
>>>> classifications, and would still have the downside of having to iterate
>>>> through the classifications to find the one of interest. So I'm not sure
>>>> how much we gain by adding a new type.
>>>>
>>>> But you are closer to this than I am so I would go with whatever you
>>>> recommend or others doing classification recommend.
>>>>
>>>> -- James
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:ctakes-dev-return-874-
>>>>> [email protected]] On Behalf Of Dmitriy Dligach
>>>>> Sent: Thursday, November 15, 2012 3:49 PM
>>>>> To: [email protected]
>>>>> Subject: Re: new type: document label?
>>>>>
>>>>> James, thanks. This makes perfect sense.
>>>>>
>>>>> So what's the conclusion? Can we do with the current types, or do we
>>>>> still need to create a new one?
>>>>>
>>>>> Dima
>>>>>
>>>>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>>>>> Yes, you can put multiple Pair annotations in the CAS.
>>>>>> There is a Pairs (plural) annotation type which is a list (FSArray) of
>>>>>> Pair annotations.
>>>>>> You could have two Pair annotations with
>>>>>> attribute="at_risk_for_early_brca"
>>>>>> value="T"
>>>>>>
>>>>>> attribute="alchohol_use"
>>>>>> value="heavy_drinker"
>>>>>>
>>>>>> The downside:
>>>>>> You have to iterate through the Pair annotations to find the one
>>>>>> with the attribute name you want.
>>>>>> The upside: we don't have to create new Annotation types for
>>>>>> everything that might be imagined.
>>>>>> As Stephen points out, not everything in Pairs needs to be a document
>>>>>> class or related to the text within the document. It can be used for
>>>>>> example to keep version information about a pipeline or anything any
>>>>>> annotator wants. A totally made-up example could be
>>>>>> attribute="dictionary_lookup_version"
>>>>>> value="3.2.1"
>>>>>>
>>>>>> -- James
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From:
>>>>>>> [email protected]
>>>>>>> [mailto:ctakes-dev-return-869-
>>>>>>> [email protected]] On Behalf Of Dmitriy
>>>>>>> Dligach
>>>>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: new type: document label?
>>>>>>>
>>>>>>> Chen brings up a good point. But can't we solve this problem by
>>>>>>> creating multiple Pair annotations in the CAS?
>>>>>>>
>>>>>>> Dima
>>>>>>>
>>>>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>>>>> I am curious to know if Pair allows multiple document level labels
>>>>>>>> for a single doc. It is possible that a single set of documents be
>>>>>>>> used in multiple classification tasks.
>>>>>>>> For example, in one task a document may be labeled as "positive" or
>>>>>>>> "negative", in another task this same doc may be labeled as "high",
>>>>>>>> "moderate" or "low". Many thanks!
>>>>>>>> Best,
>>>>>>>> Chen
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Dmitriy Dligach [mailto:[email protected]]
>>>>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Re: new type: document label?
>>>>>>>>
>>>>>>>> Thank you, James.
>>>>>>>>
>>>>>>>> So, in general did you envision this type of use for Pair:
>>>>>>>>
>>>>>>>> Pair.attribute -> "document_label"
>>>>>>>> Pair.value -> "positive"
>>>>>>>>
>>>>>>>> I think this may work.
>>>>>>>>
>>>>>>>> Dima
>>>>>>>>
>>>>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
>>>>>>>>> such document-level properties.
>>>>>>>>> Would that suit your need?
>>>>>>>>>
>>>>>>>>> -- James
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From:
>>>>>>>>>> [email protected]
>>>>>>>>>> [mailto:ctakes-dev-return-854-
>>>>>>>>>> [email protected]] On Behalf Of Dmitriy
>>>>>>>>>> Dligach
>>>>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>>>>> Subject: new type: document label?
>>>>>>>>>>
>>>>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>>>>> new type (that would derive from TOP) to store the label for a
>>>>>>>>>> document?
>>>>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Dima
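
For anyone reading this thread later, here is a rough sketch of the trade-off being discussed. It is plain Python with stand-in classes, not the UIMA/cTAKES API: the `Pair` dataclass only mimics the attribute/value shape of org.apache.ctakes.typesystem.type.util.Pair, and `DocumentClassification` (with its `task` field), `find_value` and `find_label` are hypothetical names made up for illustration. The point it tries to show is that both designs need a scan once a document carries more than one label; the dedicated type mainly buys a self-documenting name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a generic attribute/value Pair."""
    attribute: str
    value: str


@dataclass(frozen=True)
class DocumentClassification:
    """Hypothetical dedicated type; the 'task' field is an assumption."""
    task: str
    label: str


def find_value(pairs: Iterable[Pair], attribute: str) -> Optional[str]:
    """Scan the Pairs and return the value of the first matching attribute."""
    for pair in pairs:
        if pair.attribute == attribute:
            return pair.value
    return None


def find_label(classes: Iterable[DocumentClassification], task: str) -> Optional[str]:
    """Scan the classifications and return the label for the given task."""
    for doc_class in classes:
        if doc_class.task == task:
            return doc_class.label
    return None


# Generic approach: labels share the list with unrelated metadata.
pairs = [
    Pair("dictionary_lookup_version", "3.2.1"),
    Pair("document_label", "positive"),
    Pair("at_risk_for_early_brca", "T"),
]

# Dedicated approach: one object per classification task.
classes = [
    DocumentClassification(task="polarity", label="positive"),
    DocumentClassification(task="severity", label="moderate"),
]

pair_label = find_value(pairs, "document_label")
class_label = find_label(classes, "severity")
missing = find_value(pairs, "severity")
print(pair_label, class_label, missing)
```

In both cases a missing attribute or task comes back as `None`, so callers have to handle the "no label" case either way.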
