Well, I think the downside to using Pair is that it is not self-documenting. In other words, everyone who did not see this thread will face the same problem Dima and I faced (and dealt with in different ways). Anyone trying to use the common type system outside of UIMA would probably be completely lost.

Multiple labels... DocumentClass = List<Pair>? Or you could just instantiate multiple DocumentClass types, and not necessarily have the Document refer to them...

stephen

> I suggest using the current types.
>
> I think if we add a new one, we would still want to handle multiple
> classifications, and would still have the downside of having to iterate
> through the classifications to find the one of interest. So I'm not sure how
> much we gain by adding a new type.
>
> But you are closer to this than I am so I would go with whatever you recommend
> or others doing classification recommend.
>
> -- James
>
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:ctakes-dev-return-874-
>> [email protected]] On Behalf Of Dmitriy Dligach
>> Sent: Thursday, November 15, 2012 3:49 PM
>> To: [email protected]
>> Subject: Re: new type: document label?
>>
>> James, thanks. This makes perfect sense.
>>
>> So what's the conclusion? Can we do with the current types, or do we
>> still need to create a new one?
>>
>> Dima
>>
>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>> Yes, you can put multiple Pair annotations in the CAS.
>>> There is a Pairs (plural) annotation type which is a list (FSArray) of
>>> Pair annotations.
>>>
>>> You could have two Pair annotations with
>>> attribute="at_risk_for_early_brca"
>>> value="T"
>>>
>>> attribute="alchohol_use"
>>> value="heavy_drinker"
>>>
>>> The downside:
>>> You have to iterate through the Pair annotations to find the one
>>> with the attribute name you want.
>>> The upside: we don't have to create new Annotation types for
>>> everything that might be imagined.
>>>
>>> As Stephen points out, not everything in Pairs needs to be a document
>>> class or related to the text within the document. It can be used for
>>> example to keep version information about a pipeline or anything any
>>> annotator wants. A totally made-up example could be
>>> attribute="dictionary_lookup_version"
>>> value="3.2.1"
>>>
>>> -- James
>>>
>>>
>>>> -----Original Message-----
>>>> From:
>>>> [email protected]
>>>> [mailto:ctakes-dev-return-869-
>>>> [email protected]] On Behalf Of Dmitriy
>>>> Dligach
>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>> To: [email protected]
>>>> Subject: Re: new type: document label?
>>>>
>>>> Chen brings up a good point. But can't we solve this problem by
>>>> creating multiple Pair annotations in the CAS?
>>>>
>>>> Dima
>>>>
>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>> I am curious to know if Pair allows multiple document level labels
>>>>> for a single doc. It is possible that a single set of documents be
>>>>> used in multiple classification tasks.
>>>>> For example, in one task a document may be labeled as "positive" or
>>>>> "negative", in another task this same doc may be labeled as "high",
>>>>> "moderate" or "low". Many thanks!
>>>>> Best,
>>>>> Chen
>>>>>
>>>>> -----Original Message-----
>>>>> From: Dmitriy Dligach [mailto:[email protected]]
>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>> To: [email protected]
>>>>> Subject: Re: new type: document label?
>>>>>
>>>>> Thank you, James.
>>>>>
>>>>> So, in general did you envision this type of use for Pair:
>>>>>
>>>>> Pair.attribute -> "document_label"
>>>>> Pair.value -> "positive"
>>>>>
>>>>> I think this may work.
>>>>>
>>>>> Dima
>>>>>
>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
>>>>>> such document-level properties.
>>>>>> Would that suit your need?
>>>>>>
>>>>>> -- James
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From:
>>>>>>> [email protected]
>>>>>>> [mailto:ctakes-dev-return-854-
>>>>>>> [email protected]] On Behalf Of Dmitriy
>>>>>>> Dligach
>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>> Subject: new type: document label?
>>>>>>>
>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>> new type (that would derive from TOP) to store the label for a
>>>>>>> document?
>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dima
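
For anyone reading this thread later: the trade-off James describes (several Pair annotations per document, found by iterating until the attribute name matches) might look roughly like the sketch below. This is plain Java and deliberately does not use the real cTAKES/UIMA classes. `SimplePair`, `PairLookup`, and `findValue` are hypothetical names invented for illustration, standing in for a Pair with an attribute and a value as discussed above.

```java
import java.util.List;
import java.util.Optional;

public class PairLookup {

    // Hypothetical stand-in for a Pair annotation: an attribute name plus a value.
    // This is NOT the cTAKES org.apache.ctakes.typesystem.type.util.Pair class.
    public static final class SimplePair {
        final String attribute;
        final String value;

        public SimplePair(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Iterate through the pairs and return the value of the first one whose
    // attribute name matches; empty if no pair carries that attribute.
    public static Optional<String> findValue(List<SimplePair> pairs, String attribute) {
        for (SimplePair pair : pairs) {
            if (pair.attribute.equals(attribute)) {
                return Optional.of(pair.value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Several document-level properties stored side by side, as in the thread.
        List<SimplePair> pairs = List.of(
                new SimplePair("at_risk_for_early_brca", "T"),
                new SimplePair("document_label", "positive"),
                new SimplePair("dictionary_lookup_version", "3.2.1"));

        System.out.println(findValue(pairs, "document_label").orElse("(not set)"));
        System.out.println(findValue(pairs, "severity").orElse("(not set)"));
    }
}
```

The lookup is a linear scan, which is the "downside" mentioned above. The attribute names are only a convention shared between annotators, which is also why a bare Pair is not self-documenting.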
