Holy cattle, it worked?!? I don't know of a specific xcas reader offhand ... have you tried running with the xmi reader? Some of the readers lying around will handle both.

-----Original Message-----
From: Tomasz Oliwa [mailto:[email protected]]
Sent: Thursday, November 19, 2015 6:48 PM
To: [email protected]
Subject: RE: TermConsumers

Sean,

I tested this, the Annotator itself works, great.

The only change I had to make when writing the Annotator class with the code below was to provide generics in:

   static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
         = Arrays.<Class<? extends IdentifiedAnnotation>>asList( MedicationMention.class,
                                                                 DiseaseDisorderMention.class,
                                                                 SignSymptomMention.class,
                                                                 LabMention.class,
                                                                 ProcedureMention.class );

At least on a small example XMI CAS I see the behavior is as expected for the IdentifiedAnnotations.

However, for my use case, I have XCAS files, not XMI CAS files. I can use XCasWriterCasConsumer to write the CAS files, but I cannot find any XCAS Collection Reader to initially read them in. Is such a reader available?

Regards,
Tomasz

________________________________________
From: Finan, Sean [[email protected]]
Sent: Thursday, November 19, 2015 4:03 PM
To: [email protected]
Subject: RE: TermConsumers

Hi Tomasz,

I don't know that anybody has done this. However, you could try running a pipeline with items in ctakes-core:

   XmiCollectionReaderCtakes to read your existing cas xmi files in directory
   -- custom refiner AE below -- to remove unwanted umls annotations
   XmiWriterCasConsumerCtakes to write the new cas xmi files

The refiner AE would basically do what the PrecisionTermConsumer of the fast lookup does, but over a pre-populated cas. This is mostly cut and paste from other code with a little bit of lookompiling - I haven't tested it at all! If you do give it a run-through and it works then let me know and I'll clean it up and check into sandbox.

   static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES
         = Arrays.asList( MedicationMention.class, DiseaseDisorderMention.class,
                          SignSymptomMention.class, LabMention.class, ProcedureMention.class );
   // Don't forget AnatomicalSiteMention.class and generic EntityMention.class!

   static private final Function<Annotation,TextSpan> createTextSpan
         = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );

   static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf
         = annotation -> annotation;

   @Override
   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Starting processing" );
      for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
         refineForClass( jcas, eventClass );
      }
      final Collection<AnatomicalSiteMention> anatomicals
            = JCasUtil.select( jcas, AnatomicalSiteMention.class );
      final Collection<EntityMention> entityMentions
            = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
      entityMentions.removeAll( anatomicals );
      refineForAnnotations( jcas, anatomicals );
      refineForAnnotations( jcas, entityMentions );
      LOGGER.info( "Finished processing" );
   }

   static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
                                                                        final Class<T> eventClass ) {
      refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
   }

   static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
                                                                              final Collection<T> annotations ) {
      final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
            = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
      final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
      unwantedSpans.stream()
                   .map( annotationTextSpans::get )
                   .forEach( t -> t.removeFromIndexes( jcas ) );
   }

   static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
      final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
      final Collection<TextSpan> discardSpans = new HashSet<>();
      final int count = textSpans.size();
      for ( int i = 0; i < count; i++ ) {
         final TextSpan spanKeyI = textSpans.get( i );
         for ( int j = i + 1; j < count; j++ ) {
            final TextSpan spanKeyJ = textSpans.get( j );
            if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
                 || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
               // J contains I, discard less precise concepts for span I and move on to next span I
               discardSpans.add( spanKeyI );
               break;
            }
            if ( ((spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
                  || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd())) ) {
               // I contains J, discard less precise concepts for span J and move on to next span J
               discardSpans.add( spanKeyJ );
            }
         }
      }
      return discardSpans;
   }

Good luck,
Sean

-----Original Message-----
From: Tomasz Oliwa [mailto:[email protected]]
Sent: Thursday, November 19, 2015 12:08 PM
To: [email protected]
Subject: TermConsumers

Hi,

How can I run a different TermConsumer on already generated CAS files?

I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the DefaultTermConsumer set in cTakesHsql.xml. Now I would like to apply the PrecisionTermConsumer on these CAS files without having to do the whole annotation process again.

The IdentifiedAnnotations are all there, it is only a matter of removing them according to the TermConsumer's logic. Is there a way to create a passthrough Processor that simply reads the CAS, applies a different TermConsumer and writes it to disk? Or is there a different way to go about this?

Thanks for any help,
Tomasz
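
The containment rule in getUnwantedSpans above can be tried in isolation, without cTAKES or UIMA on the classpath. What follows is only a minimal sketch of that rule, not code from ctakes: SpanRefinerSketch, Span, strictlyContains and unwantedSpans are hypothetical names, with Span standing in for the TextSpan used in the quoted code, and the offsets in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SpanRefinerSketch {

    /** Hypothetical stand-in for TextSpan: a begin/end character offset pair. */
    static final class Span {
        final int begin;
        final int end;

        Span(final int begin, final int end) {
            this.begin = begin;
            this.end = end;
        }

        /** True when this span covers {@code other} and is longer on at least one side. */
        boolean strictlyContains(final Span other) {
            return begin <= other.begin && end >= other.end
                    && (begin < other.begin || end > other.end);
        }

        @Override
        public boolean equals(final Object o) {
            return o instanceof Span && ((Span) o).begin == begin && ((Span) o).end == end;
        }

        @Override
        public int hashCode() {
            return 31 * begin + end;
        }

        @Override
        public String toString() {
            return begin + "-" + end;
        }
    }

    /** Returns every span that lies inside a longer span; identical spans are left alone. */
    static Set<Span> unwantedSpans(final List<Span> spans) {
        final Set<Span> discard = new LinkedHashSet<>();
        for (int i = 0; i < spans.size(); i++) {
            final Span a = spans.get(i);
            for (int j = i + 1; j < spans.size(); j++) {
                final Span b = spans.get(j);
                if (b.strictlyContains(a)) {
                    // b is the more precise (longer) span: drop a and stop comparing it.
                    discard.add(a);
                    break;
                }
                if (a.strictlyContains(b)) {
                    // a is the more precise span: drop b and keep scanning.
                    discard.add(b);
                }
            }
        }
        return discard;
    }

    public static void main(final String[] args) {
        // Made-up offsets: "chest" 0-5 and "pain" 6-10 inside "chest pain" 0-10, plus an unrelated 15-20.
        final List<Span> spans = Arrays.asList(
                new Span(0, 5), new Span(0, 10), new Span(6, 10), new Span(15, 20));
        System.out.println(unwantedSpans(spans)); // prints [0-5, 6-10]
    }
}
```

The two shorter spans are reported as unwanted, while the longer span and the unrelated one are kept. Two spans with identical offsets never discard each other, which matches the strict/non-strict mix of comparisons in the quoted code.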
