[ https://issues.apache.org/jira/browse/TIKA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Giuseppe Totaro updated TIKA-1654:
----------------------------------
    Fix Version/s: 1.9

> Reset cTAKES CAS into CTAKESParser
> ----------------------------------
>
>                 Key: TIKA-1654
>                 URL: https://issues.apache.org/jira/browse/TIKA-1654
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Assignee: Giuseppe Totaro
>              Labels: patch
>             Fix For: 1.9
>
>
> Using [CTAKESParser from Tika Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an exception occurs when the CTAKESParser is used multiple times:
> {noformat}
> org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set.
> {noformat}
> This is due to the CAS (Common Analysis System) used by CTAKESParser. The CAS, like the AE (AnalysisEngine), is a static field in CTAKESParser, which makes it a sort of singleton.
> By the way, an Analysis Engine is a cTAKES/UIMA component responsible for analyzing unstructured information and for discovering and representing semantic content. An AnalysisEngine operates on an "analysis structure" (implemented by the CAS).
> It is highly recommended to reuse the CAS, but it has to be reset before the next run. The CTAKESUtils class ({{org.apache.tika.parser.ctakes}}) provides the reset method, which releases all resources held by both the AnalysisEngine and the CAS and then "destroys" them. This method prevents the CASRuntimeException error.
> The attached patch adds two new methods (resetCAS and resetAE) that reset, but do not destroy, the CAS and the AnalysisEngine respectively.
> By using only resetCAS, CTAKESParser can reuse both the CAS and the AE instead of building them again for each run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
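
For illustration only, the reset-before-reuse behaviour described in the issue can be sketched in plain Java. This is not the attached patch and it does not use the UIMA API: StandInCas, parseWithReset and parseWithoutReset are hypothetical names, and the stand-in only imitates the reported symptom (per-document data that may be set once until the object is reset).

```java
public class CasReuseSketch {

    /** Hypothetical stand-in for a CAS; NOT the real org.apache.uima API. */
    static final class StandInCas {
        // null means "nothing set since the last reset"
        private String documentText;

        void setDocumentText(String text) {
            if (documentText != null) {
                // Imitates the reported failure: the data may be set only once per run.
                throw new IllegalStateException("Document text has already been set.");
            }
            documentText = text;
        }

        String getDocumentText() {
            return documentText;
        }

        /** Clears per-document state but keeps the object usable. */
        void reset() {
            documentText = null;
        }
    }

    // Shared instance, analogous to the static CAS field described in the issue.
    private static final StandInCas SHARED_CAS = new StandInCas();

    /** Processes one document, resetting the shared stand-in first. */
    static int parseWithReset(String text) {
        SHARED_CAS.reset();
        SHARED_CAS.setDocumentText(text);
        // Placeholder for real analysis: just report the text length.
        return SHARED_CAS.getDocumentText().length();
    }

    /** Same as above but without the reset; fails if text is already set. */
    static int parseWithoutReset(String text) {
        SHARED_CAS.setDocumentText(text);
        return SHARED_CAS.getDocumentText().length();
    }

    public static void main(String[] args) {
        System.out.println(parseWithReset("first document"));
        System.out.println(parseWithReset("second"));
        try {
            // The shared stand-in still holds "second", so this call throws.
            parseWithoutReset("third");
        } catch (IllegalStateException e) {
            System.out.println("failed without reset: " + e.getMessage());
        }
    }
}
```

The point of the sketch is the design choice discussed in the issue: the shared object is created once and only its per-document state is cleared between runs, rather than destroying and rebuilding it each time.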