[ 
https://issues.apache.org/jira/browse/TIKA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Totaro updated TIKA-1654:
----------------------------------
    Fix Version/s: 1.9

> Reset cTAKES CAS into CTAKESParser
> ----------------------------------
>
>                 Key: TIKA-1654
>                 URL: https://issues.apache.org/jira/browse/TIKA-1654
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Assignee: Giuseppe Totaro
>              Labels: patch
>             Fix For: 1.9
>
>
> Using [CTAKESParser from Tika 
> Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an 
> exception occurs when the CTAKESParser is used multiple times:
> {noformat}
> org.apache.uima.cas.CASRuntimeException: Data for Sofa feature 
> setLocalSofaData() has already been set.
> {noformat}
> This is due to the CAS (Common Analysis System) used by CTAKESParser. The 
> CAS, like the AE (AnalysisEngine), is a static field in CTAKESParser, so 
> that it acts as a kind of singleton.
> For context, an AnalysisEngine is a cTAKES/UIMA component responsible for 
> analyzing unstructured information and for discovering and representing its 
> semantic content. An AnalysisEngine operates on an "analysis structure" 
> (implemented by the CAS).
> It is highly recommended to reuse the CAS, but it has to be reset before the 
> next run. The CTAKESUtils class (in {{org.apache.tika.parser.ctakes}}) 
> provides a reset method that releases all resources held by both the 
> AnalysisEngine and the CAS and then "destroys" them. Calling it prevents the 
> CASRuntimeException.
> The attached patch adds two new methods (resetCAS and resetAE) that reset, 
> but do not destroy, the CAS and the AnalysisEngine, respectively.
> By calling only resetCAS, CTAKESParser can reuse both the CAS and the AE 
> instead of rebuilding them for each run.
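
The reset-before-reuse pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the attached patch: SketchCas and SketchParser are hypothetical stand-ins (they are not UIMA or Tika classes), and the only behavior modeled is that document data can be set once until the object is reset. In UIMA itself, the CAS interface exposes a reset() method for this purpose.

```java
// Illustrative stand-in only: SketchCas is NOT the UIMA CAS class. It mimics one
// behavior described in this issue: document data may be set only once until
// reset() is called.
final class SketchCas {
    private String documentText; // null means "no document data set yet"

    void setDocumentText(String text) {
        if (documentText != null) {
            // Mirrors the failure mode reported above when the CAS is reused as-is.
            throw new IllegalStateException("Data for Sofa feature has already been set.");
        }
        documentText = text;
    }

    String getDocumentText() {
        return documentText;
    }

    // Clears per-document state but keeps the object itself alive for reuse.
    void reset() {
        documentText = null;
    }
}

// Hypothetical parser that shares one SketchCas across all calls.
final class SketchParser {
    // Built once and shared, like the static field described in the issue.
    private static final SketchCas CAS = new SketchCas();

    // Reset the shared CAS first, then load the new document into it.
    static int parse(String text) {
        CAS.reset();
        CAS.setDocumentText(text);
        return CAS.getDocumentText().length(); // placeholder for real analysis
    }

    public static void main(String[] args) {
        System.out.println(parse("first document"));
        // The second call would throw if parse() did not reset the shared CAS.
        System.out.println(parse("second document"));
    }
}
```

Resetting at the start of each parse call, rather than at the end, means a run that failed halfway cannot leave stale data behind for the next one. Whether the patch resets before or after each run is not stated in the description, so treat that ordering as an assumption of this sketch.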



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
