Giuseppe Totaro created TIKA-1654:
-------------------------------------

             Summary: Reset cTAKES CAS into CTAKESParser
                 Key: TIKA-1654
                 URL: https://issues.apache.org/jira/browse/TIKA-1654
             Project: Tika
          Issue Type: Bug
          Components: parser
            Reporter: Giuseppe Totaro
            Assignee: Giuseppe Totaro


Using [CTAKESParser from Tika 
Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an exception 
occurs when the CTAKESParser is used multiple times:

{noformat}
org.apache.uima.cas.CASRuntimeException: Data for Sofa feature 
setLocalSofaData() has already been set.
{noformat}

This is caused by the CAS (Common Analysis System) used by CTAKESParser. The 
CAS, like the AE (AnalysisEngine), is a static field in CTAKESParser, which 
makes it a sort of singleton shared by every run.

For context, an AnalysisEngine is a cTAKES/UIMA component responsible for 
analyzing unstructured information and for discovering and representing its 
semantic content. An AnalysisEngine operates on an "analysis structure" 
(implemented by the CAS).

It is highly recommended to reuse the CAS, but it has to be reset before the 
next run. The CTAKESUtils class ({{org.apache.tika.parser.ctakes}}) already 
provides a reset method that releases all resources held by both the 
AnalysisEngine and the CAS and then "destroys" them. Calling it prevents the 
CASRuntimeException, but the AE and the CAS then have to be built again.

The attached patch adds two new methods, resetCAS and resetAE, which reset 
(but do not destroy) the CAS and the AnalysisEngine respectively.
By calling only resetCAS, CTAKESParser can reuse both the CAS and the AE 
instead of building them again for each run.
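
The reset-and-reuse idea can be illustrated with a minimal, self-contained 
sketch. It does not use the UIMA or Tika APIs and is not the code in the 
patch: FakeCas and ResetCasSketch are hypothetical stand-ins that only mimic 
the "already been set" behaviour reported above. Resetting at the start of 
each run (rather than at the end) is also an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reusing one shared, resettable structure across runs. */
class ResetCasSketch {

    /** Stand-in for a CAS: the document text may be set only once per reset. */
    static final class FakeCas {
        private String documentText; // plays the role of the Sofa data
        private final List<String> annotations = new ArrayList<>();

        void setDocumentText(String text) {
            if (documentText != null) {
                // Mimics the failure described in this issue.
                throw new IllegalStateException("Sofa data has already been set");
            }
            documentText = text;
        }

        String getDocumentText() {
            return documentText;
        }

        void addAnnotation(String annotation) {
            annotations.add(annotation);
        }

        int annotationCount() {
            return annotations.size();
        }

        /** Clears per-document state but keeps the instance usable. */
        void reset() {
            documentText = null;
            annotations.clear();
        }
    }

    // Shared instance, analogous to the static CAS field in CTAKESParser.
    private static final FakeCas CAS = new FakeCas();

    /** Resets the shared instance, loads the text, and returns the token count. */
    static synchronized int parse(String text) {
        CAS.reset(); // without this line the second call would throw
        CAS.setDocumentText(text);
        for (String token : text.trim().split("\\s+")) {
            CAS.addAnnotation(token);
        }
        return CAS.annotationCount();
    }

    public static void main(String[] args) {
        System.out.println(parse("first clinical note")); // 3
        System.out.println(parse("second note"));         // 2
    }
}
```

In the sketch the shared instance is created once and only its per-document 
state is cleared, which is the trade-off the patch aims for: no exception on 
repeated use and no rebuild for each run.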



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
