Giuseppe Totaro created TIKA-1654:
-------------------------------------
Summary: Reset cTAKES CAS into CTAKESParser
Key: TIKA-1654
URL: https://issues.apache.org/jira/browse/TIKA-1654
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Giuseppe Totaro
Assignee: Giuseppe Totaro
While using [CTAKESParser from Tika
Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an exception
is thrown when CTAKESParser is invoked more than once:
{noformat}
org.apache.uima.cas.CASRuntimeException: Data for Sofa feature
setLocalSofaData() has already been set.
{noformat}
This is caused by the CAS (Common Analysis Structure) used by CTAKESParser. The
CAS, like the AE (AnalysisEngine), is held in a static field of CTAKESParser so
that it acts as a kind of singleton.
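To make the failure mode concrete, here is a minimal sketch. It does not use the real UIMA or Tika classes: {{ToyCas}} and {{ToyStaticParser}} are hypothetical stand-ins, and the exception type and message are assumptions chosen only to imitate the reported behaviour (data that may be set once on a shared, never-reset object).

```java
/** Toy stand-in for a CAS: its document text may be set only once. */
final class ToyCas {
    private String text; // null until a document is loaded

    void setDocumentText(String newText) {
        if (text != null) {
            // Imitates the reported failure; the real check lives inside UIMA.
            throw new IllegalStateException("document text has already been set");
        }
        text = newText;
    }

    String getDocumentText() {
        return text;
    }
}

/** Hypothetical parser that keeps the CAS in a static field (a de facto singleton). */
final class ToyStaticParser {
    private static final ToyCas SHARED_CAS = new ToyCas();

    /** Loads the text into the shared CAS and returns its length as a fake "analysis". */
    static int parse(String text) {
        SHARED_CAS.setDocumentText(text);
        return SHARED_CAS.getDocumentText().length();
    }
}
```

In this sketch the first call to {{ToyStaticParser.parse}} succeeds and the second one throws, because the shared object still holds the text of the previous document.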
For context, an AnalysisEngine is a cTAKES/UIMA component responsible for
analyzing unstructured information and for discovering and representing its
semantic content. An AnalysisEngine operates on an "analysis structure"
(implemented by the CAS).
Reusing the CAS is highly recommended, but it has to be reset before the next
run. The CTAKESUtils class ({{org.apache.tika.parser.ctakes}}) already provides
a reset method that releases all resources held by both the AnalysisEngine and
the CAS and then "destroys" them. Calling it prevents the CASRuntimeException,
but both objects then have to be built again.
The attached patch adds two new methods, resetCAS and resetAE, which reset, but
do not destroy, the CAS and the AnalysisEngine respectively.
By calling only resetCAS, CTAKESParser can reuse both the CAS and the AE instead
of building them again for each run.
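The difference between destroying and resetting can be sketched as follows. This is an illustration only, not the code in the patch: {{CountingCas}} and {{ReuseDemo}} are hypothetical names, and the build counter is an assumption added to make the cost of rebuilding visible.

```java
/** Toy stand-in for a CAS that counts how often it is constructed. */
final class CountingCas {
    static int builds = 0; // number of (assumed expensive) constructions so far
    private String text;   // per-document state

    CountingCas() {
        builds++;
    }

    void setDocumentText(String newText) {
        if (text != null) {
            throw new IllegalStateException("document text has already been set");
        }
        text = newText;
    }

    String getDocumentText() {
        return text;
    }

    /** Clears the per-document state but keeps the object usable. */
    void reset() {
        text = null;
    }
}

final class ReuseDemo {
    /** Destroy-and-rebuild: a fresh CAS is constructed for every document. */
    static int parseAllRebuilding(String[] docs) {
        int totalLength = 0;
        for (String doc : docs) {
            CountingCas cas = new CountingCas();
            cas.setDocumentText(doc);
            totalLength += cas.getDocumentText().length();
        }
        return totalLength;
    }

    /** Reset-and-reuse: one CAS serves all documents and is reset after each run. */
    static int parseAllReusing(String[] docs) {
        CountingCas cas = new CountingCas();
        int totalLength = 0;
        for (String doc : docs) {
            try {
                cas.setDocumentText(doc);
                totalLength += cas.getDocumentText().length();
            } finally {
                cas.reset(); // runs even if the "analysis" fails
            }
        }
        return totalLength;
    }
}
```

Both methods return the same result, but for N documents the first constructs N objects and the second constructs one. Doing the reset in a {{finally}} block is a design choice of this sketch, so that a failed run does not leave stale data behind for the next one.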
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)