[
https://issues.apache.org/jira/browse/TIKA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592898#comment-14592898
]
Chris A. Mattmann commented on TIKA-1654:
-----------------------------------------
+1 please commit [~gostep] thanks!
> Reset cTAKES CAS into CTAKESParser
> ----------------------------------
>
> Key: TIKA-1654
> URL: https://issues.apache.org/jira/browse/TIKA-1654
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Giuseppe Totaro
> Assignee: Giuseppe Totaro
> Labels: patch
> Fix For: 1.10
>
> Attachments: TIKA-1654.patch, TIKA-1654.v02.patch
>
>
> Using [CTAKESParser from Tika
> Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an
> exception occurs when the CTAKESParser is used multiple times:
> {noformat}
> org.apache.uima.cas.CASRuntimeException: Data for Sofa feature
> setLocalSofaData() has already been set.
> {noformat}
> This is due to the CAS (Common Analysis System) used by CTAKESParser. The
> CAS, like the AE (AnalysisEngine), is a static field in CTAKESParser, so it
> behaves as a sort of singleton.
> An AnalysisEngine is a cTAKES/UIMA component responsible for analyzing
> unstructured information and for discovering and representing its semantic
> content. An AnalysisEngine operates on an "analysis structure" (implemented
> by the CAS).
> It is highly recommended to reuse the CAS, but it has to be reset before the
> next run. The CTAKESUtils class ({{org.apache.tika.parser.ctakes}}) provides
> a reset method that releases all resources held by both the AnalysisEngine
> and the CAS and then "destroys" them. Calling it prevents the
> CASRuntimeException, but both objects then have to be built again.
> The attached patch adds two new methods (resetCAS and resetAE) that reset,
> but do not destroy, the CAS and the AnalysisEngine respectively.
> By using only resetCAS, CTAKESParser can reuse both the CAS and the AE
> instead of building them again for each run.
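
The reuse-and-reset pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: SketchCas, SketchParser and CasReuseSketch are hypothetical stand-ins, and the real UIMA CAS is not used here so that the example runs without any dependencies. In UIMA itself the CAS interface exposes a reset() method, which is presumably what resetCAS relies on.

```java
// Hypothetical stand-in for a UIMA CAS: the document text may be set only once per run.
final class SketchCas {
    private String documentText;

    void setDocumentText(String text) {
        if (documentText != null) {
            // Mimics the failure mode from the issue; the real error is a CASRuntimeException.
            throw new IllegalStateException("Data for Sofa feature has already been set.");
        }
        documentText = text;
    }

    String getDocumentText() {
        return documentText;
    }

    // Clears per-document state but keeps the object alive for the next run.
    void reset() {
        documentText = null;
    }
}

// Hypothetical parser that keeps the CAS in a static field, as the issue describes.
final class SketchParser {
    private static final SketchCas CAS = new SketchCas();

    static int parse(String text) {
        try {
            CAS.setDocumentText(text);
            // Placeholder for the analysis step: count whitespace-separated tokens.
            return CAS.getDocumentText().trim().split("\\s+").length;
        } finally {
            // Reset (not destroy) so the same CAS can be reused by the next call.
            CAS.reset();
        }
    }
}

final class CasReuseSketch {
    public static void main(String[] args) {
        // Two consecutive runs on the same static CAS; without the reset the second would throw.
        System.out.println(SketchParser.parse("first clinical note"));
        System.out.println(SketchParser.parse("second note"));
    }
}
```

Placing the reset in a finally block is one possible design choice: it leaves the shared CAS clean even when a run fails, so a single bad document does not break every later call.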
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)