[
https://issues.apache.org/jira/browse/TIKA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Giuseppe Totaro updated TIKA-1654:
----------------------------------
Attachment: TIKA-1654.v02.patch
[~chrismattmann] has reported to me that CTAKESParser sometimes throws an
exception when it is used from tika-server. The exception is the
following:
{noformat}
CASAdminException: Can't flush CAS, flushing is disabled.
{noformat}
Looking at the stack trace of the exception, the problem appears to be due to
synchronization issues while accessing the CAS (Common Analysis System)
via CTAKESParser when multiple "simultaneous" requests reach the server.
I have written a new patch because, if my assumption is correct, even the
first version of CTAKESParser could cause the same problem. The new patch
is attached.
Essentially, the patch makes both the AE and the CAS private to each instance
of CTAKESContentHandler, so they are no longer shared across requests.
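To illustrate the idea, here is a minimal sketch of per-instance state under concurrent requests. It is not the code in the patch: StubCas, PerInstanceHandler and PerInstanceDemo are hypothetical stand-ins, and StubCas only mimics a resource that must not be shared between threads. It does not use the real UIMA API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a CAS; NOT the real org.apache.uima API.
final class StubCas {
    private String text;

    void setDocumentText(String t) {
        if (text != null) {
            throw new IllegalStateException("document text already set");
        }
        text = t;
    }

    String getDocumentText() { return text; }

    void reset() { text = null; }
}

// Hypothetical handler: the CAS is an instance field, not a static one,
// so two handlers never touch the same CAS.
final class PerInstanceHandler {
    private final StubCas cas = new StubCas();

    String process(String document) {
        cas.setDocumentText(document);
        try {
            // Placeholder for the real analysis step.
            return cas.getDocumentText().toUpperCase();
        } finally {
            cas.reset();
        }
    }
}

final class PerInstanceDemo {
    // Simulates several "simultaneous" requests, one handler per request.
    static List<String> runConcurrently(List<String> docs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> new PerInstanceHandler().process(doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off in this sketch is that every handler builds its own resources, which avoids shared state at the cost of repeated initialization.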
I am also writing a new version of CTAKESParser that calls cTAKES as an
external command. However, any suggestions for a synchronization mechanism
to guard access to both the AE and the CAS would be really appreciated.
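One possible approach, offered only as a sketch and not tested against cTAKES, is to keep a single shared instance and serialize access to it with a lock. SharedStubCas and LockedPipeline are hypothetical names; the stub again stands in for the real CAS and is not the UIMA API.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a shared CAS; NOT the real org.apache.uima API.
final class SharedStubCas {
    private String text;

    void setDocumentText(String t) {
        if (text != null) {
            throw new IllegalStateException("document text already set");
        }
        text = t;
    }

    String getDocumentText() { return text; }

    void reset() { text = null; }
}

// Hypothetical pipeline: one static CAS, guarded by one lock.
final class LockedPipeline {
    private static final SharedStubCas CAS = new SharedStubCas();
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Only one thread at a time may fill, analyze and reset the shared CAS.
    static int process(String document) {
        LOCK.lock();
        try {
            try {
                CAS.setDocumentText(document);
                // Placeholder for the real analysis step.
                return CAS.getDocumentText().length();
            } finally {
                CAS.reset(); // always leave the CAS clean for the next caller
            }
        } finally {
            LOCK.unlock();
        }
    }
}
```

With this design, requests are processed one at a time, so it trades throughput for reuse of the shared instance.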
Thanks [~chrismattmann] for supporting me on this work.
> Reset cTAKES CAS into CTAKESParser
> ----------------------------------
>
> Key: TIKA-1654
> URL: https://issues.apache.org/jira/browse/TIKA-1654
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Giuseppe Totaro
> Assignee: Giuseppe Totaro
> Labels: patch
> Fix For: 1.10
>
> Attachments: TIKA-1654.patch, TIKA-1654.v02.patch
>
>
> Using [CTAKESParser from Tika
> Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an
> exception occurs when the CTAKESParser is used multiple times:
> {noformat}
> org.apache.uima.cas.CASRuntimeException: Data for Sofa feature
> setLocalSofaData() has already been set.
> {noformat}
> This is due to the CAS (Common Analysis System) used by CTAKESParser. The
> CAS, like the AE (AnalysisEngine), is a static field in CTAKESParser, which
> makes it a sort of singleton.
> By the way, an AnalysisEngine is a cTAKES/UIMA component responsible for
> analyzing unstructured information and for discovering and representing
> semantic content. An AnalysisEngine operates on an "analysis structure"
> (implemented by the CAS).
> It is highly recommended to reuse the CAS, but it has to be reset before the
> next run. The CTAKESUtils class (in {{org.apache.tika.parser.ctakes}}) provides
> the reset method, which releases all resources held by both the AnalysisEngine
> and the CAS and then "destroys" them. This method prevents the
> CASRuntimeException error.
> The attached patch adds two new methods (resetCAS and resetAE) that reset,
> but do not destroy, the CAS and the AnalysisEngine respectively.
> By using only resetCAS, CTAKESParser can reuse both the CAS and the AE instead
> of building them again for each run.
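
The reset-before-reuse behavior described in the issue can be sketched as follows. ReusableStubCas is a hypothetical stand-in that only mimics the "already been set" failure; it is not the UIMA CAS, and its reset method is not the resetCAS method from the patch.

```java
// Hypothetical stand-in for a reusable CAS; NOT the real org.apache.uima API.
final class ReusableStubCas {
    private String text;

    // Fails if the previous document was not cleared, mimicking the
    // "has already been set" error from the issue description.
    void setDocumentText(String t) {
        if (text != null) {
            throw new IllegalStateException("document text has already been set");
        }
        text = t;
    }

    String getDocumentText() { return text; }

    // Clears per-document state but keeps the object itself alive for reuse.
    void reset() { text = null; }
}

final class ReuseDemo {
    public static void main(String[] args) {
        ReusableStubCas cas = new ReusableStubCas();
        cas.setDocumentText("first document");
        cas.reset();                       // without this, the next call throws
        cas.setDocumentText("second document");
        System.out.println(cas.getDocumentText());
    }
}
```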