[ https://issues.apache.org/jira/browse/TIKA-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018931#comment-14018931 ]
Tim Allison commented on TIKA-1323:
-----------------------------------
Hi Sergey,
For TIKA-1302, I'd like to use tika-server, and I'd like to be able to record
exceptions at a per-file level so that we can say, e.g., "With Tika 1.5 we had
515 exceptions on docx files, but with Tika-1.6-SNAPSHOT we had 1025," or
something similar. I'd also like to be able to say: "We had an exception on
file 12345.docx with Tika 1.5, but we're not getting an exception with
Tika-1.6-SNAPSHOT." We can do that now with tika-server on the client side: if
my client receives a 422 or 500, I know that something went wrong, and I can
log it.
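The client-side bookkeeping described above could be sketched roughly as follows, assuming each run is stored as a map from file name to whether that file failed (a 422 or 500 from tika-server). `RunComparison`, `exceptionCountsByExtension`, and `failedOnlyIn` are hypothetical names for illustration, not part of Tika.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper (not part of Tika): compares per-file outcomes from two
// Tika versions. Each run maps a file name to true if that file produced an
// exception, e.g. because tika-server answered with a 422 or 500.
class RunComparison {

    // Lower-cased extension of a file name, or "" if it has none.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
    }

    // Number of failed files per extension, e.g. {docx=515}.
    static Map<String, Integer> exceptionCountsByExtension(Map<String, Boolean> run) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Boolean> e : run.entrySet()) {
            if (e.getValue()) {
                counts.merge(extension(e.getKey()), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Files that failed in run a but succeeded in run b. Files missing from b
    // are skipped, since there is nothing to compare them against.
    static Set<String> failedOnlyIn(Map<String, Boolean> a, Map<String, Boolean> b) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Boolean> e : a.entrySet()) {
            Boolean inB = b.get(e.getKey());
            if (e.getValue() && inB != null && !inB) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Calling `failedOnlyIn(tika15, tika16)` lists files that no longer fail in the newer version; swapping the arguments lists files that started failing.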
However, what I'd also like to be able to do is identify the frequency of
stacktrace elements so that we can rank the most frequent exceptions per
document type. To do this, we need to be able to record the stacktrace, and
I'd also like to be able to link the stacktrace back to the document that
caused the problem.
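One way to rank the most frequent exceptions for a document type is to reduce each recorded stacktrace to a short signature and count signatures per file extension. The sketch below assumes the traces are kept in a map from file name to printed stacktrace, and it uses "exception class plus top stack frame" as the signature; that grouping rule and the names `StackTraceStats`, `signature`, and `topSignatures` are illustrative assumptions, not anything Tika defines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ranks exception "signatures" for one document type.
// A signature is the exception class plus its top stack frame, so traces that
// differ only in message text or deeper frames are counted together.
// "Caused by" sections are ignored for simplicity.
class StackTraceStats {

    // Builds a signature from a printed Java stacktrace, e.g.
    // "java.lang.NullPointerException: msg\n\tat a.B.c(B.java:10)\n..."
    static String signature(String stackTrace) {
        String[] lines = stackTrace.split("\\R");
        String first = lines[0];
        int colon = first.indexOf(':');
        String exceptionClass = colon < 0 ? first.trim() : first.substring(0, colon).trim();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.startsWith("at ")) {
                return exceptionClass + " @ " + line.substring(3);
            }
        }
        return exceptionClass;
    }

    // Signatures for files with the given extension, most frequent first
    // (ties broken alphabetically). Keys are file names, values are traces.
    static List<Map.Entry<String, Integer>> topSignatures(Map<String, String> tracesByFile, String ext) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, String> e : tracesByFile.entrySet()) {
            if (e.getKey().toLowerCase().endsWith("." + ext)) {
                counts.merge(signature(e.getValue()), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }
}
```

Because the input map keeps the file name, any signature in the ranking can still be traced back to the documents that produced it.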
If I run Tika directly via Java code (what I've been doing), I can easily catch
the exceptions and log the information on a per-file basis. So, my preference
(plan A) would be to have tika-server return the stacktrace as the body content
for exceptions. We can parameterize this functionality on the command line, of
course. The other option (plan B) would be to pass the file name to
tika-server and have tika-server log the file name in conjunction with the
stacktrace, but that is not as appealing to me. The third option, of course,
is to set up a different service for evaluation, but I'd much prefer to use our
base code as much as possible.
So, is plan A reasonable?
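On the client side, plan A might look roughly like the sketch below, assuming Java 17 and tika-server's `PUT /tika` endpoint (e.g. on the default port 9998). The key assumption is the proposal itself: that on a 422 or 500 the response body carries the stacktrace. That is not established tika-server behavior, and `PlanAClient` and `Outcome` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Sketch of a plan A client. ASSUMPTION: on failure (422 or 500) the server
// returns the stacktrace as the response body, as proposed above.
class PlanAClient {

    // Per-file outcome: HTTP status plus the stacktrace (null on success).
    record Outcome(String fileName, int status, String stackTrace) {
        boolean failed() { return stackTrace != null; }
    }

    private final HttpClient http = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();
    private final URI endpoint; // e.g. http://localhost:9998/tika

    PlanAClient(URI endpoint) { this.endpoint = endpoint; }

    // Sends one file to the server and keeps the error body, if any,
    // together with the file name so it can be aggregated later.
    Outcome parse(Path file) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        int status = response.statusCode();
        String trace = (status == 422 || status == 500) ? response.body() : null;
        return new Outcome(file.getFileName().toString(), status, trace);
    }
}
```

Because each trace stays attached to its file name on the client, no correlation with server-side logs (plan B) is needed.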
> Improve logging in JAX-RS server
> --------------------------------
>
> Key: TIKA-1323
> URL: https://issues.apache.org/jira/browse/TIKA-1323
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
>
> I'd like to use tika-server for TIKA-1302. As part of that, I'd like to
> record exception stacktraces per document. I see two options: transmit the
> info back to the client (assuming a doc didn't bring the server down :) )
> along with the current error code or log the document id and stacktrace via
> the server. Given my current design thoughts, I'd prefer the first option.
> Any objections or recommendations?
--
This message was sent by Atlassian JIRA
(v6.2#6252)