[ https://issues.apache.org/jira/browse/TIKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334819#comment-17334819 ]
Tim Allison commented on TIKA-3370:
-----------------------------------
I pushed a bare-minimum implementation for this issue, largely copied and
pasted from the ForkParser. There is a lot more to do. It forks the parsing of
each FetchEmitTuple into a separate process, but brings all of the emit data
back into the primary process so that it can be batched for emitting. Holding
that data in the primary process is a memory risk and needs to be fixed
somehow...
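
To make the memory concern concrete, here is a minimal sketch of the pattern
described above, not the code that was pushed. Every name in it
(EmitBatchSketch, EmitData, Batcher, parseInChild) is hypothetical and is not a
Tika class, and parseInChild only stands in for the forked process; it does not
actually fork. The point is that everything returned by the children sits in
the primary process's heap until the batch is flushed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Tika classes.
class EmitBatchSketch {

    /** Stand-in for the data a forked parse sends back to the primary process. */
    static final class EmitData {
        final String emitKey;
        final byte[] payload;

        EmitData(String emitKey, byte[] payload) {
            this.emitKey = emitKey;
            this.payload = payload;
        }
    }

    /** Collects emit data in the primary process and flushes it in batches. */
    static final class Batcher {
        private final int maxBatchSize;
        private final List<EmitData> pending = new ArrayList<>();
        // Stand-in for a real emitter: just records each flushed batch.
        private final List<List<EmitData>> flushed = new ArrayList<>();
        private long pendingBytes = 0;

        Batcher(int maxBatchSize) {
            this.maxBatchSize = maxBatchSize;
        }

        void add(EmitData data) {
            pending.add(data);
            // Everything added here stays on the primary heap until flush().
            pendingBytes += data.payload.length;
            if (pending.size() >= maxBatchSize) {
                flush();
            }
        }

        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            flushed.add(new ArrayList<>(pending));
            pending.clear();
            pendingBytes = 0;
        }

        long pendingBytes() {
            return pendingBytes;
        }

        int flushedBatches() {
            return flushed.size();
        }
    }

    /** Stand-in for parsing one tuple in a forked process (no real fork here). */
    static EmitData parseInChild(String fetchKey) {
        byte[] payload = ("extracted text for " + fetchKey)
                .getBytes(StandardCharsets.UTF_8);
        return new EmitData(fetchKey, payload);
    }

    public static void main(String[] args) {
        Batcher batcher = new Batcher(2);
        for (String key : new String[] {"a.pdf", "b.docx", "c.html"}) {
            batcher.add(parseInChild(key));
            System.out.println(key + ": pending bytes = " + batcher.pendingBytes());
        }
        batcher.flush(); // emit whatever is left over
        System.out.println("batches emitted: " + batcher.flushedBatches());
    }
}
```

In this sketch the batch is bounded only by item count, so a few very large
documents can still fill the primary heap before a flush happens.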
> Refactor the AsyncProcessor in 2.x
> ----------------------------------
>
> Key: TIKA-3370
> URL: https://issues.apache.org/jira/browse/TIKA-3370
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
>
> Yesterday, I finally got back to trying to wire the AsyncProcessor in
> tika-pipes into the AsyncHandler in tika-server. I've now convinced myself
> that the notorious antipattern of using a db as a queue is in fact a really,
> really bad idea -- there's every chance that I wasn't doing it right or that
> H2 isn't a great choice...my $ is on the former.
> Nevertheless, I think removing H2 from that process and going with a
> modification of our ForkParser or a lightweight purpose-built knock-off to
> handle fetchers and emitters will be as robust, a bunch cleaner, have fewer
> dependencies and hopefully be more performant than what I had in the
> AsyncProcessor.
> Immediate term, I'd like to get this running and wired into tika-server.
> Longer term, we can use this instead of tika-batch in tika-app...more use,
> fewer bugs.
> This is the last item I'd like to finish before 2.0.0-BETA.