chibenwa opened a new pull request #902:
URL: https://github.com/apache/james-project/pull/902


   Tika was called from reactive code and made blocking HTTP calls from within
   the MIME parsing code.
   
   This led to:
    - Unneeded thread consumption, as some threads sat idle waiting for the
      Tika response.
    - Potentially dangerous blocking calls: for instance, the InVM event bus
      was making such calls on the parallel thread pool, where it is critical
      NOT to block.
    - HTTP connections being opened on a per-call basis and never reused.
   
    We introduce the following changes:
     - Reactification of the TextExtractor API.
     - We re-implement the HTTP calls done by TikaTextExtractor with
       reactor-netty, which allows us to pool HTTP connections and to make
       these calls in a non-blocking, reactive fashion.
     - We provide a reactive cache using the Caffeine caching library; Guava
       caches are blocking and thus not an option.
     - We decouple text extraction from the MIME parsing phase by introducing
       an intermediate POJO. Doing so requires a post-parsing copy of the
       content. We only make that copy when necessary: we do not want to copy
       large attachments from which no text is going to be extracted.
     - Finally, we reactify index content generation in the ElasticSearch code.
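A minimal sketch of a non-blocking Tika call over a pooled connection. To stay dependency-free it swaps reactor-netty for the JDK's `java.net.http.HttpClient` and returns a `CompletableFuture` where the PR's API returns a Reactor `Mono`. The class name `TikaHttpExtractor`, the timeouts, and the `PUT /tika` endpoint with `Accept: text/plain` are assumptions for illustration, not taken from the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

// Hypothetical extractor: the PR uses reactor-netty and Mono; this sketch uses the JDK
// HttpClient and CompletableFuture to illustrate the same non-blocking, pooled approach.
final class TikaHttpExtractor {
    // One shared client per extractor: HttpClient keeps idle HTTP/1.1 connections
    // alive and reuses them for later requests to the same host.
    private final HttpClient client = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_1_1)
        .connectTimeout(Duration.ofSeconds(5))
        .build();
    private final URI tikaEndpoint;

    TikaHttpExtractor(URI tikaServerBase) {
        // Assumption: text extraction is exposed as PUT /tika on the Tika server.
        this.tikaEndpoint = tikaServerBase.resolve("/tika");
    }

    CompletableFuture<String> extract(byte[] content, String contentType) {
        HttpRequest request = HttpRequest.newBuilder(tikaEndpoint)
            .timeout(Duration.ofSeconds(30))
            .header("Content-Type", contentType)
            .header("Accept", "text/plain")
            .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
            .build();
        // sendAsync returns immediately; no caller thread waits for Tika's response.
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
            .thenApply(response -> {
                if (response.statusCode() != 200) {
                    throw new IllegalStateException("Tika returned HTTP " + response.statusCode());
                }
                return response.body();
            });
    }
}
```

Reusing the single `HttpClient` instance is what makes connection reuse possible; building a new client per call would bring back the per-call connections described above.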
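The reactive cache can be approximated without Caffeine as a map from key to in-flight future, which is roughly the model behind Caffeine's `AsyncCache`, whose values are `CompletableFuture`s. The class name, the SHA-256 content key, and the retry-on-failure policy below are assumptions for illustration; unlike Caffeine, this sketch has no size bound or expiry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a Caffeine AsyncCache: values are futures, so a cache hit
// or a concurrent miss never blocks the calling thread.
final class AsyncExtractionCache {
    private final ConcurrentHashMap<String, CompletableFuture<String>> entries = new ConcurrentHashMap<>();
    private final Function<byte[], CompletableFuture<String>> extractor;

    AsyncExtractionCache(Function<byte[], CompletableFuture<String>> extractor) {
        this.extractor = extractor;
    }

    CompletableFuture<String> get(byte[] content) {
        String key = sha256(content); // assumption: entries are keyed by a content hash
        // The loader only starts the extraction and returns a future, so the map's
        // per-key lock is held briefly; concurrent callers share the same in-flight future.
        CompletableFuture<String> future = entries.computeIfAbsent(key, k -> extractor.apply(content));
        // Evict failures so a transient Tika error is retried on the next lookup.
        future.whenComplete((text, error) -> {
            if (error != null) {
                entries.remove(key, future);
            }
        });
        return future;
    }

    private static String sha256(byte[] content) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is mandatory on every Java platform", e);
        }
    }
}
```

Caching the future rather than the extracted string is the key design choice: two lookups for the same attachment arriving while Tika is still working share one HTTP call instead of issuing two.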
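The intermediate POJO and the copy-only-when-necessary rule could look like the following sketch. `ParsedAttachment` and the `extractableTypes` set are hypothetical; the sketch assumes the parser hands over a buffer it may reuse once parsing moves on, which is why extractable content is copied out while other parts keep only their metadata.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;

// Hypothetical POJO produced by the MIME parsing phase and consumed later by text extraction.
final class ParsedAttachment {
    private final String contentType;
    private final long size;
    private final byte[] contentCopy; // null when no text will be extracted from this part

    private ParsedAttachment(String contentType, long size, byte[] contentCopy) {
        this.contentType = contentType;
        this.size = size;
        this.contentCopy = contentCopy;
    }

    // Assumption: parserBuffer belongs to the parser and may be reused after this call,
    // so extractable content is copied out; other parts only keep their metadata.
    static ParsedAttachment from(String contentType, byte[] parserBuffer, int length,
                                 Set<String> extractableTypes) {
        byte[] copy = extractableTypes.contains(contentType)
            ? Arrays.copyOf(parserBuffer, length)
            : null;
        return new ParsedAttachment(contentType, length, copy);
    }

    String contentType() {
        return contentType;
    }

    long size() {
        return size;
    }

    // Present only for parts that go through text extraction.
    Optional<byte[]> contentForExtraction() {
        return Optional.ofNullable(contentCopy);
    }
}
```

Deciding extractability by content type is itself an assumption here; whatever the real criterion is, the point is that a large video attachment never gets duplicated in memory.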
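Reactified index content generation boils down to starting every per-part extraction and combining the results once they all complete, instead of extracting part by part on a blocked thread. The sketch below uses `CompletableFuture.allOf` where the PR relies on Reactor operators; `AsyncTextExtractor`, `Part`, and `IndexContentBuilder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical message part handed to the indexer.
record Part(byte[] content, String contentType) {}

// Hypothetical non-blocking extractor contract (the PR's API returns a Reactor Mono instead).
interface AsyncTextExtractor {
    CompletableFuture<String> extractContent(byte[] content, String contentType);
}

final class IndexContentBuilder {
    private final AsyncTextExtractor extractor;

    IndexContentBuilder(AsyncTextExtractor extractor) {
        this.extractor = extractor;
    }

    // Starts all extractions up front, then concatenates their text in part order.
    CompletableFuture<String> buildIndexContent(List<Part> parts) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (Part part : parts) {
            pending.add(extractor.extractContent(part.content(), part.contentType()));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> pending.stream()
                // Every future has completed at this point, so join() does not block.
                .map(CompletableFuture::join)
                .collect(Collectors.joining("\n")));
    }
}
```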

