Hi Tim,

Could you clarify the pros and cons of the ForkParser (after your refactoring) versus TikaServer? Maybe we should send them to the users list and the wiki...
Thanks

2018-05-29 16:27 GMT-03:00 Tim Allison <talli...@apache.org>:
> Ken,
>
> Once TIKA-2653 is done and 1.19(?) is released, I'll propose switching
> ERH to the ForkParser. There's also an open ticket for using tika-server.
> I think users should have both options.
>
> On Tue, May 29, 2018 at 3:25 PM, Tim Allison <talli...@apache.org> wrote:
>
>> 1: CORRECTION: the ForkParser by itself (without my mods) will protect
>> against OOMs, permanent hangs, and native lib crashes. My proposed mods
>> (on TIKA-2653) only move the parser dependencies out of Solr's
>> dependencies.
>>
>> 2: Also, note the discussion on where to place this information.
>> Cassandra Targett advocates putting this guidance in the main users' guide.
>>
>> On Tue, May 29, 2018 at 3:22 PM, Tim Allison <talli...@apache.org> wrote:
>>
>>> Yes, my mods to the ForkParser should make it more robust, and will help
>>> with OOMs, permanent hangs and native lib crashes. But those changes are
>>> still in the works...
>>>
>>> On Tue, May 29, 2018 at 3:18 PM, Luís Filipe Nassif <lfcnas...@gmail.com> wrote:
>>>
>>>> Hi Ken,
>>>>
>>>> Threads will not help with OutOfMemoryErrors or crashes caused by native
>>>> libs. The ForkParser can help, after the refactoring Tim started to
>>>> handle some of its limitations. See TIKA-2653.
>>>>
>>>> 2018-05-29 16:11 GMT-03:00 Ken Krugler <kkrugler_li...@transpac.com>:
>>>>
>>>> > Thanks for the ref, Tim.
>>>> >
>>>> > I’m curious why SolrCell doesn’t fire up threads when parsing docs with
>>>> > Tika (or use the fork parser), to mitigate issues with hangs & crashes?
>>>> >
>>>> > — Ken
>>>> >
>>>> > > On May 29, 2018, at 11:54 AM, Tim Allison <talli...@apache.org> wrote:
>>>> > >
>>>> > > All,
>>>> > >
>>>> > > Over the weekend, Shawn Heisey very kindly drafted a wiki page about
>>>> > > the challenges of using Solr's ExtractingRequestHandler and the
>>>> > > guidance to avoid it in production.
>>>> > >
>>>> > > I completely agree with this point, and I think that Shawn did a very
>>>> > > nice job of capturing some of the challenges. If you have any
>>>> > > feedback or would like to make edits, see:
>>>> > >
>>>> > > https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>>>> > >
>>>> > > Cheers,
>>>> > >
>>>> > > Tim
>>>> >
>>>> > --------------------------------------------
>>>> > http://about.me/kkrugler
>>>> > +1 530-210-6378