Ken,

Once TIKA-2653 is done and 1.19(?) is released, I'll propose switching ERH to the ForkParser. There's also an open ticket for using tika-server. I think users should have both options.

On Tue, May 29, 2018 at 3:25 PM, Tim Allison <talli...@apache.org> wrote:

> 1: CORRECTION: the ForkParser by itself (without my mods) will protect
> against ooms, permanent hangs, and native lib crashing. My proposed mods (on
> TIKA-2653) only move the parser dependencies out of Solr's dependencies.
>
> 2: note: Also, note the discussion on where to place this information.
> Cassandra Targett advocates putting this guidance in the main users' guide.
>
> On Tue, May 29, 2018 at 3:22 PM, Tim Allison <talli...@apache.org> wrote:
>
>> Y, my mods to the ForkParser should make it more robust, and will help
>> with OOMs, permanent hangs and native lib crashing. But those changes are
>> still in the works...
>>
>> On Tue, May 29, 2018 at 3:18 PM, Luís Filipe Nassif <lfcnas...@gmail.com>
>> wrote:
>>
>>> Hi Ken,
>>>
>>> Threads will not help with OutOfMemoryErrors or crashes caused by native
>>> libs. ForkParser can help, after the refactoring started by Tim to handle
>>> some of its limitations. See TIKA-2653
>>>
>>> 2018-05-29 16:11 GMT-03:00 Ken Krugler <kkrugler_li...@transpac.com>:
>>>
>>> > Thanks for the ref, Tim.
>>> >
>>> > I’m curious why SolrCell doesn’t fire up threads when parsing docs with
>>> > Tika (or use the fork parser), to mitigate issues with hangs & crashes?
>>> >
>>> > -- Ken
>>> >
>>> > > On May 29, 2018, at 11:54 AM, Tim Allison <talli...@apache.org> wrote:
>>> > >
>>> > > All,
>>> > >
>>> > > Over the weekend, Shawn Heisey very kindly drafted a wikipage about the
>>> > > challenges of using Solr's ExtractingRequestHandler and the guidance to
>>> > > avoid it in production.
>>> > >
>>> > > I completely agree with this point, and I think that Shawn did a very
>>> > > nice job of capturing some of the challenges. If you have any feedback or
>>> > > would like to make edits, see:
>>> > >
>>> > > https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>>> > >
>>> > > Cheers,
>>> > >
>>> > > Tim
>>> >
>>> > --------------------------------------------
>>> > http://about.me/kkrugler
>>> > +1 530-210-6378