Hi Karl,

Many thanks.

I found the configuration to use here:
http://www.francelabs.com/blog/tutorial-for-combining-manifoldcf-and-solr-for-files-search/

Search for "ignoreTikaException".

I'll test it and see if it fixes my issue.

Fred

-----Original Message-----
From: Karl Wright [mailto:[email protected]]
Sent: Wednesday, October 21, 2015 17:23
To: dev
Subject: Re: [Solr] Error on documents makes ManifoldCF

Standard Google searching finds it. See:

http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201503.mbox/%[email protected]%3E

Karl

On Wed, Oct 21, 2015 at 11:14 AM, Frédéric Olier <[email protected]> wrote:

> Hi,
>
> Thanks for your reply.
>
> I looked here:
> http://mail-archives.apache.org/mod_mbox/manifoldcf-dev/
>
> But there is no 'search' option...
>
> Any idea where I can search for what I'm looking for more efficiently?
>
> Thanks
>
>
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: Wednesday, October 21, 2015 16:47
> To: dev
> Subject: Re: [Solr] Error on documents makes ManifoldCF
>
> Hi Frédéric,
>
> There's a flag in the Solr configuration you can set that will cause
> exceptions from Solr Cell (Tika) to cause the document to be skipped
> rather than causing ManifoldCF to retry the document. I don't
> remember what it is but others have noted it and you can search the mail
> archive to find it.
>
> Thanks,
> Karl
>
>
> On Wed, Oct 21, 2015 at 10:29 AM, Frédéric Olier <[email protected]> wrote:
>
> > Hi,
> >
> > We integrated Solr with ManifoldCF.
> >
> > We configured Solr to use the OCR engine.
> >
> > When we crawl documents, MCF reads the docs fine and submits them to Solr.
> >
> > On large files (PDF, images), the OCR sometimes takes too long,
> > which causes the MCF request to fail.
> >
> > The annoying thing is that MCF does not ignore the file.
> >
> > On the next crawl, the file keeps failing.
> >
> > How could I tell ManifoldCF to skip the file that fails?
> >
> > Thanks for your reply.
> >
> > Frédéric OLIER | Strategic Planning Manager
> > WOOXO
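
For reference, the "ignoreTikaException" setting named at the top of this thread is a parameter of Solr's extracting request handler (Solr Cell). Below is a minimal sketch of how it might be set in solrconfig.xml. The "/update/extract" handler path is an assumption about a typical setup and is not taken from this thread, so adjust it to match your own configuration. The thread does not establish whether this setting helps with OCR timeouts specifically, since a request that times out may not surface as a Tika exception.

```xml
<!-- Sketch only: the handler path is assumed, not confirmed by this thread. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Skip documents that raise a Tika parse exception
         instead of failing the whole update request -->
    <bool name="ignoreTikaException">true</bool>
  </lst>
</requestHandler>
```

After changing solrconfig.xml, the core usually needs to be reloaded (or Solr restarted) before the new default takes effect.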
