Excellent!! I'll create a new RC.

Thanks again,
Karl

On Tue, Sep 25, 2018 at 8:13 AM Julien Massiera wrote:

> This new fix seems to work. Ingestions and deletions are working and the
> image file with huge metadata is indexed!
>
> Julien
>
>
> On 25/09/2018 13:59, Karl Wright wrote:
> > I've committed a hack to trunk. It has been tested for Solr Cell
> > documents, deletions, and for tika-connector-extracted documents that
> > don't have a lot of metadata. I'm asking Julien to test it with his
> > specific image that has lots of metadata to see if the pathway for that
> > case works properly. If it does, I'll spin another RC.
> >
> > Long term, since I'm a Lucene/Solr committer, I think I'm going to have
> > to take SolrJ under my wing if we expect it to work for ManifoldCF. I
> > don't have a lot of time to do stuff like this anymore but clearly
> > neither does the Solr team.
> >
> > Karl
> >
> >
> > On Tue, Sep 25, 2018 at 6:14 AM Karl Wright wrote:
> >
> >> The back-and-forth is not going well. Mr. Noble needs to be convinced
> >> that it is a valid use case for Solr to have metadata longer than
> >> 4096 characters. In fact it seems like the Solr folks have
> >> deliberately been trying to get rid of support for multipart posts
> >> for a while, because they don't see the need for them. I'm still
> >> hoping to convince them otherwise but I'm not getting a positive feel.
> >>
> >> I'm still trying to figure out if multipart posts have any fundamental
> >> conflict with their RequestWriter architecture. If not, I can perhaps
> >> override the RequestWriter implementation and add multipart support
> >> that way. But it's not going to be a quick process by any means.
> >>
> >>
> >> On Mon, Sep 24, 2018 at 12:13 PM Karl Wright wrote:
> >>
> >>> Hi Julien,
> >>>
> >>> This has nothing to do with the new Tika.
> >>>
> >>> It is not normal; it means that UpdateRequests are not being sent as
> >>> multipart form posts. It's going to require work from the Solr team
> >>> to fix this problem, however, because everything I do to work around
> >>> the issue nonetheless seems to fail. :-(
> >>>
> >>> I'm having a back-and-forth with Paul Noble right now. I'll update
> >>> accordingly when I know more.
> >>>
> >>> Karl
> >>>
> >>>
> >>> On Mon, Sep 24, 2018 at 11:33 AM Julien Massiera wrote:
> >>>
> >>>> After testing it, it is a +1 for me.
> >>>>
> >>>> However, I found a new interesting issue coming with the new Tika
> >>>> version. I had a jpg file for which some metadata were not extracted
> >>>> before, like the RedTRC, BlueTRC and GreenTRC, which contain
> >>>> approximately 2048 bytes of data each. As the metadata are passed to
> >>>> Solr through the URI, I get the following error: URI is too large
> >>>> >8192
> >>>>
> >>>> Do we consider it a "normal issue" or is it worth checking the
> >>>> metadata length before sending the ingest request?
> >>>>
> >>>>
> >>>> On 24/09/2018 16:43, Karl Wright wrote:
> >>>>> Please vote on whether to release ManifoldCF 2.11, RC3. This
> >>>>> release contains a number of fixes/improvements/additions,
> >>>>> described in the CHANGES.txt file. In addition, it includes Tika
> >>>>> 1.19, which has a number of fixes for classpath issues
> >>>>> specifically requested by ManifoldCF.
> >>>>>
> >>>>> This completely fixes a SolrJ-related problem with the Solr
> >>>>> Connector found in RC3. All tests pass.
> >>>>>
> >>>>> The release artifact can be found at:
> >>>>>
> >>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
> >>>>>
> >>>>> There is also a tag at:
> >>>>>
> >>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC3
> >>>>>
> >>>>> Thanks again,
> >>>>> Karl Wright
> >>>>>
> >>>> --
> >>>> Julien MASSIERA
> >>>> Director of Product Development
> >>>> France Labs – The Search Experts
> >>>> Meet us at the Enterprise Search & Discovery Summit in Washington DC
> >>>> www.francelabs.com
> >>>>
> >>>>
>
> --
> Julien MASSIERA
> Director of Product Development
> France Labs – The Search Experts
> Meet us at the Enterprise Search & Discovery Summit in Washington DC
> www.francelabs.com
