[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

Karl Wright (JIRA) Tue, 25 Sep 2018 08:47:08 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627547#comment-16627547
 ]


Karl Wright commented on SOLR-12798:
------------------------------------

[~noble.paul] 'We are assuming your usecase can only be implemented using a 
multipart request. Can we see what do you send in the request parameters?'

That's kind of a silly question, if you don't mind my saying so.  ManifoldCF 
(MCF) is a framework with dozens of connectors for accessing different kinds 
of document repositories.  A "document" in ManifoldCF consists of:

- A content stream of unbounded length
- Unlimited metadata, in the form of name/value-list pairs

Documents that have large amounts of metadata are common.  The details vary 
considerably by source repository.  As just one example, we have a client 
who seemingly specializes in indexing image content.  The images are run 
through Tika, which produces a zero-length text file and sometimes 100K bytes 
of metadata text, spread across multiple metadata fields.

I hope that's enough to demonstrate why we cannot expect all the metadata for 
a document to fit in the URL.
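
To put a rough number on that, here is a minimal sketch (not ManifoldCF or 
SolrJ code) that measures how long a document's metadata becomes once it is 
serialized as URL query parameters.  The class name, the field names, and the 
4096-byte budget are all assumptions for illustration; the budget is borrowed 
from the "4K" figure in the issue description, and real limits depend on the 
servlet container and any proxies in between.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class MetadataUrlSize {
    // Assumed budget, taken from the "4K" figure in the issue description.
    static final int ASSUMED_URL_LIMIT_BYTES = 4096;

    /** Length of the metadata when written as name=value&name=value... */
    static int encodedQueryLength(Map<String, List<String>> metadata) {
        int length = 0;
        for (Map.Entry<String, List<String>> field : metadata.entrySet()) {
            String name = URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8);
            for (String value : field.getValue()) {
                if (length > 0) {
                    length++; // '&' separator between pairs
                }
                // name, '=', then the encoded value (output is pure ASCII)
                length += name.length() + 1
                        + URLEncoder.encode(value, StandardCharsets.UTF_8).length();
            }
        }
        return length;
    }

    static boolean fitsInUrl(Map<String, List<String>> metadata) {
        return encodedQueryLength(metadata) <= ASSUMED_URL_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        // Made-up metadata resembling the image case: tiny content, big metadata.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("scan 0001"));
        metadata.put("exif_dump", List.of("x".repeat(100_000)));
        System.out.println("encoded length: " + encodedQueryLength(metadata));
        System.out.println("fits in URL: " + fitsInUrl(metadata));
    }
}
```

Even before percent-encoding inflates anything, a single 100K metadata field 
is far beyond that assumed budget, which is why the metadata has to travel in 
the request body.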


> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-12798
>                 URL: https://issues.apache.org/jira/browse/SOLR-12798
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.4
>            Reporter: Karl Wright
>            Priority: Major
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K, which therefore cannot be stuffed into a URL.
> The changes in question seem to have been made by Noble Paul on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because, when we do, we get 
> "pfountz Should not get here!" errors on the Solr side, which generate 
> HTTP error code 500 responses.  That should not happen either, in my 
> opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
