Hi Raymond,

ManifoldCF specifies multipart post when using HttpSolrServer directly. It cannot specify multipart post, however, for CloudSolrServer, which is part of the problem. See my earlier post.
Karl

On Mon, Dec 16, 2013 at 7:30 AM, Raymond Wiker <[email protected]> wrote:

> BTW: CONNECTORS-609 may have the same underlying cause (the stack trace in
> the email archive shows the error message as "FULL HEAD").
>
> On Mon, Dec 16, 2013 at 1:26 PM, Raymond Wiker <[email protected]> wrote:
>
> > Is the useMultiPartPost=false in ManifoldCF, or in SolrJ?
> >
> > On Mon, Dec 16, 2013 at 1:18 PM, Alessandro Benedetti <
> > [email protected]> wrote:
> >
> >> I have more details now, after a deep debugging :
> >>
> >> The CloudSolrServer triggers the LBHttpSolrServer
> >> lbServer.request(lbRequest).getResponse().
> >>
> >> The LBHttpSolrServer triggers the HttpSolrServer request(request).
> >>
> >> It's here that we build the httpPOST in this way :
> >>
> >> boolean isMultipart = (this.useMultiPartPost || ( streams != null &&
> >>     streams.size() > 1 )) && !hasNullStreamName;
> >>
> >> LinkedList<NameValuePair> postParams = new LinkedList<NameValuePair>();
> >> ...
> >> List<FormBodyPart> parts = new LinkedList<FormBodyPart>();
> >> Iterator<String> iter = params.getParameterNamesIterator();
> >> while (iter.hasNext()) {
> >>   String p = iter.next();
> >>   String[] vals = params.getParams(p);
> >>   if (vals != null) {
> >>     for (String v : vals) {
> >>       if (isMultipart) { *// IMPORTANT*
> >>         parts.add(new FormBodyPart(p, new StringBody(v,
> >>             Charset.forName("UTF-8"))));
> >>       } else {
> >>         postParams.add(new BasicNameValuePair(p, v));
> >>       }
> >>     }
> >>   }
> >> }
> >> ...
> >> }
> >> * // It is has one stream, it is the post body, put the params in the URL*
> >> * else { // we finish in this case*
> >>   String pstr = ClientUtils.toQueryString(params, false);
> >>   HttpPost post = new HttpPost(url + pstr);
> >>
> >> I checked that debugging Manifold the CloudSolrServer calls a
> >> LBHttpSolrServer that calls a HttpSolrServer with useMultiPartPost=false .
> >> Here we are with the problem.
> >> So at the moment we have evidence that the metadata field values are
> >> placed in the http header.
> >>
> >> Now, what's behind that ? A bug ? A decision to not use multiPartPost ?
> >> Any advice ?
> >>
> >> 2013/12/16 Raymond Wiker <[email protected]>
> >>
> >> > That looks distinctly odd: you have an HTTP POST request, but the
> >> > parameters are attached to the url, GET-style. It really makes no sense
> >> > to add parameters to the url when you have to use POST to carry the file
> >> > content --- but in the "simple post tool", that is exactly what they do.
> >> > My best guess is that they do it this way to avoid having to deal with
> >> > the complexities of multipart/form-data, and this might be acceptable in
> >> > a scenario where the number of parameters is so small that you run no
> >> > risk of overrunning the header size limit.
> >> >
> >> > It's possible that the SolrJ developers make the assumption that this is
> >> > safe; alternatively (and hopefully) there is a way of instructing SolrJ
> >> > to place all the parameters in the request body. If the first is the
> >> > case, you'll have to find a workaround (for example, increasing the
> >> > maximum header size in Jetty); In the second case, I guess that
> >> > ManifoldCF needs to setup SolrJ appropriately.
> >> >
> >> > On Mon, Dec 16, 2013 at 11:53 AM, Alessandro Benedetti <
> >> > [email protected]> wrote:
> >> >
> >> > > There was an error in the previous mail, and some of the content is
> >> > > quoted and maybe not clear at a first glance, I report the most
> >> > > important part of the mail here :
> >> > >
> >> > > You can see that all the params are appended to the URL,so they will
> >> > > go in the Headers of the Http POST request, here you are :
> >> > >
> >> > > POST /solr/collection1/update/extract?literal.id=C+Movies%3A1025&literal.field2=value2&....&literal.fieldN=valueN&resource.name=Tom+Cruise&wt=javabin&version=2
> >> > >
> >> > > User-Agent Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
> >> > > Transfer-Encoding chunked
> >> > > Content-Type text/plain
> >> > > Host 10.0.1.16:8983
> >> > > Request Header Size : 5.99 KB (6133 bytes)
> >> > >
> >> > > Remember that is not my code, but Manifold 1.4.1 out of the box :
> >> > >
> >> > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> >> > >
> >> > > writeField(out,LITERAL+newFieldName,values);
> >> > > // Write the commitWithin parameter
> >> > > if (commitWithin != null)
> >> > >   writeField(out,COMMITWITHIN_METADATA,commitWithin);
> >> > > contentStreamUpdateRequest.setParams(out);
> >> > > contentStreamUpdateRequest.addContentStream(new
> >> > >     RepositoryDocumentStream(is,length,contentType,contentName));
> >> > > contentStreamUpdateRequest.process(solrServer)
> >> > >
> >> > > Cheers
> >> > >
> >> > > 2013/12/16 Alessandro Benedetti <[email protected]>
> >> > >
> >> > > > 2013/12/16 Raymond Wiker <[email protected]>
> >> > > >
> >> > > >> On Mon, Dec 16, 2013 at 9:42 AM, Alessandro Benedetti <
> >> > > >> [email protected]> wrote:
> >> > > >>
> >> > > >> > > Do you have any means of capturing the entire http (POST) request?
> >> > > >> > > It could be that SolrJ is adding things to the header.
> >> > > >> >
> >> > > >> > I used Fiddler and Charles ( 2 softwares for monitoring http
> >> > > >> > requests). All the params added to the ContentStreamUpdateRequest
> >> > > >> > appear to be in the header.
> >> > > >> > Nothing else added by SolrJ.
> >> > > >>
> >> > > >> Ok. Would it be possible for you to generate a set of captures that
> >> > > >> could be shared? I'd be happy to take a look.
> >> > > >
> >> > > > Absolutely yes,you can see that all the params are appended to the
> >> > > > URL,so they will go in the Headers of the Http POST request, here
> >> > > > you are :
> >> > > >
> >> > > > POST /solr/collection1/update/extract?literal.id=C+Movies%3A1025&literal.field2=value2&....&literal.fieldN=valueN&resource.name=Tom+Cruise&wt=javabin&version=2
> >> > > >
> >> > > > User-Agent Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
> >> > > > Transfer-Encoding chunked
> >> > > > Content-Type text/plain
> >> > > > Host 10.0.1.16:8983
> >> > > > Request Header Size : 5.99 KB (6133 bytes)
> >> > > >
> >> > > > Remember that is not my code, but Manifold 1.4.1 out of the box :
> >> > > >
> >> > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> >> > > >
> >> > > > writeField(out,LITERAL+newFieldName,values);
> >> > > > // Write the commitWithin parameter
> >> > > > if (commitWithin != null)
> >> > > >   writeField(out,COMMITWITHIN_METADATA,commitWithin);
> >> > > > contentStreamUpdateRequest.setParams(out);
> >> > > > contentStreamUpdateRequest.addContentStream(new
> >> > > >     RepositoryDocumentStream(is,length,contentType,contentName));
> >> > > > contentStreamUpdateRequest.process(solrServer)
> >> > > >
> >> > > >> > > What container are you running Solr under?
> >> > > >> > > Are you accessing Solr directly, or via a proxy?
> >> > > >> >
> >> > > >> > Direct access through a SolrCloudServer configured on a zookeper
> >> > > >> > ensemble of 3 zk.
> >> > > >> > Solr are running on Jetty.
> >> > > >
> >> > > > --
> >> > > > --------------------------
> >> > > >
> >> > > > Benedetti Alessandro
> >> > > > Visiting card : http://about.me/alessandro_benedetti
> >> > > >
> >> > > > "Tyger, tyger burning bright
> >> > > > In the forests of the night,
> >> > > > What immortal hand or eye
> >> > > > Could frame thy fearful symmetry?"
> >> > > >
> >> > > > William Blake - Songs of Experience -1794 England
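
Note on the thread above: the multipart condition quoted from HttpSolrServer can be looked at in isolation, without SolrJ on the classpath. What follows is only a sketch. The class PostParamPlacement, its helper methods, and the two HYPOTHETICAL_ constants are made up for illustration and are not part of SolrJ or ManifoldCF. The limit value in particular is an assumption, not a statement about any container's default. The sketch shows that with one content stream and useMultiPartPost=false the condition is false, so the parameters end up in the URL, and the request line grows with every literal field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not SolrJ code, just the same boolean shape as the quoted condition.
class PostParamPlacement {

    // Assumed values for the demonstration; check your own container and data.
    static final int HYPOTHETICAL_HEADER_LIMIT = 6144;
    static final int HYPOTHETICAL_FIELD_COUNT = 400;

    // Same structure as the isMultipart expression quoted in the thread,
    // with the stream list reduced to a count.
    static boolean isMultipart(boolean useMultiPartPost, int streamCount, boolean hasNullStreamName) {
        return (useMultiPartPost || streamCount > 1) && !hasNullStreamName;
    }

    // Builds a URL query string from the parameters, form-encoding names and values.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sb.length() == 0 ? '?' : '&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
                sb.append('=');
                sb.append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    // Length of the request line "POST <path><query> HTTP/1.1" (ASCII, so chars == bytes).
    static int requestLineLength(String path, Map<String, String> params) {
        return ("POST " + path + toQueryString(params) + " HTTP/1.1").length();
    }

    public static void main(String[] args) {
        // One content stream and useMultiPartPost=false, as reported for the CloudSolrServer path.
        boolean multipart = isMultipart(false, 1, false);

        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.id", "C Movies:1025");
        for (int i = 2; i <= HYPOTHETICAL_FIELD_COUNT; i++) {
            params.put("literal.field" + i, "value" + i);
        }

        int length = requestLineLength("/solr/collection1/update/extract", params);
        System.out.println("multipart: " + multipart);
        System.out.println("request line length: " + length);
        System.out.println("exceeds assumed limit: " + (length > HYPOTHETICAL_HEADER_LIMIT));
    }
}
```

Passing true for the first argument of isMultipart, which is what Karl describes ManifoldCF doing when it uses HttpSolrServer directly, makes the condition true for a single named stream, and the parameters then travel as body parts instead of in the URL.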
