The trace indicates that the commit operation is failing with a non-XML response, probably an HTTP 500 error with an HTML body. You can see exactly what came back by using the "Simple History" report; it should all be there.
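To illustrate the diagnosis above, a client can check the HTTP status and Content-Type before passing the body to an XML parser. An HTML error page is then reported as an error page, not as an unterminated `<HR>` element. This is a minimal sketch that assumes the status code, Content-Type header, and body have already been read from the connection. `SolrResponseCheck` and `checkXmlResponse` are hypothetical names and are not part of ManifoldCF's `HttpPoster`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical helper: validate a Solr HTTP response before parsing it as XML. */
final class SolrResponseCheck {

  private SolrResponseCheck() {}

  /**
   * Parses the body as XML only if the status is 200 and the body looks like XML;
   * otherwise throws an IOException quoting the status and the start of the body.
   */
  static Document checkXmlResponse(int status, String contentType, String body)
      throws IOException {
    String text = body == null ? "" : body;
    String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT);
    boolean looksLikeXml = type.contains("xml") || text.trim().startsWith("<?xml");
    if (status != 200 || !looksLikeXml) {
      // Report the servlet container's own error page (often HTML with an
      // unclosed <HR>) instead of failing later inside the XML parser.
      String snippet = text.length() > 200 ? text.substring(0, 200) + "..." : text;
      throw new IOException("Solr returned HTTP " + status
          + " (Content-Type: " + contentType + "): " + snippet);
    }
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      return builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    } catch (ParserConfigurationException | SAXException e) {
      throw new IOException("Malformed XML from Solr: " + e.getMessage(), e);
    }
  }
}
```

With this check, a 500 response from the servlet container produces an exception whose message contains the start of the HTML page. That is the same information the "Simple History" report shows.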
Karl

On Mon, Mar 14, 2011 at 10:34 PM, Fuad Efendi <f...@efendi.ca> wrote:
>
> It's not a trunk version; I use (different) trunk versions in a few production
> sites... In SOLR, the path "/update" is defined in solrconfig.xml (and usually
> the user will copy it from the "example" schema and maybe modify it):
>
> <requestHandler name="/update"
>   class="solr.XmlUpdateRequestHandler">
>
> And what does ManifoldCF expect? Which kind of "update" handler?!!
>
> That's why I suggest using the SolrJ API instead... I noticed a lot of
> low-level coding...
>
> What kind of SOLR protocol is expected? It is definitely not a POST of XML
> content:
>
> /** Write a field */
> protected static void writeField(OutputStream out, String fieldName,
>   String fieldValue)
>   throws IOException
> {
>   writePreamble(out);
>   writeBoundary(out,"text/plain; charset=UTF-8",fieldName,null);
>
>   byte[] tmp = fieldValue.getBytes("UTF-8");
>   out.write(tmp, 0, tmp.length);
>   writePostamble(out);
> }
>
> Do you expect a "binary" handler on SOLR?
> <!-- Binary Update Request Handler
>      http://wiki.apache.org/solr/javabin
> -->
> <requestHandler name="/update/javabin"
>   class="solr.BinaryUpdateRequestHandler" />
>
> -----Original Message-----
> From: Karl Wright [mailto:daddy...@gmail.com]
> Sent: March-14-11 7:58 PM
> To: connectors-user@incubator.apache.org
> Subject: Re: SOLR
>
> The trunk version of Solr may have changed how the extracting update
> request handler works. It changes daily, so there is no way I can keep up
> with it. Maybe it would be better to go back and use a known quantity.
>
> Thanks,
> Karl
>
> On Mon, Mar 14, 2011 at 6:24 PM, Fuad Efendi <f...@efendi.ca> wrote:
>>
>> Default settings for ManifoldCF: /update/extract
>> http://localhost:8080/solr/update/extract?commit=true
>>
>> And using a browser, I see SOLR responds with malformed HTML containing
>> a non-closing <HR>...
>>
>> Fix:
>> Update handler: /update
>>
>> -Fuad
>>
>> -----Original Message-----
>> From: Fuad Efendi [mailto:f...@efendi.ca]
>> Sent: March-14-11 6:17 PM
>> To: connectors-user@incubator.apache.org
>> Subject: RE: SOLR
>>
>> Hi Karl,
>>
>> I verified (via browser)
>> http://localhost:8080/solr/update?commit=true
>>
>> And the response from SOLR:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>> <lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int></lst>
>> </response>
>>
>> The root of the problem is
>> org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
>>
>> Everything is fine except I can't understand why we get "HR" from
>> SOLR. Do we have any multithreading issues? I believe I connect to
>> SOLR; port 8080 is configured via the console... maybe somewhere else?
>>
>> I believe the default setting for "Update handler:" on the Connector screen is
>> incorrect; it is /update/extract
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:daddy...@gmail.com]
>> Sent: March-14-11 6:00 PM
>> To: connectors-user@incubator.apache.org
>> Subject: Re: SOLR
>>
>> This is because your Solr setup is incorrect. The post to "solr" is
>> returning HTML, not XML, so you are not actually communicating with
>> Solr at all.
>>
>> In order for the Solr connector to work, you need to have the Solr
>> extracting update request handler present and configured. I am told
>> that the latest release of Solr makes the jar with this code optional;
>> it's a contrib jar that you have to download separately. If you are
>> building Solr off of trunk, then this should not be a problem.
>>
>> Karl
>>
>> On Mon, Mar 14, 2011 at 5:40 PM, Fuad Efendi <f...@efendi.ca> wrote:
>>> This exception: the XML contains encoded HTML, and it doesn't happen with
>>> the standard Java 6 StAX parser:
>>>
>>> [Fatal Error] :124:120: The element type "HR" must be terminated by
>>> the matching end-tag "</HR>".
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>>>     at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>>>     at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:619)
>>>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
>>> Caused by: org.xml.sax.SAXParseException: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
>>>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
>>>     ... 3 more
>>>
>>> -----Original Message-----
>>> From: Fuad Efendi [mailto:f...@efendi.ca]
>>> Sent: March-14-11 5:37 PM
>>> To: connectors-user@incubator.apache.org
>>> Subject: RE: SOLR
>>>
>>> Thank you very much, Karl,
>>>
>>> And I have a first problem:
>>> Starting crawler...
>>> [Fatal Error] :124:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
>>>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>>>     at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>>>
>>> I am using the RSS connector to crawl specific XML (containing
>>> XML-encoded &lt;HR&gt; and other HTML tags). It doesn't happen with
>>> the standard StAX parser (Java 6)...
>>>
>>> Regarding (2), do you mean this interface method?
>>> /** View specification.
>>>  * This method is called in the body section of a job's view page.
>>>  * Its purpose is to present the output specification information to the user.
>>>  * The coder can presume that the HTML that is output from this
>>>  * configuration will be within appropriate <html> and <body> tags.
>>>  *@param out is the output to which any HTML should be sent.
>>>  *@param os is the current output specification for this job.
>>>  */
>>> public void viewSpecification(IHTTPOutput out, OutputSpecification os)
>>>   throws ManifoldCFException, IOException
>>>
>>> Thanks!
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:daddy...@gmail.com]
>>> Sent: March-14-11 5:21 PM
>>> To: connectors-user@incubator.apache.org
>>> Subject: Re: SOLR
>>>
>>> Hi Fuad,
>>>
>>> (1) "Arguments" are indeed optional key/value pairs, which are sent
>>> to Solr as part of the URL.
>>> (2) ManifoldCF presents three kinds of tabs for a job: (a) tabs that
>>> all jobs have; (b) tabs related to the repository connector's
>>> management of the document specification information; and (c) tabs
>>> related to the output connector's output specification information.
>>> The Solr output connector's output specification information includes
>>> the metadata-to-Solr mapping, so those tabs come from the Solr connector.
>>>
>>> Karl
>>>
>>> On Mon, Mar 14, 2011 at 4:51 PM, Fuad Efendi <f...@efendi.ca> wrote:
>>>> Hi, is there any sample of how to use the SOLR connector?
>>>>
>>>> http://incubator.apache.org/connectors/end-user-documentation.html#solroutputconnector
>>>>
>>>> Some questions:
>>>>
>>>> 1. Arguments. Are they optional key=value pairs which can be sent
>>>> to SOLR as part of an HTTP GET/POST request?
>>>>
>>>> 2. I see code for “Connector”, and I see how to configure the SOLR
>>>> Output Connection. But how does the “Job” know about the <metadata> to
>>>> <solr> mapping; is it generic (without dependency on SOLR)?
>>>>
>>>> Thanks,
>>>>
>>>> Fuad
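Karl's answer to question (1) says that "arguments" are key/value pairs appended to the Solr URL. A small sketch can show this: it joins a base URL, an update handler path, and URL-encoded arguments into a URL such as the commit URL from the thread, `http://localhost:8080/solr/update/extract?commit=true`. `SolrUrlBuilder` is a hypothetical helper, not ManifoldCF's code. It assumes arguments are form-encoded as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper: build a Solr request URL from handler path and arguments. */
final class SolrUrlBuilder {

  private SolrUrlBuilder() {}

  /** Joins base URL and handler path, then appends URL-encoded key=value arguments. */
  static String buildUrl(String baseUrl, String handlerPath, Map<String, String> args) {
    StringBuilder sb = new StringBuilder(baseUrl);
    // Avoid a doubled or missing slash between the base URL and the handler path.
    if (sb.length() > 0 && sb.charAt(sb.length() - 1) == '/') {
      sb.setLength(sb.length() - 1);
    }
    if (!handlerPath.startsWith("/")) {
      sb.append('/');
    }
    sb.append(handlerPath);
    // The first argument is introduced by '?', the rest by '&'.
    char separator = '?';
    for (Map.Entry<String, String> e : args.entrySet()) {
      sb.append(separator).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
      separator = '&';
    }
    return sb.toString();
  }

  private static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```

Changing only the handler path argument (`/update` versus `/update/extract`) reproduces the two URLs Fuad tested in the browser. The arguments stay the same in both cases.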