Hi Karl,

I verified (via browser) http://localhost:8080/solr/update?commit=true

And the response from SOLR:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int></lst>
</response>

The root of the problem is
org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)

Everything is fine except that I can't understand why we get "HR" from SOLR. Do we have any multithreading issues? I believe I am connecting to SOLR; port 8080 is configured via the console... maybe it is set somewhere else?

I believe the default setting for "Update handler:" on the Connector screen is incorrect; it is /update/extract

-----Original Message-----
From: Karl Wright [mailto:[email protected]]
Sent: March-14-11 6:00 PM
To: [email protected]
Subject: Re: SOLR

This is because your solr setup is incorrect. The post to "solr" is returning HTML, not XML, so you are not actually communicating with Solr at all.

In order for the Solr connector to work, you need to have the solr extracting update request handler present and configured. I am told that the latest release of Solr makes the jar with this code optional - it's a contrib jar that you have to separately download. If you are building solr off of trunk, then this should not be a problem.

Karl

On Mon, Mar 14, 2011 at 5:40 PM, Fuad Efendi <[email protected]> wrote:
> This exception occurs when the XML contains encoded HTML; it doesn't
> happen with the standard Java 6 StAX parser:
>
> [Fatal Error] :124:120: The element type "HR" must be terminated by
> the matching end-tag "</HR>".
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing
> error: The element type "HR" must be terminated by the matching
> end-tag "</HR>"
> .
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>     at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>     at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:619)
>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
> Caused by: org.xml.sax.SAXParseException: The element type "HR" must
> be terminated by the matching end-tag "</HR>".
>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
>     ... 3 more
>
> -----Original Message-----
> From: Fuad Efendi [mailto:[email protected]]
> Sent: March-14-11 5:37 PM
> To: [email protected]
> Subject: RE: SOLR
>
> Thank you very much Karl,
>
> And I have my first problem:
> Starting crawler...
> [Fatal Error] :124:120: The element type "HR" must be terminated by
> the matching end-tag "</HR>".
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing
> error: The element type "HR" must be terminated by the matching
> end-tag "</HR>"
> .
>     at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>     at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>
> I am using the RSS connector to crawl specific XML (containing XML-encoded
> &lt;HR&gt; and other HTML tags). It doesn't happen with the standard
> StAX parser (Java 6)...
>
> Regarding (2), do you mean this interface method?
>   /** View specification.
>   * This method is called in the body section of a job's view page.
>   * Its purpose is to present the output specification information to the user.
>   * The coder can presume that the HTML that is output from this
>   * configuration will be within appropriate <html> and <body> tags.
>   *@param out is the output to which any HTML should be sent.
>   *@param os is the current output specification for this job.
>   */
>   public void viewSpecification(IHTTPOutput out, OutputSpecification os)
>     throws ManifoldCFException, IOException
>
> Thanks!
>
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: March-14-11 5:21 PM
> To: [email protected]
> Subject: Re: SOLR
>
> Hi Fuad,
>
> (1) "Arguments" are indeed optional key/value pairs, which are sent to
> solr as part of the URL.
> (2) ManifoldCF presents tabs for a job of three kinds: (a) tabs that
> all jobs have; (b) tabs related to the repository connector's
> management of the document specification information; and (c) tabs
> related to the output connector's output specification information.
> The Solr output connector's output specification information includes
> the metadata to solr mapping, so those tabs come from the Solr connector.
>
> Karl
>
> On Mon, Mar 14, 2011 at 4:51 PM, Fuad Efendi <[email protected]> wrote:
>> Hi, any sample of how to use SOLR connector?
>>
>> http://incubator.apache.org/connectors/end-user-documentation.html#solroutputconnector
>>
>> Some questions:
>>
>> 1. Argument. Is it optional key=value pairs which can be sent
>> to SOLR as part of HTTP GET/POST request?
>>
>> 2. I see code for Connector, and I see how to configure SOLR
>> Output Connection. But how Job happens to know about <metadata> to
>> <solr> mapping, is it generic (without dependency on SOLR)?
>>
>> Thanks,
>>
>> Fuad
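
The diagnosis in this thread (the post is returning HTML rather than XML) can be checked outside ManifoldCF. Below is a minimal sketch using only the JDK's JAXP parser. The class name SolrResponseCheck, the method isSolrOk, and the HTML sample string are hypothetical and are not part of ManifoldCF or Solr. The sketch only tests whether a response body parses as XML and carries status 0, like the response pasted at the top of the thread; it does not reproduce what HttpPoster does internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not a ManifoldCF or Solr class.
public class SolrResponseCheck {

    /** Returns true only if body is well-formed XML with a <response> root and status 0. */
    public static boolean isSolrOk(String body) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to the console.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(body)));
            Element root = doc.getDocumentElement();
            if (!"response".equals(root.getTagName())) {
                return false;
            }
            NodeList ints = root.getElementsByTagName("int");
            for (int i = 0; i < ints.getLength(); i++) {
                Element e = (Element) ints.item(i);
                if ("status".equals(e.getAttribute("name"))) {
                    return "0".equals(e.getTextContent().trim());
                }
            }
            return false; // XML, but no status element
        } catch (Exception e) {
            // Not well-formed XML, e.g. an HTML page with an unclosed <HR>.
            return false;
        }
    }

    public static void main(String[] args) {
        // Response body as pasted in the thread.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <response> "
            + "<lst name=\"responseHeader\"><int name=\"status\">0</int>"
            + "<int name=\"QTime\">15</int></lst> </response>";
        // Assumed HTML page; the real page returned to the connector is not shown in the thread.
        String html = "<html><body><h1>Error</h1><HR size=\"1\"><p>Not found</p></body></html>";

        System.out.println("XML response ok:  " + isSolrOk(xml));
        System.out.println("HTML response ok: " + isSolrOk(html));
    }
}
```

Fetching the body from the exact path configured under "Update handler:" (for example /update/extract) and passing it to isSolrOk would show whether that path, rather than /update, is the one returning HTML.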
