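This is because your Solr setup is incorrect.  The POST to your "solr"
URL is returning HTML, not XML, so you are not actually communicating
with Solr at all.  The "HR" parse error comes from ManifoldCF trying to
parse that HTML page as an XML response.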

In order for the Solr connector to work, you need to have Solr's
extracting update request handler present and configured.  I am told
that the latest release of Solr makes the jar with this code optional:
it is a contrib jar that you have to download separately.  If you are
building Solr from trunk, this should not be a problem.
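
If you want to confirm what is coming back, one option is to look at the
Content-Type header and the first few characters of the response from the
URL you configured in the output connection.  Here is a minimal sketch of
that check, under my own assumptions: the class and method names
(SolrResponseCheck, classify) are hypothetical and not part of ManifoldCF
or Solr, and the two sample strings are made-up stand-ins rather than
output from your server.

```java
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF or Solr.
class SolrResponseCheck {
  enum Kind { XML, HTML, UNKNOWN }

  // Guess whether a response is HTML or XML from its Content-Type
  // header and the start of its body.  This is a heuristic only.
  static Kind classify(String contentType, String body) {
    String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT);
    String start = body == null ? "" : body.trim().toLowerCase(Locale.ROOT);

    // HTML is checked first so that an HTML error page is never
    // mistaken for an XML reply.
    if (type.contains("html") || start.startsWith("<html")
        || start.startsWith("<!doctype html")) {
      return Kind.HTML;
    }
    if (type.contains("xml") || start.startsWith("<?xml")
        || start.startsWith("<response")) {
      return Kind.XML;
    }
    return Kind.UNKNOWN;
  }

  public static void main(String[] args) {
    // Made-up sample bodies, for illustration only.
    String errorPage = "<html><body><h2>HTTP ERROR 404</h2><HR></body></html>";
    String xmlReply = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><response></response>";

    System.out.println(classify("text/html", errorPage));
    System.out.println(classify("text/xml", xmlReply));
  }
}
```

If a check like this reports HTML for your update URL, that would suggest
the request is reaching an error page rather than a Solr handler.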

Karl

On Mon, Mar 14, 2011 at 5:40 PM, Fuad Efendi <[email protected]> wrote:
> This is the exception. The XML contains encoded HTML, and this doesn't
> happen with the standard Java 6 StAX parser:
>
> [Fatal Error] :124:120: The element type "HR" must be terminated by the
> matching end-tag "</HR>".
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing
> error: The element type "HR" must be terminated by the matching end-tag
> "</HR>".
>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>        at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:619)
>        at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1658)
> Caused by: org.xml.sax.SAXParseException: The element type "HR" must be
> terminated by the matching end-tag "</HR>".
>        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
>        ... 3 more
>
>
>
>
>
>
> -----Original Message-----
> From: Fuad Efendi [mailto:[email protected]]
> Sent: March-14-11 5:37 PM
> To: [email protected]
> Subject: RE: SOLR
>
> Thank you very much Karl,
>
> And I have a first problem:
> Starting crawler...
> [Fatal Error] :124:120: The element type "HR" must be terminated by the
> matching end-tag "</HR>".
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing
> error: The element type "HR" must be terminated by the matching end-tag
> "</HR>".
>        at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
>        at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
>
> I am using the RSS connector to crawl specific XML (containing XML-encoded
> &lt;HR&gt; and other HTML tags). It doesn't happen with the standard StAX
> parser (Java 6)...
>
>
> Regarding (2), do you mean this interface method?
>  /** View specification.
>  * This method is called in the body section of a job's view page.  Its
>  * purpose is to present the output specification information to the user.
>  * The coder can presume that the HTML that is output from this
>  * configuration will be within appropriate <html> and <body> tags.
>  *@param out is the output to which any HTML should be sent.
>  *@param os is the current output specification for this job.
>  */
>  public void viewSpecification(IHTTPOutput out, OutputSpecification os)
>    throws ManifoldCFException, IOException
>
>
>
> Thanks!
>
>
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: March-14-11 5:21 PM
> To: [email protected]
> Subject: Re: SOLR
>
> Hi Fuad,
>
> (1) "Arguments" are indeed optional key/value pairs, which are sent to Solr
> as part of the URL.
> (2) ManifoldCF presents tabs for a job of three kinds: (a) tabs that all
> jobs have; (b) tabs related to the repository connector's management of the
> document specification information; and (c) tabs related to the output
> connector's output specification information.
> The Solr output connector's output specification information includes the
> metadata-to-Solr mapping, so those tabs come from the Solr connector.
>
> Karl
>
>
> On Mon, Mar 14, 2011 at 4:51 PM, Fuad Efendi <[email protected]> wrote:
>> Hi, any sample of how to use SOLR connector?
>>
>> http://incubator.apache.org/connectors/end-user-documentation.html#solroutputconnector
>>
>>
>>
>> Some questions:
>>
>>
>>
>> 1.       Argument. Is it optional key=value pairs which can be sent to
>> SOLR as part of HTTP GET/POST request?
>>
>> 2.       I see code for “Connector”, and I see how to configure SOLR
>> Output Connection. But how “Job” happens to know about <metadata> to
>> <solr> mapping, is it generic (without dependency on SOLR)?
>>
>>
>>
>> Thanks,
>>
>> Fuad
>
>
