Hello Karl,

Thanks for responding to my last email and clarifying the URLColumn information.

I set up Solr myself with Tomcat, following the instructions on the Solr 
website. I am running the Solr example, and when I open 
http://localhost:8080/solr/admin in my browser, the Solr admin page displays. 
I did make slight modifications to solrconfig.xml to include a data import 
handler, as I wanted to get Solr to Oracle working first, by adding the 
following XML snippets:

        <requestHandler name="/dataimport"
                        class="org.apache.solr.handler.dataimport.DataImportHandler">
        <lst name="defaults">
          <str name="config">data-config.xml</str>
        </lst>
        </requestHandler>

I added the following to the lib directives. These jars are in the lib folder 
of my Solr home directory. ojdbc14.jar is the JDBC driver (note that I also put 
the same JDBC jar in jdbc-drivers, added it to the classpath, and built 
ManifoldCF, as instructed by the MCF website).

  <lib dir="./lib" regex="apache-solr-dataimporthandler-\d.*\.jar" />
  <lib dir="./lib" regex="ojdbc14.jar" />
  <lib dir="./lib" regex="apache-solr-dataimporthandler-extras-3.4.0" />

I also created a data-config.xml:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@<host>:<port>:<service>"
              user="user"
              password="password"/>
  <document>
    <entity name="id"
            query="select AIRCRAFT_ID, AIRCRAFT_MAKE from AIRCRAFT">
       <field column="AIRCRAFT_ID" name="id"/>
       <field column="AIRCRAFT_MAKE" name="name"/>
    </entity>
  </document>
</dataConfig>

As far as modifications go, that's all that was changed. I don't really see it 
affecting MCF, but correct me if I'm wrong.

As for the Solr logs, the log file gets updated every time MCF tries to index 
docs. Here is a short cut and paste. It repeats for the four documents 
AC001, AC002, AC003, and AC004; I'm just showing one of them, as they are quite 
large.

-------------------------------------------------------------------------------------------------------------------------
Nov 22, 2011 4:10:31 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={literal.id=localhost:8080/solr?id%3DAC003} status=400 QTime=0
Nov 22, 2011 4:10:31 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Nov 22, 2011 4:10:31 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Unexpected character 'S' (code 83) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:964)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character 'S' (code 83) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
        at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2047)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:104)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        ... 21 more
-------------------------------------------------------------------------------------------------------------------------

Please note that I have also tried this with Solr running under Jetty only, 
with the Solr output connector on port 8983.
I get something similar in the Simple History as well:

11-22-2011 16:20:33.263 document ingest (Solr) localhost:8080/solr?id=AC004 400 48 10 Unexpected character 'S' (code 83) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
11-22-2011 16:20:33.223 document ingest (Solr) localhost:8080/solr?id=AC003 400 43 10 Unexpected character 'U' (code 85) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
11-22-2011 16:20:33.176 document ingest (Solr) localhost:8080/solr?id=AC002 400 34 17 Unexpected character 'U' (code 85) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
11-22-2011 16:20:32.895 document ingest (Solr) localhost:8080/solr?id=AC001 400 27 140 Unexpected character 'U' (code 85) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
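
A note on reading this error: "Unexpected character ... in prolog; expected '<'" is what an XML parser reports when the request body does not begin with XML at all. The Solr log shows the request arriving at path=/update, and the stack trace goes through XMLLoader, so a body that starts with plain text would fail at row 1, column 1. The sketch below only reproduces that failure mode with Python's standard library parser. The function name and the sample bodies are hypothetical, and it does not talk to Solr or ManifoldCF.

```python
import xml.etree.ElementTree as ET


def first_parse_error(body: str):
    """Return the (line, column) of the first XML parse error, or None if the body parses."""
    try:
        ET.fromstring(body)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple reported by the underlying parser
        return err.position
    return None


# Hypothetical bodies: raw document text versus a Solr-style XML update message
plain_text_body = "Some aircraft description text"
xml_body = '<add><doc><field name="id">AC003</field></doc></add>'

print(first_parse_error(plain_text_body))  # a position on line 1: the body is not XML
print(first_parse_error(xml_body))         # None: the body is well-formed XML
```

If that is what is happening here, one thing to check would be whether the output connection's update handler path points at an extracting handler rather than the XML one. That is an inference from the log, not something confirmed in this thread.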

I have been searching on this but can't seem to find a remedy. Any 
thoughts?

Thanks Karl.

Adam.

-----Original Message-----
From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 21 November 2011 7:04 PM
To: connectors-user@incubator.apache.org
Subject: Re: MCF - Oracle to Solr

Hi Adam,

Like I said before, the Simple History shows clearly that you have a perfectly 
reasonable URL for your documents.  That is NOT the problem.  The URL does 
not even need to be real, it's just an identifier of sorts as far 
as ManifoldCF and Solr are concerned.  As I said before, you will probably want 
to make it real eventually, because otherwise there's no way to link back to 
display the content of your search results, but that's not important for 
indexing.
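
To illustrate the "identifier" point: the Solr log shown above in this thread has the URL arriving as the literal.id request parameter, with the "=" percent-encoded as %3D. The sketch below rebuilds that encoded form with Python's urllib. The helper name and the choice of which characters to leave unescaped are assumptions made to mirror the logged value, not ManifoldCF's actual code.

```python
from urllib.parse import quote, unquote


def encode_identifier(url: str) -> str:
    """Percent-encode a document identifier, leaving ':', '/' and '?' as-is (an assumption)."""
    return quote(url, safe=":/?")


doc_id = "localhost:8080/solr?id=AC003"
encoded = encode_identifier(doc_id)

print(encoded)                     # localhost:8080/solr?id%3DAC003
print(unquote(encoded) == doc_id)  # True: decoding restores the original identifier
```

The encoded and decoded forms name the same document, so the %3D in the log is not itself a sign of a bad URL.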

Many people have indexed JDBC content successfully.  But Solr is, on the other 
hand, very highly configurable, and depending on how you have set up your 
solrconfig.xml and/or schema.xml file you can certainly get back 500 errors or 
400 errors from it when ManifoldCF tries to index something.  When that happens 
all that usually needs to be done is that either the configuration of the 
output connection needs to be changed, or the solrconfig.xml and/or schema.xml 
needs to be changed.

So let's start by exploring how you have set up your Solr.  Are you running the 
solr example without modification?  Or have you (or someone else) set Solr up 
specifically for your search problem?  Can you find out where the Solr standard 
error and standard output are going?  If so, you should see output for each 
document that ManifoldCF tries to index.  Do you see this output, and what does 
it say?

I should also mention that several versions of Solr returned 400 errors for 
zero-length documents indexed through the extracting update handler, which is 
what ManifoldCF uses.  This is not usually a problem anyhow because, although 
it is noisy, there would not be any content for the document anyway.  But is 
there any possibility that the database field you are indexing as the content 
field has nothing in it some or all of the time?
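
One way to answer that question is to count the rows whose content column is NULL, empty, or only whitespace. The sketch below is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for the Oracle database. The table and column names come from the data query in this thread, while the helper name and the sample rows are made up. Against Oracle the same idea would need an Oracle driver and possibly different SQL, since Oracle treats empty strings as NULL.

```python
import sqlite3


def count_empty_content(conn, table: str, column: str) -> int:
    """Count rows whose content column is NULL, empty, or whitespace only."""
    # Identifiers are interpolated directly, so pass only trusted table/column names.
    sql = (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE {column} IS NULL OR LENGTH(TRIM({column})) = 0"
    )
    return conn.execute(sql).fetchone()[0]


# In-memory stand-in for the AIRCRAFT table; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AIRCRAFT (AIRCRAFT_ID TEXT PRIMARY KEY, AIRCRAFT_INFO TEXT)")
conn.executemany(
    "INSERT INTO AIRCRAFT VALUES (?, ?)",
    [
        ("AC001", "Twin-engine jet"),  # has content
        ("AC002", ""),                 # empty string
        ("AC003", None),               # NULL
        ("AC004", "   "),              # whitespace only
    ],
)

print(count_empty_content(conn, "AIRCRAFT", "AIRCRAFT_INFO"))  # prints 3
```

A non-zero count would suggest that at least some of the 400 responses could come from empty documents rather than from the Solr configuration.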

Karl

On Mon, Nov 21, 2011 at 1:01 AM, Adam LaPila <adam.lap...@lmal.com.au> wrote:
> Hi Karl,
>
> Still no luck. You wouldn't happen to have a link to any good resources on 
> how to index a DB to Solr with MCF, other than the end-user examples from 
> the website? Perhaps some of your own work with a database; it can be 
> Oracle, MySQL, etc.
> Do you know of anyone who has tried this before and was successful?
> Any design documents? I've been googling nonstop; surely someone has
> done this before.
>
> With the  CONCAT('http://localhost:8080/solr?id=',AIRCRAFT_ID) AS
> "$(URLCOLUMN)"..FROM...WHERE)
>
> Does the URL need to be the link to the solr example? As the end user 
> documentation says "URLCOLUMN, The name of an expected resultset column 
> containing a URL"
>
> This is what I have been getting in the Simple History since my last email.
> 11-21-2011 16:53:28.378 document ingest (Solr)
> localhost:8080/solr?id=AC004
>  400 48 1 Bad Request
>
> 11-21-2011 16:53:28.347 document ingest (Solr)
> localhost:8080/solr?id=AC003
>  400 43 1 Bad Request
>
> 11-21-2011 16:53:28.331 document ingest (Solr)
> localhost:8080/solr?id=AC002
>  400 34 1 Bad Request
>
> 11-21-2011 16:53:28.300 document ingest (Solr)
> localhost:8080/solr?id=AC001
>  400 27 1 Bad Request
>
> Sorry for any troubles, just a little confused with it all.
>
> Regards,
>
> Adam.
>
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddy...@gmail.com]
> Sent: Monday, 21 November 2011 11:43 AM
> To: connectors-user@incubator.apache.org
> Subject: Re: MCF - Oracle to Solr
>
> Hi Adam,
>
> The 500 error is coming from Solr, so the place to look is in the Solr logs 
> and output.  If you are running the Solr example, you should be seeing stack 
> traces which may shed light on what is happening.
>
> FWIW, I doubt very much that this has anything to do with your URL 
> construction, which looks good based on what the Simple History indicates.
>
> Thanks,
> Karl
>
> On Sun, Nov 20, 2011 at 7:02 PM, Adam LaPila <adam.lap...@lmal.com.au> wrote:
>> Hello,
>>
>>
>>
>> I'm trying to get MCF to index my Oracle repository to my Solr output
>> repository. I have been following the end-user documentation and I'm
>> still having trouble getting things to work. I have also installed
>> Solr and am running it on a Tomcat server on port 8080.
>>
>>
>>
>> I have set up my output, repository connectors. These seem to be
>> fine, as it has the "Connection Working" status.
>>
>> I am sure the problem is how I'm setting up my job to extract the
>> database table data, to my solr index.
>>
>>
>>
>> I received an email from Karl a couple of days ago in regards to the
>> queries provided.
>>
>>
>>
>> "SELECT CONCAT('http://myserver.com?id=',Aircraft_ID) AS $(URLCOLUMN), ...
>> FROM ... WHERE ..."
>>
>>
>>
>> I have changed my query to be more like this.
>>
>>
>>
>> This is what I have as my Data Query:
>>
>> SELECT AIRCRAFT_ID AS "$(IDCOLUMN)", AIRCRAFT_INFO as
>> "$(DATACOLUMN)",
>> CONCAT('http://localhost:8080/solr?id=',AIRCRAFT_ID) AS "$(URLCOLUMN)"
>> FROM AIRCRAFT WHERE AIRCRAFT_ID IN $(IDLIST)
>>
>>
>>
>> When I run the job, I find that in the simple history I get something
>> like this.
>>
>>
>>
>> document ingest (Solr)            http://localhost:8080/solr?id=AC001
>>             500      27        16        Internal Server Error
>>
>> AC001 is one of the IDs in the table.
>>
>>
>>
>> I am obviously doing something wrong when it comes to the URL, but I
>> have tried a few things with no success. Is the URL correct, or should
>> it be something else?
>>
>> Any help would be greatly appreciated.
>>
>>
>>
>> My oracle database info is similar to this
>>
>> Service Name: somthing.somthing.end.net.au and
>>
>> Host: somthing.end.net.au
>>
>> Port:  1521
>>
>>
>>
>> I just have a simple Database with a table called Aircraft - With
>> Aircraft_id, Aircraft_make, Aircraft_Model, Aircraft_Info...all I
>> want to do is get the data from these columns and index them to solr
>>
>>
>>
>> Thanks,
>>
>> Adam.
>>
>>
>>
>> ________________________________
>> This message is intended only for the use of the intended
>> recipient(s) If you are not an intended recipient, you are hereby
>> notified that any use, dissemination, disclosure or copying of this
>> communication is strictly prohibited. If you have received this
>> communication in error please destroy all copies of this message and
>> its attachments and notify the sender immediately
>>
>