Solr 1.4.1 has several bugs that make it difficult to deploy MCF on an application server such as Resin. I have struggled a lot with some of these bugs and decided to share my experiences in case others run into the same problems.

First, I found that I had to upgrade Tika to version 0.8 in order to extract the content of MS Office documents and similar formats. Solr 1.4.1 ships with Tika 0.4, which does not work for this:
https://issues.apache.org/jira/browse/SOLR-1902

You basically have two options:
1. Build and install from the following branch:
http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/
2. Install the latest version from trunk (not recommended for production use).

Next, I discovered that dates were not parsed correctly. ExtractingRequestHandler lets you specify additional date formats, for example:
<lst name="date.formats">
  <str>yyyy-MM-dd</str>
  <str>dd.MM.yyyy</str>
</lst>
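To check what the two patterns above actually accept before touching solrconfig.xml, here is a small stand-alone Java sketch. It assumes, as I understand Solr Cell, that each date.formats entry is a java.text.SimpleDateFormat pattern tried in order; the class and method names are hypothetical, and this is not Solr's own parsing code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatsDemo {
    // Patterns copied from the date.formats example; assumed to be SimpleDateFormat syntax.
    static final List<String> FORMATS = Arrays.asList("yyyy-MM-dd", "dd.MM.yyyy");

    /** Tries each pattern in order and returns the first successful parse, or null. */
    static Date parse(String value) {
        for (String pattern : FORMATS) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            fmt.setLenient(false);
            try {
                return fmt.parse(value);
            } catch (ParseException e) {
                // Pattern did not match; fall through to the next one.
            }
        }
        return null;
    }

    /** Renders a Date in the ISO 8601 UTC form Solr uses for date fields. */
    static String toSolr(Date d) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(d);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2010-11-05", "05.11.2010", "Nov 5 2010"}) {
            Date d = parse(s);
            // Prints 2010-11-05T00:00:00Z for the first two inputs, "unparsed" for the third.
            System.out.println(s + " -> " + (d == null ? "unparsed" : toSolr(d)));
        }
    }
}
```

Note that SimpleDateFormat.parse(String) does not require the whole input to be consumed, so trailing text after a matching prefix is silently ignored.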

Unfortunately, configuring date.formats this way causes a lazy-loading error due to the following bug:
https://issues.apache.org/jira/browse/SOLR-1756

There are two workarounds:
1. Install the branch mentioned above and then apply the following patch:
https://issues.apache.org/jira/secure/attachment/12434831/SOLR-1756.patch
2. Install the latest version from trunk.

Remember to rebuild Solr and place the necessary jar files (apache-solr-cell*.jar, Tika and its dependencies) in a separate folder that your application server has access to.
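A quick way to confirm that the folder is complete is a small Java check like the one below. It is only a sketch: the jar-name prefixes (apache-solr-cell, tika-core, tika-parsers) and the default path /opt/solr/lib are assumptions for illustration, and Tika's own dependencies (POI, PDFBox and so on) would need to be added to the list for your build.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SolrCellLibCheck {
    // Hypothetical jar-name prefixes; extend with Tika's dependencies as needed.
    static final List<String> REQUIRED_PREFIXES =
        Arrays.asList("apache-solr-cell", "tika-core", "tika-parsers");

    /** Returns the required prefixes that have no matching *.jar among the given file names. */
    static List<String> missing(List<String> fileNames) {
        List<String> result = new ArrayList<String>();
        for (String prefix : REQUIRED_PREFIXES) {
            boolean found = false;
            for (String name : fileNames) {
                if (name.startsWith(prefix) && name.endsWith(".jar")) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(prefix);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass the real lib folder as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "/opt/solr/lib");
        String[] names = dir.list(); // null if the folder does not exist
        List<String> files = names == null ? new ArrayList<String>() : Arrays.asList(names);
        List<String> gaps = missing(files);
        System.out.println(gaps.isEmpty() ? "All required jars present" : "Missing: " + gaps);
    }
}
```

Matching on prefixes rather than exact file names keeps the check independent of the version suffix produced by your particular build.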

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
