Solr 1.4.1 has several bugs which make it difficult to deploy MCF on an
application server such as Resin. I have struggled a lot with some of
these bugs and decided to share my experiences in case others run into
the same problems.
First I figured out that I had to upgrade Tika to version 0.8 in order
to extract the content of MS Office documents etc. Solr 1.4.1 ships with
Tika 0.4, which does not work for this:
https://issues.apache.org/jira/browse/SOLR-1902
Here you have basically two options:
1. Install the following branch:
http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/
2. Install the latest version from trunk (not recommended for production
use).
Then I figured out that I couldn't parse dates correctly.
ExtractingRequestHandler lets you specify additional date formats, as in
the following example:
<lst name="date.formats">
  <str>yyyy-MM-dd</str>
  <str>dd.MM.yyyy</str>
</lst>
This will cause a lazy loading error due to the following bug:
https://issues.apache.org/jira/browse/SOLR-1756
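For context, the date.formats list belongs inside the
ExtractingRequestHandler declaration in solrconfig.xml. The following is
only a rough sketch: the handler name, the startup="lazy" attribute and
the defaults block are assumptions modelled on a typical example
configuration, not copied from a working setup.

```xml
<!-- Sketch only: handler name, startup mode and defaults are assumed -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- hypothetical setting; adjust to your own schema -->
    <str name="uprefix">ignored_</str>
  </lst>
  <!-- date patterns to try when parsing dates in extracted metadata -->
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
    <str>dd.MM.yyyy</str>
  </lst>
</requestHandler>
```

With a lazily started handler, a configuration problem like this one
would probably only show up on the first request to the handler, which
is presumably why it is reported as a lazy loading error.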
You have the following workarounds:
1. Install the branch mentioned above and then install the following patch:
https://issues.apache.org/jira/secure/attachment/12434831/SOLR-1756.patch
2. Install the latest version from trunk.
Remember to rebuild Solr and place the necessary jar files
(apache-solr-cell*.jar, Tika and its dependencies) in a separate
folder which your application server has access to.
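One possible way to make such a folder visible to Solr is a lib
directive in solrconfig.xml. This is a sketch under assumptions: the
path below is hypothetical, and it assumes your Solr build supports the
<lib> directive. Putting the jars on the application server's own
classpath may work just as well.

```xml
<!-- hypothetical path: the folder holding apache-solr-cell*.jar,
     Tika and its dependencies after the rebuild -->
<lib dir="/path/to/solr-extraction-libs" />
```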
Erlend
--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050