On 24.01.11 14.48, Karl Wright wrote:
> Thanks for the information.
> What I'd like to do is wait until your research is done and then post
> the rough instructions to d...@lucene.apache.org for confirmation that
> your approach is the preferred one. I'd also like to know if you
> check out the latest solr release from the svn tag and just build it,
> whether you have any of these problems. I've been building
> solr/lucene trunk and not using the binary distribution, which may be
> why I never noticed that this has gone away in the main distribution.
OK, it might take a week or so, but here are some details I just figured
out:
- There is a bug with the current Solr release (1.4.1) which makes it
impossible to extract the content by using the ExtractingRequestHandler.
I think it is related to this Jira issue:
https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
- This issue is now fixed: if I check out the latest version from
trunk, Tika can extract the content.
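A quick way to check whether extraction works on a given build might be to send a file to the ExtractingRequestHandler with extractOnly=true, which returns the extracted content without indexing it. The sketch below only builds the request URL. It assumes the handler is mapped to /update/extract, as in the example solrconfig.xml; the helper name, base URL and document id are hypothetical.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, extract_only=True):
    """Build a request URL for Solr's ExtractingRequestHandler.

    Assumption: the handler is registered at /update/extract,
    as in the example solrconfig.xml. Adjust if yours differs.
    """
    # literal.id supplies the unique key for the extracted document.
    params = {"literal.id": doc_id}
    if extract_only:
        # Return the extracted content in the response, do not index it.
        params["extractOnly"] = "true"
    else:
        # Index the document and commit right away.
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local deployment; the file itself would be POSTed to this URL.
    print(build_extract_url("http://localhost:8080/solr", "doc1"))
```

The file would then be POSTed to that URL, for example as a multipart upload with curl. An empty or failing response on 1.4.1 and extracted text on trunk would be consistent with the bug described above.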
What I need to test is whether I need to place the tika/extracting jars
manually in a lib folder when I deploy solr.war on Resin by using the
latest trunk version from SVN. When this is done, I can inform you.
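For that test, a small script could report which extraction-related jars are actually present in a given lib folder before and after deploying. This is only a sketch: the function name is hypothetical, and the jar name prefixes are assumptions based on how the jars are named in the distributions I have seen (for example tika-core-0.8.jar).

```python
from pathlib import Path


def find_extraction_jars(lib_dir, prefixes=("tika-", "apache-solr-cell-")):
    """Return the sorted names of jars in lib_dir that look extraction-related.

    Assumption: the relevant jars can be recognised by their file name
    prefix. The default prefixes are illustrative and may need adjusting.
    """
    lib = Path(lib_dir)
    if not lib.is_dir():
        # A missing folder simply means no jars were found.
        return []
    return sorted(
        p.name for p in lib.glob("*.jar") if p.name.startswith(tuple(prefixes))
    )


if __name__ == "__main__":
    # Hypothetical path; point this at the lib folder Solr is using.
    for name in find_extraction_jars("solr/lib"):
        print(name)
```

An empty list for the folder Solr loads from would suggest the jars still have to be copied there manually.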
Anyway, I would rather not build a search application for my university
on the latest version from trunk; I would prefer to use an official
release. So maybe I will try to apply the changes from trunk to the
1.4.1 release instead. I can already see that trunk has a newer version
of Tika than the official 1.4.1 release, i.e. tika-core-0.8.jar
instead of tika-core-0.4.jar.
Erlend
--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050