Anjan created SLING-2924:
----------------------------
Summary: Full text extraction issue with Tika v1.0 under OSGi environment
Key: SLING-2924
URL: https://issues.apache.org/jira/browse/SLING-2924
Project: Sling
Issue Type: Bug
Components: JCR
Reporter: Anjan
The latest stable build of Sling (I checked out revision 1487628) uses
Jackrabbit version 2.4.2, which uses Tika version 1.0 to extract metadata
and text for indexing. Jackrabbit v2.4.2 deployed as a separate web
application extracts metadata and text from uploaded documents without
problems, but when deployed in Sling (OSGi environment), full text extraction
doesn't work.
Updating the Tika dependency to version 1.2 in Sling resolved this issue.
Secondly, if the indexes are deleted from the repository and the server is
restarted, the indexes are not rebuilt for the existing documents. The Tika
bundles are not yet ready when Jackrabbit starts rebuilding the indexes
during Sling server startup. Changing the start level of the Tika bundles
from 15 to 10 helps resolve the issue.
The changes for both fixes are in the
<sling>/launchpad/builder/src/main/bundles/list.xml file.
Currently the Tika bundles are at start level 15, as shown below:
<startLevel level="15">
    ..........
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.0</version>
    </bundle>
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bundle</artifactId>
        <version>1.0</version>
    </bundle>
    ..........
</startLevel>
I moved the above bundles to start level 10 and changed their version to 1.2:
<startLevel level="10">
    ..........
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.2</version>
    </bundle>
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bundle</artifactId>
        <version>1.2</version>
    </bundle>
    ..........
</startLevel>
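To double-check that an edited list.xml really places the Tika bundles at the intended start level and version, a small script along these lines could help. This is only a sketch: the function name tika_bundles, the sample document, and its <bundles> root element are assumptions made for illustration, and it assumes the file uses no XML namespaces.

```python
import xml.etree.ElementTree as ET


def tika_bundles(xml_text):
    """Return (artifactId, version, start level) for each org.apache.tika bundle."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() also matches the root itself, so the enclosing element name does not matter.
    for level in root.iter("startLevel"):
        for bundle in level.findall("bundle"):
            if bundle.findtext("groupId") == "org.apache.tika":
                found.append((
                    bundle.findtext("artifactId"),
                    bundle.findtext("version"),
                    int(level.get("level")),
                ))
    return found


# Hypothetical, trimmed-down stand-in for list.xml; the <bundles> root is an assumption.
SAMPLE = """
<bundles>
  <startLevel level="10">
    <bundle>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>1.2</version>
    </bundle>
    <bundle>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-bundle</artifactId>
      <version>1.2</version>
    </bundle>
  </startLevel>
</bundles>
"""

for artifact, version, level in tika_bundles(SAMPLE):
    print(artifact, version, level)
```

Against a real checkout, the file contents would be read from <sling>/launchpad/builder/src/main/bundles/list.xml and passed to the same function in place of SAMPLE.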