Anjan created SLING-2924:
----------------------------

             Summary: Full text extraction issue with Tika v1.0 under OSGi environment
                 Key: SLING-2924
                 URL: https://issues.apache.org/jira/browse/SLING-2924
             Project: Sling
          Issue Type: Bug
          Components: JCR
            Reporter: Anjan


The latest stable build of Sling (I checked out revision 1487628) uses 
Jackrabbit 2.4.2, which in turn uses Tika 1.0 to extract metadata and text 
for indexing.  Jackrabbit 2.4.2 deployed as a separate web application 
extracts metadata and text from uploaded documents without problems, but 
when it is deployed in Sling (an OSGi environment), full-text extraction 
does not work.

Updating the Tika dependency to version 1.2 in Sling resolved this issue.

Secondly, if the indexes are deleted from the repository and the server is 
restarted, the indexes are not rebuilt for the existing documents.  The Tika 
bundles are not yet ready when Jackrabbit starts rebuilding the indexes 
during Sling server startup.  Changing the start level of the Tika bundles 
from 15 to 10 resolves the issue.

The changes for both fixes are in the 
<sling>/launchpad/builder/src/main/bundles/list.xml file.

Currently the Tika bundles are at start level 15, as shown below:

<startLevel level="15">
..........
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.0</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.0</version>
        </bundle>
..........
</startLevel>

I moved the above bundles to start level 10 and changed their version to 1.2:

<startLevel level="10">
..........
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.2</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.2</version>
        </bundle>
..........
</startLevel>
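
To double-check an edited list.xml, something like the following Python 
sketch could report the start level and version of each Tika bundle.  It 
is not part of Sling.  The function name tika_bundles, the SAMPLE text and 
the <bundles> root element are assumptions made for illustration, and the 
sketch assumes the file declares no XML namespace.

```python
import xml.etree.ElementTree as ET


def tika_bundles(list_xml_text):
    """Return (artifactId, version, startLevel) for each org.apache.tika bundle.

    Hypothetical helper: it only relies on the <startLevel level="...">
    and <bundle> structure shown in this issue.
    """
    root = ET.fromstring(list_xml_text)
    found = []
    for start_level in root.iter("startLevel"):
        # Keep the level as a string, since it may not always be numeric.
        level = start_level.get("level")
        for bundle in start_level.iter("bundle"):
            group_id = (bundle.findtext("groupId") or "").strip()
            if group_id != "org.apache.tika":
                continue
            artifact_id = (bundle.findtext("artifactId") or "").strip()
            version = (bundle.findtext("version") or "").strip()
            found.append((artifact_id, version, level))
    return found


# Illustrative input only; the <bundles> root element is an assumption.
SAMPLE = """
<bundles>
    <startLevel level="10">
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.2</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.2</version>
        </bundle>
    </startLevel>
</bundles>
"""

if __name__ == "__main__":
    for artifact_id, version, level in tika_bundles(SAMPLE):
        print(artifact_id, version, "start level", level)
```

To run it against a real checkout, read the contents of list.xml and pass 
them to tika_bundles() in place of SAMPLE.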

