Hi,
If you look in launchpad/builder/src/main/bundles/list.xml you will
find the Tika bundles (listed below), which almost certainly export the
packages you need to use Tika directly. Those bundles are available from
the Maven repo. If you need a different version, just add another bundle.
If that makes Jackrabbit unstable (unlikely), embed the version you need
in your own bundle instead. OSGi is good like that ;).

You can also run Tika on the command line. Since it then runs in its own
JVM, this can be a good way of isolating its heap usage when dealing with
poorly formed PDF files.

        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.0</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.0</version>
        </bundle>
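
On the command-line route mentioned above, here is a rough, untested
sketch of launching Tika in a child JVM with its own heap cap, so a bad
PDF can only exhaust the child's memory. It assumes you have downloaded
the standalone tika-app jar separately (it is not one of the bundles
listed here) and that "java" is on the PATH. The class name, method names
and the 256m limit are just placeholders; --text is the tika-app option
for plain-text output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Sling or Tika.
final class TikaCliRunner {

    // Builds: java -Xmx<maxHeap> -jar <tikaAppJar> --text <document>
    public static List<String> buildCommand(Path tikaAppJar, Path document, String maxHeap) {
        return Arrays.asList(
                "java", "-Xmx" + maxHeap,
                "-jar", tikaAppJar.toString(),
                "--text", document.toString());
    }

    // Runs Tika in a separate JVM and returns whatever it wrote to stdout.
    public static String extractText(Path tikaAppJar, Path document, String maxHeap)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(tikaAppJar, document, maxHeap));
        // Pass the child's stderr through so it cannot fill up and block.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        // Drain stdout completely before waiting for the exit code.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Tika process exited with code " + exitCode);
        }
        // Assumption: the child writes text in the platform default encoding.
        return new String(buffer.toByteArray(), Charset.defaultCharset());
    }

    // Usage: java TikaCliRunner /path/to/tika-app.jar /path/to/document.pdf
    public static void main(String[] args) throws Exception {
        System.out.println(extractText(Paths.get(args[0]), Paths.get(args[1]), "256m"));
    }
}
```

If the child JVM runs out of heap or crashes on a malformed file, you get
a non-zero exit code and an IOException here rather than an
OutOfMemoryError inside Sling.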

HTH
Ian

On 13 February 2013 02:56, Robert A. Decker <dec...@robdecker.com> wrote:
> Hi,
>
> We would like to use tika to extract raw text from pdfs, word docs, etc.
>
> I've found a maven dependency for jackrabbit-text-extractors:
> http://mvnrepository.com/artifact/org.apache.jackrabbit/jackrabbit-text-extractors/1.6.5
>
> I also see that there's a sling bundle 
> org.apache.sling.jackrabbit-text-extractors
>
> If all we want to do is extract text from documents (and don't really care 
> about lucene indexing) then should we use the text extractors directly as a 
> maven dependency? Or should we use the sling bundle and use it as a service?
>
> One problem I'm having is that I can't find the sling bundle in my svn clone 
> of the sling repo. Does the code live somewhere else? Also, I see that even 
> without this bundle there are a couple of tika services available in sling.
>
> Can someone provide some tips on getting started?
>
> thanks,
> Rob
