[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096668#comment-15096668 ]
Uwe Schindler commented on TIKA-1824:
-------------------------------------
Hi, as invited on TIKA-1830, here are some comments from Apache Solr:
{quote}
As already stated in the past, we would like to bundle only parsers for text
document formats, because images, class files, and similar formats are not
really useful for indexing by default. Users who want to index them can still
add the missing parser bundles, and SPI will do the rest. Currently we have
disabled some parsers by removing their JAR files (like asm-all.jar,
netcdf.jar), so Tika's SPI disables them automatically (because of the
resulting ClassNotFoundException). This was a bit crude, but it worked.
Part of the reason for this was also version incompatibilities (the ASM
version in Tika was old, while Lucene needs the newest one), but ASM is not
really useful for indexing anyway!
In Solr we don't use transitive dependencies in Ivy, so we decide for each JAR
file whether it gets bundled; we check every release during an update anyway.
{quote}
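The "remove the JAR and SPI disables the parser" behavior can be illustrated with a minimal, plain-JDK sketch. It assumes the provider class names have already been read from a `META-INF/services` file and simply skips any provider whose class (or a class it depends on) cannot be loaded. `TolerantProviderLoader` and `loadAvailable` are hypothetical names; this is not Tika's own service-loading code, whose actual error handling may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "missing JAR => provider silently disabled".
 * This is NOT Tika's ServiceLoader; it only mimics the idea with JDK calls.
 */
final class TolerantProviderLoader {

    private TolerantProviderLoader() {
    }

    /**
     * Instantiates each provider class name (as it would be listed in a
     * META-INF/services file) and skips any provider whose class, or one of
     * the classes it needs, is not on the class path.
     */
    static <S> List<S> loadAvailable(Class<S> service, List<String> providerNames, ClassLoader loader) {
        List<S> available = new ArrayList<>();
        for (String name : providerNames) {
            try {
                Class<?> cls = Class.forName(name, true, loader);
                available.add(service.cast(cls.getDeclaredConstructor().newInstance()));
            } catch (ClassNotFoundException | NoClassDefFoundError e) {
                // The provider's JAR (or a JAR it depends on) was removed:
                // treat the provider as disabled and continue with the rest.
            } catch (ReflectiveOperationException e) {
                // The class exists but cannot be instantiated: that is a real error.
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
        return available;
    }
}
```

For example, `loadAvailable(CharSequence.class, List.of("java.lang.StringBuilder", "org.example.MissingParser"), loader)` returns a single `StringBuilder`, because the second (hypothetical) class is absent and is skipped rather than failing the whole lookup.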
In addition, it would be a good idea to allow loading the Tika SPI files in a
separate classloader (to isolate the parser classes from others). The reason
for this is JAR hell. If Tika loaded the parsers in their own classloader
(optionally, e.g. enabled by configuration), we could place all parsers and
their dependencies in a separate lib directory outside Solr's lib folder.
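A minimal sketch of the proposed separate classloader, using only JDK APIs: every `*.jar` in a dedicated directory is put on a `URLClassLoader`, and providers are then discovered through `java.util.ServiceLoader` against that loader. The class name, method names, and the dedicated parser directory are hypothetical; how Tika would actually accept such a loader (for example through configuration) is not specified here.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical helper: loads parser JARs from their own directory, outside the host's lib folder. */
final class IsolatedParserLoader {

    private IsolatedParserLoader() {
    }

    /**
     * Creates a class loader over every *.jar file in {@code dir}.
     * The caller owns the returned loader and should close() it when done.
     */
    static URLClassLoader forJarDirectory(Path dir, ClassLoader parent) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    /** Finds providers of {@code service} listed in META-INF/services files visible to {@code loader}. */
    static <S> List<S> discover(Class<S> service, ClassLoader loader) {
        List<S> found = new ArrayList<>();
        for (S provider : ServiceLoader.load(service, loader)) {
            found.add(provider);
        }
        return found;
    }
}
```

Note that `URLClassLoader` delegates to its parent first, so a library the parent can already see (for example Lucene's ASM) would still be used instead of the copy in the parser directory. Real isolation would need a parent that exposes only the shared API classes, or a child-first loader.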
> Tika 2.0 - Create Initial Parser Modules
> -----------------------------------------
>
> Key: TIKA-1824
> URL: https://issues.apache.org/jira/browse/TIKA-1824
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 2.0
> Reporter: Bob Paulin
> Assignee: Bob Paulin
>
> Create an initial breakdown of parser modules.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)