[ 
https://issues.apache.org/jira/browse/TIKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072328#comment-13072328
 ] 

Antoni Mylka commented on TIKA-686:
-----------------------------------

FWIW I would say that fewer is better. 

We (Aperture) tried this and overdid it. Long story short: version 1.4 was 
split into 73 modules with 31 external dependencies; builds took forever and 
day-to-day development work was a pain. It was madness. With a bit more 
common sense it might have worked out better, but the key issue was that 
nobody wanted the split: everyone used a special 'onejar' assembly anyway. 

I don't like optional dependencies: with them I need lots of XML in my pom to 
make my app work.

I personally like exclusions better. It's just necessary to make sure that

{{<dependency>
 <groupId>org.apache.tika</groupId>
 <artifactId>tika-parsers</artifactId>
 <exclusions>
   <exclusion>
     <groupId>org.apache.poi</groupId>
     <artifactId>poi</artifactId>
   </exclusion>
   <exclusion>
     <groupId>org.apache.poi</groupId>
     <artifactId>poi-scratchpad</artifactId>
   </exclusion>
   <exclusion>
     <groupId>org.apache.poi</groupId>
     <artifactId>poi-ooxml</artifactId>
   </exclusion>
 </exclusions>
</dependency>}}

... works without a ClassNotFoundException or NoClassDefFoundError. (Aperture 
throws them in such a case right now.)

A solution with pom-only modules for each parser is OK as long as the default 
case is left as it is. The same problem will have to be solved, though: if I 
only want office support with poi, then the Tika facade must not initialize 
the PdfParser when the class itself is present on the classpath but its 
dependencies aren't.
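
To illustrate the kind of guard I mean, here is a minimal sketch in plain 
Java. It is not Tika code: SafeParserLoader and loadAvailable are hypothetical 
names, and org.example.MissingPdfParser merely stands in for a parser whose 
dependencies were excluded. The idea is to instantiate each candidate class 
and skip the ones that fail to load or link. Note that the JVM links lazily, 
so a missing dependency may still only show up on first use; a real facade 
would probably need a similar guard around the parse call as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Tika): keeps only the classes that can be instantiated. */
final class SafeParserLoader {

    private SafeParserLoader() {}

    /**
     * Tries to instantiate each named class through its no-arg constructor.
     * Classes that are absent, or that fail to link or initialize, are skipped.
     */
    static List<Object> loadAvailable(List<String> classNames) {
        List<Object> loaded = new ArrayList<>();
        ClassLoader loader = SafeParserLoader.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=true runs static initializers now, so failures
                // there are caught by this loop instead of at first use.
                Class<?> type = Class.forName(name, true, loader);
                loaded.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                // ClassNotFoundException: the class itself is absent.
                // NoClassDefFoundError (a LinkageError): something it needs is absent.
                System.err.println("Skipping " + name + ": " + e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<Object> parsers = loadAvailable(Arrays.asList(
                "java.util.ArrayList",             // stands in for a parser that loads fine
                "org.example.MissingPdfParser"));  // hypothetical, not on the classpath
        System.out.println(parsers.size() + " class(es) loaded");
    }
}
```

Catching LinkageError rather than only ClassNotFoundException is the point 
here: with an exclusion in place the parser class is found, and it is the 
classes it refers to that are missing.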

> Split tika-parsers into separate components
> -------------------------------------------
>
>                 Key: TIKA-686
>                 URL: https://issues.apache.org/jira/browse/TIKA-686
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Christopher Currie
>            Priority: Minor
>
> The email thread [1] from two years ago that led to splitting Tika into 
> separate components also suggested splitting tika-parsers into separate 
> components based on dependencies. This would be extremely useful, especially 
> in cases where a given parser has no dependencies beyond tika-core. Please 
> consider refactoring the parsers into separate components for 1.0.
> [1] http://markmail.org/message/tavirkqhn6r2szrz

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
