[ https://issues.apache.org/jira/browse/TIKA-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358959#comment-14358959 ]

Pavel Micka commented on TIKA-1573:
-----------------------------------

I will share the profiling results as soon as we re-evaluate the performance. 
I agree with you that the difference may not be significant in relative terms, 
but the program invokes Tika several million times, so there should be a 
measurable absolute difference. 

I agree that custom mime-type files are sufficient for adding new mime types, 
but they are of no use to someone who wants only a subset of the defaults, 
because the default ones are always loaded (just imagine someone who wants to 
distinguish all audio files, but nothing else). 
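The audio-only case boils down to a whitelist: take the full list of type names and keep those whose top-level type matches. A minimal sketch of that selection step, assuming the names are available as plain strings such as "audio/mpeg"; MediaTypeSubset and keepTopLevelType are hypothetical names, not part of Tika.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Tika): keeps only the media type names
// whose top-level type, the part before '/', matches the requested one.
final class MediaTypeSubset {
    private MediaTypeSubset() {}

    static List<String> keepTopLevelType(Iterable<String> typeNames, String topLevel) {
        String wanted = topLevel.toLowerCase(Locale.ROOT);
        List<String> kept = new ArrayList<>();
        for (String name : typeNames) {
            int slash = name.indexOf('/');
            // Skip malformed names without a "type/subtype" separator.
            if (slash <= 0) {
                continue;
            }
            // Media type names are case-insensitive, so compare in lower case.
            String type = name.substring(0, slash).toLowerCase(Locale.ROOT);
            if (type.equals(wanted)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

For example, keepTopLevelType(names, "audio") keeps "audio/mpeg" and drops "video/mp4". Selecting the subset is the easy part; as the quoted issue describes, what is missing is a supported way to build a MimeTypes instance from it.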

On the other hand, I have a different opinion on how extensible Tika should be. 
I think that library classes should be easy to subclass and implement, because 
the author cannot support every possible use case and one size does not fit 
everyone. Also, anyone who decides to override or change some functionality 
does so at their own risk of breaking something. Active users can later submit 
such subclasses back to the project, and everyone benefits from this openness.

> Not possible to restrict default mime types
> -------------------------------------------
>
>                 Key: TIKA-1573
>                 URL: https://issues.apache.org/jira/browse/TIKA-1573
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Pavel Micka
>            Priority: Minor
>              Labels: performance
>
> I am facing the following problem. I am using the MagicNumber detector, but 
> the detection is too slow for my purposes, so I have decided to limit the 
> number of detected types. However, this is not easily possible, because: 
>  * MimeTypes does not have any remove method.
>  * The getDefaultMimeTypes method always loads the full default set.
>  * The MimeTypes constructor does not accept parameters (e.g. the mime types, 
> with their magics, to register).
>  * The add method is package-private, so any wrapper must be placed in the 
> same package, which is awkward.
>  * The MimeTypes class is final, so it cannot be subclassed to improve the 
> implementation in an object-oriented way.
> My workaround was to call the package-private add method through reflection:
>     // MimeTypes.add(MimeType) is package-private, so make it accessible.
>     Method addMethod = MimeTypes.class.getDeclaredMethod("add", MimeType.class);
>     addMethod.setAccessible(true);
>     // Copy type m (defined outside this excerpt) from the default registry.
>     addMethod.invoke(myMimeTypes,
>             defaultMimeTypes.getRegisteredMimeType(m.toString()));
> I can imagine that the current implementation is designed this way to keep the 
> class immutable, but this could also be achieved with a parametrized 
> constructor (point 3) without affecting the immutability of the class, or with 
> an explicit flag (set by a method call) that disallows any further modification.
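The two alternatives proposed at the end of the issue, a parametrized constructor or an explicit flag that blocks further changes, can be sketched outside Tika. TypeRegistry is hypothetical and stores type names as plain strings; it is not Tika's MimeTypes API and leaves out magic-based detection entirely.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical registry illustrating the two proposals from the issue.
// NOT Tika's MimeTypes API; type names are plain strings for brevity.
final class TypeRegistry {
    private final List<String> types = new ArrayList<>();
    private boolean frozen;

    /** Flag proposal: start empty and mutable, then call freeze() when done. */
    public TypeRegistry() {
        this.frozen = false;
    }

    /** Constructor proposal: contents are supplied up front and fixed. */
    public TypeRegistry(Collection<String> initialTypes) {
        types.addAll(initialTypes);
        this.frozen = true;
    }

    /** Adds a type; throws once the registry has been frozen. */
    public void add(String type) {
        if (frozen) {
            throw new IllegalStateException("registry is frozen, cannot add " + type);
        }
        types.add(type);
    }

    /** Disallows any further modification (the "explicit flag"). */
    public void freeze() {
        frozen = true;
    }

    public boolean isFrozen() {
        return frozen;
    }

    /** Read-only view of the registered type names. */
    public List<String> getTypes() {
        return Collections.unmodifiableList(types);
    }
}
```

The constructor variant fixes the contents before the object is used, which is the usual Java route to immutability; the flag variant keeps a mutable setup phase and relies on callers invoking freeze().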



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
