Julien Nioche created NUTCH-1652:
------------------------------------
Summary: Avoid instantiation of MimeUtil for each Content object
created
Key: NUTCH-1652
URL: https://issues.apache.org/jira/browse/NUTCH-1652
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.7
Reporter: Julien Nioche
Content objects instantiate and hold a MimeUtil in the constructor used by the
HttpBase class. This is wasteful and unnecessarily slows down the creation of
Content objects, as each MimeUtil creates a new Tika instance, reads from the
configuration, etc.
Instead we could create a single instance of the MimeUtil class and pass it to
a new Content constructor
{code}
public Content(String url, String base, byte[] content, String contentType,
               Metadata metadata, MimeUtil mime)
{code}
and create a single instance of MimeUtil in HttpBase. We would also need to
make sure that synchronisation is handled properly in MimeUtil (especially
for the calls to Tika), as Content objects are created in a multithreaded
environment.
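
To illustrate the idea, here is a minimal, self-contained sketch of the sharing pattern. It does not use the real Nutch classes: MimeDetector, Content and HttpBaseSketch are hypothetical stand-ins for MimeUtil, Content and HttpBase, the Metadata argument is left out for brevity, and the detection logic is a placeholder rather than a call to Tika. The synchronized method is only one possible (and fairly blunt) way to address the thread-safety concern; whether MimeUtil needs it depends on how Tika is used.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMimeUtilSketch {

    /** Hypothetical stand-in for MimeUtil; counts how many times it is constructed. */
    static final class MimeDetector {
        static final AtomicInteger INSTANCES = new AtomicInteger();

        MimeDetector() {
            // Assumption: this is where the expensive setup (Tika, configuration) would happen.
            INSTANCES.incrementAndGet();
        }

        // Coarse lock so that concurrent callers never enter the detector at the same time.
        synchronized String detect(String declaredType, String url) {
            if (declaredType != null && !declaredType.isEmpty()) {
                return declaredType;
            }
            // Placeholder heuristic, not real MIME detection.
            return url.endsWith(".html") ? "text/html" : "application/octet-stream";
        }
    }

    /** Hypothetical stand-in for Content: receives the detector instead of creating one. */
    static final class Content {
        final String url;
        final String contentType;

        Content(String url, String base, byte[] content, String contentType, MimeDetector mime) {
            this.url = url;
            this.contentType = mime.detect(contentType, url);
        }
    }

    /** Hypothetical stand-in for HttpBase: owns a single detector and reuses it. */
    static final class HttpBaseSketch {
        private final MimeDetector mime = new MimeDetector();

        Content fetch(String url, byte[] body, String declaredType) {
            return new Content(url, url, body, declaredType, mime);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HttpBaseSketch http = new HttpBaseSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            // Each worker builds many Content objects through the same HttpBaseSketch.
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 100; n++) {
                    http.fetch("http://example.com/" + id + "/" + n + ".html", new byte[0], null);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("MimeDetector instances: " + MimeDetector.INSTANCES.get());
    }
}
```

With this layout the number of detector instances grows with the number of HttpBaseSketch objects rather than with the number of Content objects.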