Reuse tagsoup HtmlSchema instance across HtmlParsers (performance improvement)
------------------------------------------------------------------------------

                 Key: TIKA-528
                 URL: https://issues.apache.org/jira/browse/TIKA-528
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Bruno Dumon


While profiling the parsing of a set of small HTML files (email messages), I 
noticed that about a third of the time was being spent constructing instances 
of tagsoup's HTMLSchema class.

As far as I can tell, this is simply a data structure that is only read once 
it has been built, so a single instance should be thread safe and can be 
reused across parsers. Fortunately this is easy to do, as shown in the patch 
I will attach to this issue.
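
To illustrate the idea (this is not the attached patch), here is a minimal
sketch of the reuse pattern. ExpensiveSchema and SketchParser are
hypothetical stand-ins for tagsoup's HTMLSchema and for a per-document
parser. The sketch assumes the schema is never modified after construction,
which is what makes sharing it between threads safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSchemaSketch {

    // Hypothetical stand-in for tagsoup's HTMLSchema: costly to build,
    // and (by assumption) only read after the constructor returns.
    static final class ExpensiveSchema {
        // Counts how many times the costly constructor has run.
        static final AtomicInteger constructions = new AtomicInteger();

        private final Map<String, Integer> elementTypes = new HashMap<>();

        ExpensiveSchema() {
            constructions.incrementAndGet();
            // Simulate populating a large table of element definitions.
            for (int i = 0; i < 1000; i++) {
                elementTypes.put("element" + i, i);
            }
        }

        Integer lookup(String name) {
            return elementTypes.get(name);
        }
    }

    // Hypothetical stand-in for a parser that is created once per document.
    static final class SketchParser {
        // Built once, when the class is initialized, then shared by every
        // parser instance instead of being rebuilt per document.
        private static final ExpensiveSchema SHARED_SCHEMA = new ExpensiveSchema();

        private final ExpensiveSchema schema;

        SketchParser() {
            this.schema = SHARED_SCHEMA; // reuse, do not reconstruct
        }

        boolean knows(String element) {
            return schema.lookup(element) != null;
        }
    }

    public static void main(String[] args) {
        // Create many short-lived parsers, as when parsing many small files.
        for (int i = 0; i < 100; i++) {
            new SketchParser().knows("element42");
        }
        System.out.println("schema constructions: "
                + ExpensiveSchema.constructions.get());
    }
}
```

The static final field relies on Java's class initialization guarantees to
publish the fully built schema safely to all threads. If the real schema
turned out to be mutated during parsing, this sharing would not be safe.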

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.