[
https://issues.apache.org/jira/browse/TIKA-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202990#comment-15202990
]
Hudson commented on TIKA-1904:
------------------------------
FAILURE: Integrated in tika-2.x #53 (See
[https://builds.apache.org/job/tika-2.x/53/])
TIKA-1904 - Create Proxy Parser and Detectors (bob: rev
74e998d0ff359813dc06c695a7e786694c818932)
* tika-core/src/main/java/org/apache/tika/parser/ParserProxy.java
* tika-core/src/test/java/org/apache/tika/parser/ParserProxyTest.java
* tika-core/src/test/java/org/apache/tika/detect/DummyProxyDetector.java
* tika-core/src/test/java/org/apache/tika/parser/DummyProxyParser.java
* tika-core/src/main/java/org/apache/tika/detect/DetectorProxy.java
* tika-core/src/test/java/org/apache/tika/detect/DetectorProxyTest.java
> Tika 2.0 - Create Proxy Parser and Detectors
> --------------------------------------------
>
> Key: TIKA-1904
> URL: https://issues.apache.org/jira/browse/TIKA-1904
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 2.0
> Reporter: Bob Paulin
> Assignee: Bob Paulin
>
> Several parsers and detectors in Tika 2.0 instantiate parsers and
> detectors that live in different modules. As of now, these modules
> depend on other modules, including:
> tika-parser-office-module -> tika-parser-web-module, tika-parser-text-module,
> tika-parser-package-module
> tika-parser-ebook-module -> tika-parser-text-module
> tika-parser-journal-module -> tika-parser-pdf-module
> Many of these dependencies could be made optional by introducing the concept
> of proxy parsers and detectors, which would enable the functionality when all
> the dependencies are included in the project but would not throw a
> ClassNotFoundException if a dependent module was not included (e.g. the parse
> function would do nothing).
> Example: currently, ChmParser has
> {code}
> private void parsePage(byte[] byteObject, ContentHandler xhtml) throws TikaException { // throws IOException
>     InputStream stream = null;
>     Metadata metadata = new Metadata();
>     HtmlParser htmlParser = new HtmlParser();
>     ContentHandler handler = new EmbeddedContentHandler(new BodyContentHandler(xhtml)); // -1
>     ParseContext parser = new ParseContext();
>     try {
>         stream = new ByteArrayInputStream(byteObject);
>         htmlParser.parse(stream, handler, metadata, parser);
>     } catch (SAXException e) {
>         throw new RuntimeException(e);
>     } catch (IOException e) {
>         // Pushback overflow from tagsoup
>     }
> }
> {code}
> Instead, the HtmlParser could be proxied in the constructor
> {code}
> private final Parser htmlProxyParser;
>
> public ChmParser() {
>     this.htmlProxyParser = new ParserProxy("org.apache.tika.parser.html.HtmlParser");
> }
> {code}
> and parsePage would call the proxy
> {code}
> private void parsePage(byte[] byteObject, ContentHandler xhtml) throws TikaException { // throws IOException
>     InputStream stream = null;
>     Metadata metadata = new Metadata();
>     ContentHandler handler = new EmbeddedContentHandler(new BodyContentHandler(xhtml)); // -1
>     ParseContext parser = new ParseContext();
>     try {
>         stream = new ByteArrayInputStream(byteObject);
>         htmlProxyParser.parse(stream, handler, metadata, parser);
>     } catch (SAXException e) {
>         throw new RuntimeException(e);
>     } catch (IOException e) {
>         // Pushback overflow from tagsoup
>     }
> }
> {code}
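The proxy idea described in the issue can be sketched without any Tika classes. The following is only an illustration of the general technique (load a delegate by class name, fall back to a no-op when the class is not on the classpath); it is not the ParserProxy committed in the revision above. The names TextSink, TextSinkProxy, UpperCaseSink, isAvailable and ProxyDemo are hypothetical and chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parser-like interface; not Tika's Parser. */
interface TextSink {
    void accept(String input, List<String> out);
}

/** Delegates to a class loaded by name when it is present, otherwise does nothing. */
class TextSinkProxy implements TextSink {
    private final TextSink delegate; // null when the optional class is unavailable

    TextSinkProxy(String className) {
        TextSink found = null;
        try {
            Class<?> cls = Class.forName(className);
            found = (TextSink) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            // Optional dependency is missing or unusable: keep the no-op behaviour.
        }
        this.delegate = found;
    }

    /** True when the named class was found and instantiated. */
    boolean isAvailable() {
        return delegate != null;
    }

    @Override
    public void accept(String input, List<String> out) {
        if (delegate != null) {
            delegate.accept(input, out);
        }
    }
}

/** Hypothetical implementation that plays the role of the optional module. */
class UpperCaseSink implements TextSink {
    @Override
    public void accept(String input, List<String> out) {
        out.add(input.toUpperCase());
    }
}

class ProxyDemo {
    public static void main(String[] args) {
        List<String> out = new ArrayList<>();

        // The class is on the classpath, so the call is delegated.
        TextSink present = new TextSinkProxy(UpperCaseSink.class.getName());
        present.accept("chm", out);

        // The class does not exist, so the call is silently skipped.
        TextSink missing = new TextSinkProxy("no.such.module.MissingSink");
        missing.accept("chm", out);

        System.out.println(out); // prints [CHM]
    }
}
```

Resolving the delegate once in the constructor keeps the per-call cost to a null check. A caller that needs to know whether the optional module is present can ask isAvailable() instead of catching ClassNotFoundException itself.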
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)