[ 
https://issues.apache.org/jira/browse/TIKA-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202990#comment-15202990
 ] 

Hudson commented on TIKA-1904:
------------------------------

FAILURE: Integrated in tika-2.x #53 (See 
[https://builds.apache.org/job/tika-2.x/53/])
TIKA-1904 - Create Proxy Parser and Detectors (bob: rev 
74e998d0ff359813dc06c695a7e786694c818932)
* tika-core/src/main/java/org/apache/tika/parser/ParserProxy.java
* tika-core/src/test/java/org/apache/tika/parser/ParserProxyTest.java
* tika-core/src/test/java/org/apache/tika/detect/DummyProxyDetector.java
* tika-core/src/test/java/org/apache/tika/parser/DummyProxyParser.java
* tika-core/src/main/java/org/apache/tika/detect/DetectorProxy.java
* tika-core/src/test/java/org/apache/tika/detect/DetectorProxyTest.java


> Tika 2.0 - Create Proxy Parser and Detectors
> --------------------------------------------
>
>                 Key: TIKA-1904
>                 URL: https://issues.apache.org/jira/browse/TIKA-1904
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Bob Paulin
>            Assignee: Bob Paulin
>
> There are several parsers and detectors that instantiate parsers and 
> detectors that live in different modules in Tika 2.0.  As of now, these 
> modules are dependent on other modules, including:
> * tika-parser-office-module -> tika-parser-web-module, tika-parser-text-module, tika-parser-package-module
> * tika-parser-ebook-module -> tika-parser-text-module
> * tika-parser-journal-module -> tika-parser-pdf-module
> Many of these dependencies could be made optional by introducing the concept 
> of proxy parsers and detectors, which would enable the functionality when all 
> of the dependencies are included in the project, but would not throw a 
> ClassNotFoundException if a dependent module was not included (e.g. the parse 
> function would do nothing).
> For example, ChmParser currently parses each page like this:
> {code}
> private void parsePage(byte[] byteObject, ContentHandler xhtml) throws TikaException { // throws IOException
>     InputStream stream = null;
>     Metadata metadata = new Metadata();
>     HtmlParser htmlParser = new HtmlParser();
>     ContentHandler handler = new EmbeddedContentHandler(new BodyContentHandler(xhtml)); // -1
>     ParseContext parser = new ParseContext();
>     try {
>         stream = new ByteArrayInputStream(byteObject);
>         htmlParser.parse(stream, handler, metadata, parser);
>     } catch (SAXException e) {
>         throw new RuntimeException(e);
>     } catch (IOException e) {
>         // Pushback overflow from tagsoup
>     }
> }
> {code}
> Instead, the HtmlParser could be proxied in the constructor:
> {code}
> private final Parser htmlProxyParser;
>
> public ChmParser() {
>     this.htmlProxyParser = new ParserProxy("org.apache.tika.parser.html.HtmlParser");
> }
> {code}
> and then used in parsePage():
> {code}
> private void parsePage(byte[] byteObject, ContentHandler xhtml) throws TikaException { // throws IOException
>     InputStream stream = null;
>     Metadata metadata = new Metadata();
>     ContentHandler handler = new EmbeddedContentHandler(new BodyContentHandler(xhtml)); // -1
>     ParseContext parser = new ParseContext();
>     try {
>         stream = new ByteArrayInputStream(byteObject);
>         htmlProxyParser.parse(stream, handler, metadata, parser);
>     } catch (SAXException e) {
>         throw new RuntimeException(e);
>     } catch (IOException e) {
>         // Pushback overflow from tagsoup
>     }
> }
> {code}
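
The proxy idea in the description can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not the ParserProxy class listed in the commit above: SimpleParser, ReflectiveParserProxy and UpperCaseParser are hypothetical names, and SimpleParser is a one-method stand-in for Tika's Parser so the example compiles without Tika. The proxy resolves its delegate by class name with Class.forName once, in the constructor, and degrades to a no-op when the class cannot be loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical one-method stand-in for org.apache.tika.parser.Parser,
// so this sketch compiles without Tika on the classpath.
interface SimpleParser {
    void parse(InputStream stream, StringBuilder out) throws IOException;
}

// Loads the delegate parser by class name once, at construction time.
// If the class (or one of its dependencies) cannot be loaded, parse() is a no-op.
class ReflectiveParserProxy implements SimpleParser {
    private final SimpleParser delegate; // null when the target is unavailable

    ReflectiveParserProxy(String className) {
        this.delegate = load(className);
    }

    private static SimpleParser load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (SimpleParser) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            // ClassNotFoundException, NoClassDefFoundError, etc.: module not present
            return null;
        }
    }

    boolean isAvailable() {
        return delegate != null;
    }

    @Override
    public void parse(InputStream stream, StringBuilder out) throws IOException {
        if (delegate != null) {
            delegate.parse(stream, out);
        }
        // Otherwise do nothing, as the issue proposes.
    }
}

// Hypothetical delegate standing in for a parser from an optional module.
class UpperCaseParser implements SimpleParser {
    @Override
    public void parse(InputStream stream, StringBuilder out) throws IOException {
        String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        out.append(text.toUpperCase(Locale.ROOT));
    }
}

public class ParserProxySketch {
    public static void main(String[] args) throws IOException {
        byte[] page = "chm page".getBytes(StandardCharsets.UTF_8);

        ReflectiveParserProxy present = new ReflectiveParserProxy("UpperCaseParser");
        StringBuilder out = new StringBuilder();
        present.parse(new ByteArrayInputStream(page), out);
        System.out.println(present.isAvailable() + " [" + out + "]"); // prints "true [CHM PAGE]"

        ReflectiveParserProxy missing = new ReflectiveParserProxy("org.example.NotOnClasspath");
        StringBuilder none = new StringBuilder();
        missing.parse(new ByteArrayInputStream(page), none);
        System.out.println(missing.isAvailable() + " [" + none + "]"); // prints "false []"
    }
}
```

Resolving the class once in the constructor, as in the ChmParser example, pays the lookup cost once per parser instance. Catching LinkageError as well as ReflectiveOperationException covers the case where the class itself is found but one of its own dependencies is missing (NoClassDefFoundError).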


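The issue proposes the same treatment for detectors. A minimal sketch under the same assumptions (hypothetical SimpleDetector, ReflectiveDetectorProxy and HtmlSniffer names, not the committed DetectorProxy; SimpleDetector returns a MIME type string rather than Tika's MediaType): when the delegate cannot be loaded, the proxy reports application/octet-stream, the type Tika detectors return when they cannot identify the content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical one-method stand-in for org.apache.tika.detect.Detector,
// returning a MIME type string instead of a MediaType.
interface SimpleDetector {
    String detect(InputStream input) throws IOException;
}

// Delegates to a detector loaded by class name; if it cannot be loaded,
// reports the generic "unknown binary" type instead of failing.
class ReflectiveDetectorProxy implements SimpleDetector {
    static final String OCTET_STREAM = "application/octet-stream";
    private final SimpleDetector delegate; // null when the target is unavailable

    ReflectiveDetectorProxy(String className) {
        SimpleDetector d;
        try {
            d = (SimpleDetector) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            d = null; // optional module not on the classpath
        }
        this.delegate = d;
    }

    @Override
    public String detect(InputStream input) throws IOException {
        return delegate != null ? delegate.detect(input) : OCTET_STREAM;
    }
}

// Hypothetical detector from an optional module: recognises a leading "<html>" tag.
// Assumes a stream that supports mark/reset, such as ByteArrayInputStream.
class HtmlSniffer implements SimpleDetector {
    @Override
    public String detect(InputStream input) throws IOException {
        input.mark(6);                      // leave the stream unconsumed for the parser
        byte[] head = input.readNBytes(6);
        input.reset();
        String prefix = new String(head, StandardCharsets.US_ASCII);
        return prefix.equalsIgnoreCase("<html>") ? "text/html" : ReflectiveDetectorProxy.OCTET_STREAM;
    }
}

public class DetectorProxySketch {
    public static void main(String[] args) throws IOException {
        byte[] doc = "<html><body>hi</body></html>".getBytes(StandardCharsets.US_ASCII);
        SimpleDetector present = new ReflectiveDetectorProxy("HtmlSniffer");
        SimpleDetector missing = new ReflectiveDetectorProxy("org.example.NotOnClasspath");
        System.out.println(present.detect(new ByteArrayInputStream(doc))); // prints "text/html"
        System.out.println(missing.detect(new ByteArrayInputStream(doc))); // prints "application/octet-stream"
    }
}
```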

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
