Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "Tika2_0MigrationGuide" page has been changed by BobPaulin:
https://wiki.apache.org/tika/Tika2_0MigrationGuide?action=diff&rev1=2&rev2=3

  = Major Changes =
+ == Tika Modules ==
+
+ In Tika 2.x the tika-parsers project has been split into 15 separate modules. With Tika's ever-growing list of parsers, the modules let developers pick sub-groupings of parsers without bringing every parser dependency into a project. For example, a project using the Tika 1.12 parsers would include the following entry in the dependencies element of its Apache Maven pom.xml:
+
+ {{{
+ <dependency>
+   <groupId>org.apache.tika</groupId>
+   <artifactId>tika-parsers</artifactId>
+   <version>1.12</version>
+ </dependency>
+ }}}
+
+ If the project only parses PDF files, the dependency can be reduced to the entry below on Tika 2.x:
+
+ {{{
+ <dependency>
+   <groupId>org.apache.tika</groupId>
+   <artifactId>tika-parser-pdf-module</artifactId>
+   <version>2.0</version>
+ </dependency>
+ }}}
+
+ The 2.x branch also introduces the ParserProxy, DetectorProxy, and EncodingDetectorProxy classes, which allow developers to compose Parsers, Detectors and EncodingDetectors from classes that may or may not exist on the classpath.
+
+ For example, the OutlookExtractor lives in the tika-parser-office-module. An Outlook message may contain HTML content, but the user may not want to include the tika-parser-web-module that contains the HtmlParser. The OutlookExtractor therefore wraps the HtmlParser in a ParserProxy:
+
+ {{{
+ this.htmlParserProxy = new ParserProxy("org.apache.tika.parser.html.HtmlParser", getClass().getClassLoader());
+ }}}
+
+ The OutlookExtractor will then parse the HTML content only if that module is included. If it is not, the parser fails silently by default.
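As a rough illustration of the idea behind such a proxy, the sketch below resolves a class by name through a class loader and stays silent when the class is absent. It is not Tika's ParserProxy implementation: the class name OptionalClassProxy and its methods are hypothetical, and only standard Java reflection calls are used.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not Tika's ParserProxy): resolves a class by name and
 * falls back silently when the class is not on the classpath.
 */
final class OptionalClassProxy {
    private final Object delegate; // null when the class could not be loaded

    OptionalClassProxy(String className, ClassLoader loader) {
        Object instance = null;
        try {
            // Look the class up by name so there is no compile-time dependency on it.
            Class<?> type = Class.forName(className, true, loader);
            instance = type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing or not instantiable: stay silent and leave delegate null.
        }
        this.delegate = instance;
    }

    /** Returns the loaded instance, or empty if the optional class was unavailable. */
    Optional<Object> delegate() {
        return Optional.ofNullable(delegate);
    }

    /** True when the named class was found and instantiated. */
    boolean isAvailable() {
        return delegate != null;
    }
}
```

A caller in the position of the OutlookExtractor would check isAvailable() before handing HTML content to the delegate, and skip that content otherwise.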
To be warned when a proxy fails, the developer may set the following system property:
+
+ {{{
+ org.apache.tika.service.proxy.error.warn=true
+ }}}
+
+ With this property set, a warning message is printed when the class being proxied is not found.
+
+
+ == Tika Bundles ==
+
  = Minor Changes =
