The "Tika2_0MigrationGuide" page has been changed by BobPaulin:
https://wiki.apache.org/tika/Tika2_0MigrationGuide?action=diff&rev1=2&rev2=3

  
  = Major Changes =
  
+ == Tika Modules ==
+ 
+ In Tika 2.x the tika-parsers project has been split into 15 separate modules. 
 With Tika's ever-growing list of parsers, the modules let developers pick and 
choose sub-groupings of parsers without bringing every parser dependency into 
a project.  For example, a project using the Tika 1.12 parsers would include 
the following entry in the dependencies element of an Apache Maven pom.xml:
+ 
+ {{{
+ <dependency>
+     <groupId>org.apache.tika</groupId>
+     <artifactId>tika-parsers</artifactId>
+     <version>1.12</version>
+ </dependency>
+ }}}
+ 
+ If this project were only being used to parse PDF files, the dependency 
could be replaced with the entry below on Tika 2.x:
+ 
+ {{{
+ <dependency>
+     <groupId>org.apache.tika</groupId>
+     <artifactId>tika-parser-pdf-module</artifactId>
+     <version>2.0</version>
+ </dependency>
+ }}}
+ 
+ The 2.x branch also introduces the ParserProxy, DetectorProxy, and 
EncodingDetectorProxy classes that allow developers to compose Parsers, 
Detectors and EncodingDetectors using classes that may or may not exist on the 
classpath.
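+ 
+ The general idea of composing against a class that may be absent can be 
illustrated with plain reflection.  The sketch below is not Tika's ParserProxy 
implementation; OptionalDelegate and its methods are hypothetical names used 
only for illustration.  It tries to load and instantiate a class by name and 
quietly reports it as unavailable when the class is not on the classpath.
+ 

```java
import java.util.Optional;

/** Hypothetical illustration of an optional, reflection-based delegate; not part of Tika. */
public class OptionalDelegate {
    // Holds the instantiated class, or null when it could not be loaded.
    private final Object delegate;

    public OptionalDelegate(String className, ClassLoader loader) {
        Object instance = null;
        try {
            // Look the class up by name so there is no compile-time dependency on it.
            Class<?> clazz = Class.forName(className, true, loader);
            instance = clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing or not instantiable: fail silently, as a proxy would by default.
        }
        this.delegate = instance;
    }

    /** True when the named class was found and instantiated. */
    public boolean isAvailable() {
        return delegate != null;
    }

    /** The instance, or an empty Optional when the class was unavailable. */
    public Optional<Object> get() {
        return Optional.ofNullable(delegate);
    }

    public static void main(String[] args) {
        ClassLoader loader = OptionalDelegate.class.getClassLoader();
        // A class that ships with the JDK, and one that is assumed not to exist.
        System.out.println(new OptionalDelegate("java.util.ArrayList", loader).isAvailable());
        System.out.println(new OptionalDelegate("org.example.DoesNotExist", loader).isAvailable());
    }
}
```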
+ 
+ For example, the OutlookExtractor lives in the tika-parser-office-module.  An 
Outlook message may contain HTML content, but the user may not want to include 
the tika-parser-web-module that contains the HtmlParser.  To support both 
cases, the HtmlParser is wrapped in a ParserProxy:
+ 
+ {{{
+ this.htmlParserProxy = new ParserProxy("org.apache.tika.parser.html.HtmlParser", getClass().getClassLoader());
+ }}}
+ 
+ The OutlookExtractor will then only parse the HTML content if that module is 
included.  If it is not, the proxy fails silently by default.  A developer who 
wants to be warned that a proxy has failed may set the following system 
property:
+ 
+ {{{
+ org.apache.tika.service.proxy.error.warn=true
+ }}}
+ 
+ With this property set, a warning message is printed when the class being 
proxied is not found.
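+ 
+ Like any JVM system property, this one can be passed on the command line as 
-Dorg.apache.tika.service.proxy.error.warn=true or set from code.  The sketch 
below shows the programmatic form; ProxyWarnConfig is a hypothetical helper 
class, and it assumes the property is set before any proxy is constructed, 
since this page does not say when Tika reads it.
+ 

```java
/** Hypothetical helper that turns on the proxy warning property; not part of Tika. */
public class ProxyWarnConfig {
    // Property name taken from the migration guide text.
    public static final String WARN_PROPERTY = "org.apache.tika.service.proxy.error.warn";

    /** Sets the system property for the current JVM. */
    public static void enableProxyWarnings() {
        System.setProperty(WARN_PROPERTY, "true");
    }

    public static void main(String[] args) {
        // Assumption: call this early, before any proxy objects are created.
        enableProxyWarnings();
        System.out.println(WARN_PROPERTY + "=" + System.getProperty(WARN_PROPERTY));
    }
}
```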
+ 
+ 
+ == Tika Bundles ==
+ 
  = Minor Changes =
  
