Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "CompositeParserDiscussion" page has been changed by NickBurch:
https://wiki.apache.org/tika/CompositeParserDiscussion?action=diff&rev1=3&rev2=4

Comment:
Config

        <mime>application/pdf</mime>
      </parser>
  
-     <!-- JPEG needs special handling -->
+     <!-- JPEG needs special handling - try+combine everything -->
-     <!-- XML needs special handling -->
+     <parser class="org.apache.tika.parser.(supplement)">
+        <parser class="org.apache.tika.parser.ocr.TesseractOCRParser" />
+        <parser class="org.apache.tika.parser.image.ImageParser" />
+        <parser class="org.apache.tika.parser.jpeg.JpegParser" />
+        <parser class="org.apache.tika.parser.gdal.GDALParser" />
+        <!-- TODO Do we need to give mimetypes here too? Or can we get them implicitly? -->
+     </parser>
+ 
+     <!-- XML needs special handling - use fallbacks to get something -->
+     <parser class="org.apache.tika.parser.(fallback)">
+        <parser class="my.custom.xml.parser" />
+        <parser class="org.apache.tika.parser.xml.XMLParser" />
+        <parser class="org.apache.tika.parser.html.HTMLParser" />
+        <parser class="org.apache.tika.parser.txt.TXTParser" />
+        <mime>application/xml</mime>
+     </parser>
    </parsers>
  }}}
  
  == In Code ==
- ''TODO''
+ Whatever we do, this must be available from code too, just as people today can create custom {{{CompositeParser}}} instances, or wrap things up with custom {{{ParserDecorator}}} instances.
  
+ We also need an example for each of these, not only in unit tests, but also in the examples package.
+ 
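As a rough illustration of the two strategies in the config above, the following self-contained Java sketch models a "fallback" composite (first parser that succeeds wins) and a "supplement" composite (run everything and combine the results). It deliberately does not use Tika's real {{{Parser}}}, {{{CompositeParser}}} or {{{ParserDecorator}}} types: {{{SketchParser}}}, {{{FallbackSketch}}} and {{{SupplementSketch}}} are hypothetical names invented for this example, and the "first value wins" merge rule is an assumption, not something the proposal has settled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parser; NOT Tika's org.apache.tika.parser.Parser.
interface SketchParser {
    // Returns extracted metadata, or throws if the input cannot be handled.
    Map<String, String> parse(String input) throws Exception;
}

// "Fallback" strategy: try each parser in order and return the first success.
final class FallbackSketch implements SketchParser {
    private final List<SketchParser> parsers;

    FallbackSketch(List<SketchParser> parsers) {
        this.parsers = new ArrayList<>(parsers);
    }

    @Override
    public Map<String, String> parse(String input) throws Exception {
        Exception last = null;
        for (SketchParser parser : parsers) {
            try {
                return parser.parse(input);
            } catch (Exception ex) {
                last = ex; // remember the failure and move on to the next parser
            }
        }
        throw new Exception("no parser could handle the input", last);
    }
}

// "Supplement" strategy: run every parser and merge whatever each one finds.
// Assumption: when two parsers report the same key, the earlier parser wins.
final class SupplementSketch implements SketchParser {
    private final List<SketchParser> parsers;

    SupplementSketch(List<SketchParser> parsers) {
        this.parsers = new ArrayList<>(parsers);
    }

    @Override
    public Map<String, String> parse(String input) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (SketchParser parser : parsers) {
            try {
                for (Map.Entry<String, String> entry : parser.parse(input).entrySet()) {
                    merged.putIfAbsent(entry.getKey(), entry.getValue());
                }
            } catch (Exception ex) {
                // A failing parser simply contributes nothing to the merged result.
            }
        }
        return merged;
    }
}

final class CompositeSketchDemo {
    public static void main(String[] args) throws Exception {
        // Two toy parsers: one strict, one that accepts anything.
        SketchParser strictXml = in -> {
            if (!in.startsWith("<")) {
                throw new Exception("not XML");
            }
            return Map.of("type", "xml");
        };
        SketchParser plainText =
                in -> Map.of("type", "text", "length", String.valueOf(in.length()));

        List<SketchParser> chain = List.of(strictXml, plainText);
        System.out.println(new FallbackSketch(chain).parse("hello"));
        System.out.println(new SupplementSketch(chain).parse("<a/>"));
    }
}
```

Building the composite from a plain list keeps the code path and the config path symmetrical: a nested {{{<parser>}}} element in the config would map onto one list entry here.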
