Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "FrontPage" page has been changed by NickBurch:
https://wiki.apache.org/tika/FrontPage?action=diff&rev1=39&rev2=40

Comment:
Pull out some integration/config bits from design

  = Committer Info =
   * ReleaseProcess - Info on releasing Tika
   * ThirdPartySonaType - A guide to staging and deploying third party jars on [[https://oss.sonatype.org/|Sonatype OSSRH]] (OSS Repository Hosting) for subsequent use within Tika parser wrappers
-  * VirtualMachine - a virtual machine hosted by Rackspace that allows an instance of [[TikaJaxRS|Tika Server]] to run for public testing. Set up by Tim Allison et al.
+  * VirtualMachine - a virtual machine hosted by Rackspace that allows an instance of [[TikaJAXRS|Tika Server]] to run for public testing. Set up by Tim Allison et al.
  = User Notes =
   * PostingManyFilesToExtractingRequestHandler - How to post many files to the Extracting Request Handler (Tika) in Solr.
   * IntegratingTikaWithExtractingRequestHandler - Building the latest Tika and integrating it with the Extracting Request Handler (Tika) in Solr.
   * [[TesseractOCRStats|Some stats using Tesseract OCR]] - some stats from a contributing team (Hyperion Gray) about using TesseractOCR (will be updated with Tika).
+  * [[Troubleshooting Tika]]
   * Upgrading to [[PDFBOX_2_X_NOTES|PDFBox 2.x]]
  = MIME identification design/implementation =
   * [[BaysianMimeTypeSelector|Bayesian MIME selection]] - Tika's new Bayesian MIME selector.
   * [[ContentMimeDetection|Content-based MIME selection with Byte histograms]] - Tika's new content/byte histogram MIME detector.
+
+ = Configuration and Integration =
+  * [[API Bindings for Tika]] - Using Tika from additional languages and frameworks
+  * [[cTAKESParser|Getting Tika and Running with Apache cTAKES]] - How to use Tika with Apache cTAKES the clinical text biomedical knowledge extraction framework.
+  * [[EXIFToolParser|Getting Tika up and Running with EXIFTool]] - How to use Tika with EXIFTool.
+  * [[FFMPEGParser|Getting Tika up and Running with FFMPEG]] - How to use Tika with FFMPEG.
+  * [[GeoTopicParser|Getting Tika up and Running with the GeoTopicParser based on Geonames.org, Lucene, and OpenNLP]]
+  * [[TikaOCR|Getting Tika up and Running with OCR]] - How to use Tika with OCR from Tesseract.
+  * [[TikaGDAL|Getting Tika up and Running with the Geospatial Data Abstraction Library (GDAL)]] - How to use Tika with GDAL to parse/extract geospatial data files.
  = Design =
@@ -48, +58 @@
   * [[TikaJAXRS|Tika JAX-RS Server]] - documentation on the recently contributed tika-server module.
   * [[MetadataRoadmap|Metadata roadmap]] - Documentation and Discussion about the metadata roadmap for Tika
   * [[ErrorsAndExceptions|Errors and Exceptions]] - What parsers should output/throw when, for empty/invalid/unsupported files
-  * [[cTAKESParser|Getting Tika and Running with Apache cTAKES]] - How to use Tika with Apache cTAKES the clinical text biomedical knowledge extraction framework.
-  * [[EXIFToolParser|Getting Tika up and Running with EXIFTool]] - How to use Tika with EXIFTool.
-  * [[FFMPEGParser|Getting Tika up and Running with FFMPEG]] - How to use Tika with FFMPEG.
-  * [[GeoTopicParser|Getting Tika up and Running with the GeoTopicParser based on Geonames.org, Lucene, and OpenNLP]]
-  * [[TikaOCR|Getting Tika up and Running with OCR]] - How to use Tika with OCR from Tesseract.
-  * [[TikaGDAL|Getting Tika up and Running with the Geospatial Data Abstraction Library (GDAL)]] - How to use Tika with GDAL to parse/extract geospatial data files.
   * [[CompositeParserDiscussion|Composite Parsers discussion]] - How to give users sensible+clear control of multiple parsers for a given file type
   * [[Tika2_0RoadMap|Tika 2.0 discussion]] - Roadmap for changes we would like to make for Tika 2.0
