The "TesseractOCRStats" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/TesseractOCRStats

New page:

Here are some stats contributed by Mark Kerzner and Amanda Towler from Hyperion Gray.

{{{
Total number of images to process: about 300,000
Average time per image: about 1 sec
Total run time required: about 10 days
Our run times on various batches: about 1 day total
OCR quality: decent
}}}

= Future Work =
 * Use Tika rather than calling Tesseract directly
 * Scale it up with Spark or Hadoop
 * A few polishes, with a view to other teams/projects using it later
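
As a rough illustration of how the figures above relate, and of what the Spark/Hadoop item might buy, here is a minimal back-of-the-envelope sketch. The function name `estimate_days` and the worker counts are hypothetical, and the model assumes images are split evenly across workers with no scheduling or I/O overhead, which a real cluster will not achieve. At about 1 second per image, 300,000 images work out to roughly 3.5 days of sequential OCR time; the stats above quote about 10 days in total and do not break down the difference.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_days(num_images: int, seconds_per_image: float, workers: int = 1) -> float:
    """Estimate wall-clock days for an OCR run.

    Assumption (not from the wiki page): images are divided evenly across
    `workers`, and there is no startup, scheduling, or I/O overhead.
    """
    if num_images < 0 or seconds_per_image < 0:
        raise ValueError("num_images and seconds_per_image must be non-negative")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The slowest worker handles the rounded-up share of the images.
    images_per_worker = math.ceil(num_images / workers)
    return images_per_worker * seconds_per_image / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures from the stats above: about 300,000 images at about 1 sec each.
    # The worker counts are illustrative only.
    for workers in (1, 4, 16):
        days = estimate_days(300_000, 1.0, workers)
        print(f"{workers:>2} worker(s): about {days:.2f} days")
```

Substituting a measured end-to-end time per image (including file reads and process startup) for the 1-second figure should give a more realistic estimate than OCR time alone.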
