Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaInHadoop" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaInHadoop?action=diff&rev1=1&rev2=2

  = Running Tika in Hadoop =

  On very rare occasions, Tika can fail catastrophically, with infinite hangs or out-of-memory errors. Other features of Tika may also make it useful for developers to share notes on how to run Tika at scale. This page is intended to gather lessons learned and offer pointers for running Tika in the Hadoop framework.

+ = Useful Parameters =
+
+ = Lessons Learned =
+
+ = Links =
+  * William Palmer's blog post on running Tika in Hadoop -- [[http://openpreservation.org/knowledge/blogs/2014/03/21/tika-ride-characterising-web-content-nanite/|Tika to Ride]]
+
+ = Frameworks =
+  * Julien Nioche's [[https://github.com/DigitalPebble/behemoth|Behemoth]]
+  * William Palmer's [[https://github.com/openpreserve/nanite/tree/master/nanite-hadoop|Nanite]]
+
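
One possible way to contain the rare infinite hangs mentioned on the page is to run each parse on a worker thread and give up after a deadline. The following is only a minimal sketch under stated assumptions, not something the page itself prescribes: the class `TimeLimitedParse`, the method `runWithTimeout`, and the `Status` values are hypothetical names, and the `Callable` stands in for whatever code actually calls Tika (no Tika API is shown). It uses only `java.util.concurrent` from the standard library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: none of these names come from Tika or Hadoop.
public class TimeLimitedParse {

    /** Outcome of one guarded call. */
    public enum Status { OK, TIMEOUT, FAILED }

    public static final class Result {
        public final Status status;
        public final String text; // null unless status == OK

        Result(Status status, String text) {
            this.status = status;
            this.text = text;
        }
    }

    /**
     * Runs parseTask on a separate daemon thread and waits at most
     * timeoutMillis for it to finish. The task is a placeholder for a
     * real parse call supplied by the caller.
     */
    public static Result runWithTimeout(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-parse");
            t.setDaemon(true); // a stuck worker should not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(parseTask);
        try {
            return new Result(Status.OK, future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupts the worker; a parser that ignores interrupts keeps running.
            future.cancel(true);
            return new Result(Status.TIMEOUT, null);
        } catch (ExecutionException e) {
            // The task threw, including Errors raised on the worker thread.
            return new Result(Status.FAILED, null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt flag
            future.cancel(true);
            return new Result(Status.FAILED, null);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Result fast = runWithTimeout(() -> "extracted text", 1000);
        Result slow = runWithTimeout(() -> { Thread.sleep(5000); return "never"; }, 100);
        System.out.println(fast.status + " " + slow.status);
    }
}
```

A thread-level timeout like this only lets the caller move on; it cannot forcibly stop a thread that ignores interruption, and it does not isolate the caller from memory exhaustion in the same JVM. Running the parse in a separate child process is likely the more robust choice for those cases, at the cost of extra startup overhead per document or batch.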
