Hi all,

I just wanted to repost the following by Chris Mattmann on the Tika list:
If you have been following the news, you’ve seen the Panama Papers and how the world’s rich and elite have been storing their money offshore to hide it. Two of the ASF’s key technologies were used in uncovering that story and showing the world what was going on: Apache Tika and Apache Solr.

Solr was used to make the terabytes of Panama Papers available to journalists. The preprocessing of the documents for indexing was done with Tika (maybe through the contrib/extraction module).

Here is the Forbes article about it:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak

Uwe

-----
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/