Hi All
I hope this message is more relevant than the one I posted after a
social event at ApacheCon NA 2016 :-). I had a chance to talk to Nick
and Bob the next day, and we agreed it would be good to have Tika
2.0-SNAPSHOT tested a bit more. Specifically, I committed to updating a
Tika-based demo we ship in Apache CXF to use the 2.0-SNAPSHOT module
dependencies. This puts no pressure on CXF master in the short term,
given that a release from master won't happen in the next few months.
FYI, in CXF we ship this demo:
https://github.com/apache/cxf/tree/master/distribution/src/main/release/samples/jax_rs/search
IMHO it is a very cool demo written by my CXF colleague Andriy Redko.
This demo was part of his NA 2015 presentation:
http://events.linuxfoundation.org/sites/events/files/slides/Apache%20CXF%2C%20Tika%20and%20Lucene.pdf
Here is a description of the demo: a user can upload PDF or ODT files
to a JAX-RS service using an HTML form. The uploaded files are passed to
the CXF Tika extension:
https://github.com/apache/cxf/tree/master/rt/rs/extensions/search/src/main/java/org/apache/cxf/jaxrs/ext/search/tika
with this code:
https://github.com/apache/cxf/blob/master/distribution/src/main/release/samples/jax_rs/search/src/main/java/demo/jaxrs/search/server/Catalog.java#L115
where the content and metadata reported by Tika are saved with Lucene.
Next, the user enters a search phrase and gets the matching documents
back, with links to them so that they can be downloaded.
IMHO it is an interesting demo because it shows how Tika can help in
application-specific situations.
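For anyone who has not looked at the demo, the upload-then-search flow
described above boils down to roughly the following. This is only a
sketch in plain Java and not the demo code: the class and method names
are made up for illustration, the extracted text is passed in as a
String where the demo would get it from Tika, and a simple substring
match stands in for the Lucene index and query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the demo's catalog; not the CXF, Tika or Lucene API.
public class SearchFlowSketch {
    // File name -> lower-cased text extracted from the uploaded document.
    private final Map<String, String> contentByName = new LinkedHashMap<>();

    // Step 1: store the text a parser (Tika in the real demo) extracted from an upload.
    public void index(String fileName, String extractedText) {
        contentByName.put(fileName, extractedText.toLowerCase(Locale.ROOT));
    }

    // Step 2: return the names of documents whose text contains the phrase,
    // ignoring case, in the order the documents were uploaded.
    public List<String> search(String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : contentByName.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SearchFlowSketch catalog = new SearchFlowSketch();
        catalog.index("cxf.pdf", "Apache CXF services framework");
        catalog.index("tika.odt", "Apache Tika content analysis toolkit");
        // Prints the names of the documents that mention the phrase.
        System.out.println(catalog.search("tika"));
    }
}
```

In the real demo the names returned by the search become download links.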
Finally, on to the actual experiment I did today. Updating the demo to
use the individual parser modules was easy:
http://git-wip-us.apache.org/repos/asf/cxf/commit/c2ccecb2
Everything works well, and the better modularization in 2.0 will be
welcome.
Thanks, Sergey