Sergey,

Great to hear the code works well with the new modules! And I do agree that Tika has a number of application-specific use cases that can be explored. I think the other goal is making the upgrade paths easier so developers don't have to drag "JAR Hell" into their projects. It was good to see that your commit also let you remove the Maven exclusions. I think you can also remove the explicit tika-core entry, as it should be a transitive dependency of any of the modules. This type of work is a huge help in moving towards the 2.0 release.

I think the next step should be determining whether there are any other breaking changes we'd like to include in 2.0, and perhaps we can get Tim to run the 2.x branch through his massive regression test :).


- Bob



On 5/16/2016 10:21 AM, Sergey Beryozkin wrote:
Hi All

Hope this message will be more relevant than the one I posted after a social event at ApacheCon NA 2016 :-). I had a chance to talk to Nick and Bob the next day, and we agreed it would be good to test Tika 2.0-SNAPSHOT a bit more. Specifically, I committed to updating a Tika-based demo we ship in Apache CXF to use 2.0-SNAPSHOT module dependencies. This puts no pressure on CXF master in the short term, since a master release certainly won't happen in the next few months.

FYI, in CXF we ship this demo:

https://github.com/apache/cxf/tree/master/distribution/src/main/release/samples/jax_rs/search

IMHO it is a very cool demo written by my CXF colleague Andriy Redko. This demo was part of his NA 2015 presentation:

http://events.linuxfoundation.org/sites/events/files/slides/Apache%20CXF%2C%20Tika%20and%20Lucene.pdf

Here is a description of the demo: a user can upload PDF or ODT files to a JAX-RS service using an HTML form. The uploaded files are submitted to the CXF Tika extensions:

https://github.com/apache/cxf/tree/master/rt/rs/extensions/search/src/main/java/org/apache/cxf/jaxrs/ext/search/tika

with this code:

https://github.com/apache/cxf/blob/master/distribution/src/main/release/samples/jax_rs/search/src/main/java/demo/jaxrs/search/server/Catalog.java#L115

where the content and metadata reported by Tika are saved with Lucene.


Next, a user enters a search phrase and finds matching documents, with links to them reported so that the user can download them.
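To make the upload, extract, index and search flow a bit more concrete, here is a minimal Java sketch. It is not the demo's code and uses neither Tika nor Lucene: the hypothetical `TextExtractor` stands in for Tika (it just decodes UTF-8 bytes instead of parsing PDF/ODT), and the hypothetical `InMemoryIndex` is a toy inverted index standing in for Lucene, matching documents that contain every term of the search phrase.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical sketch of the demo's flow: extract text, index it, search it.
// Neither Tika nor Lucene is used; both are replaced by toy stand-ins.
public class DemoFlowSketch {

    // Stand-in for Tika: the real demo parses PDF/ODT, this only decodes UTF-8 bytes.
    interface TextExtractor {
        String extract(byte[] content);
    }

    // Stand-in for Lucene: a minimal inverted index mapping each term to document names.
    static final class InMemoryIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docName, String text) {
            for (String term : tokenize(text)) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
            }
        }

        // Returns names of documents containing every term of the phrase (AND semantics).
        Set<String> search(String phrase) {
            List<String> terms = tokenize(phrase);
            if (terms.isEmpty()) {
                return Collections.emptySet();
            }
            Set<String> result = null;
            for (String term : terms) {
                Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
                if (result == null) {
                    result = new TreeSet<>(docs);
                } else {
                    result.retainAll(docs);
                }
            }
            return result;
        }

        // Lower-cases the text and splits it on runs of non-word characters.
        private static List<String> tokenize(String text) {
            List<String> terms = new ArrayList<>();
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
            return terms;
        }
    }

    public static void main(String[] args) {
        TextExtractor extractor = bytes -> new String(bytes, StandardCharsets.UTF_8);
        InMemoryIndex index = new InMemoryIndex();

        // "Upload" two documents: extract their text and index it under the file name.
        index.add("report.pdf", extractor.extract("Apache Tika extracts text".getBytes(StandardCharsets.UTF_8)));
        index.add("notes.odt", extractor.extract("Lucene indexes extracted text".getBytes(StandardCharsets.UTF_8)));

        // Search: the matching names are what the demo would turn into download links.
        System.out.println(index.search("extracted text")); // [notes.odt]
        System.out.println(index.search("text"));           // [notes.odt, report.pdf]
    }
}
```

In the real demo, Lucene would also handle analysis, scoring and persistence, which this toy index deliberately leaves out.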

IMHO it is an interesting demo because it shows how Tika can help in some application-specific situations...

Finally, to the actual experiment I did today. Updating the demo to use individual parser modules was easy:

http://git-wip-us.apache.org/repos/asf/cxf/commit/c2ccecb2

Everything works well; the better modularization in 2.0 will be welcome.

Thanks, Sergey