This is very helpful. Thank you! Is there any use in having the tika-dl module if our more modern approach is REST + Docker? The upkeep in tika-dl is nontrivial.
On Fri, Jul 6, 2018 at 6:15 PM Chris Mattmann <[email protected]> wrote:

> Tim,
>
> Thanks. There are multiple modes of integrating deep learning with Tika:
>
> The original mode: uses Thamme’s work on REST exposing Tensorflow
> and Docker to provide a REST Service to Tika to allow for running
> Tensorflow DL models. We initially did Inception_v3, and a model by
> Madhav Sharan that combines OpenCV with Inception v3 (and a new docker
> that installs OpenCV it’s a pain) for image and video object
> recognition, respectively. See:
> https://github.com/apache/tika/pull/208
> and https://github.com/apache/tika/pull/168 and also the wiki
>
> Later, Thamme, Avtar Singh, KranthiGV, added DL4J support:
> https://github.com/apache/tika/pull/165
> including Inceptionv3 and VGG16 - https://github.com/apache/tika/pull/182
> This houses the model in USC Data science repo and uses it as an example
> for how to store and load models from Keras/Python into DL4j:
> https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example/data
>
> Then, Thejan added Text Captioning and a new Docker, and trained model:
> https://github.com/apache/tika/pull/180
>
> Then Raunaq from UPenn added Inception v4 support via the
> Docker/Tensorflow way:
> https://github.com/apache/tika/pull/162
>
> All this Docker work caused Thejan and others to think we needed to
> refactor the dockers. We did that here:
> https://github.com/apache/tika/pull/208
> to make them cleaner, and to depend on:
> http://github.com/USCDataScience/tika-dockers/ and on
> http://github.com/USCDataScience/img2text
> models for image captioning. Now, Video and Image recognition and Image
> Captioning all had the same base docker and sub dockers from that.
>
> That’s where we’re at today. Make sense? ☺ Thejan and others want to add
> more DL4J supported models and we can always use Tensorflow/Docker as
> well as a way of doing it.
>
> Cheers,
> Chris
>
> From: Tim Allison <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, July 6, 2018 at 2:39 PM
> To: "[email protected]" <[email protected]>
> Subject: image recognition...how do the parts play together?
>
> On Twitter, Chris, Thamme, Thejan, and I are working with some
> deeplearning4j devs to help us upgrade to deeplearning4j 1.0.0-BETA
> (TIKA-2672).
>
> I initially requested help from Thejan (and Thamme :D) for this because we
> were getting an initialization exception after the upgrade in tika-dl's
> DL4JInceptionV3Net.
>
> According to our wiki[2], we upgraded to InceptionV4 in Tika-2306 by adding
> the TensorFlowRESTRecogniser...does this mean we can get rid of
> DL4JInceptionV3Net? Or, what are we actually asking the dl4j folks to help
> with?
>
> How do these recognizers play together?
>
> Thank you.
>
> Cheers,
>
> Tim
>
> [1] e.g. https://twitter.com/chrismattmann/status/1015340483923439617
> [2] https://wiki.apache.org/tika/TikaAndVision
