Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "ImageCaption" page has been changed by ThejanW:
https://wiki.apache.org/tika/ImageCaption?action=diff&rev1=2&rev2=3

<<TableOfContents(4)>>

- This page describes how to make use of the Image Captioning capability of Apache Tika. "Image captioning" or "describing the content of an image" is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. TIKA-2262 introduced a new parser to perform captioning on images. Visit [[https://issues.apache.org/jira/browse/TIKA-2262|TIKA-2262 issue on Jira]] or [[https://github.com/apache/tika/pull/180|pull request on Github]] to read the related conversation. Currently, Tika utilizes an implementation based on the paper [[https://arxiv.org/abs/1411.4555|Show and Tell: A Neural Image Caption Generator]] for captioning images. This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and can be used to generate natural sentences describing an image. Continue reading to get Tika up and running for image captioning.
+ This page describes how to use the Image Captioning capability of Apache Tika. "Image captioning" or "describing the content of an image" is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. TIKA-2262 introduced a new parser to perform captioning on images. Visit [[https://issues.apache.org/jira/browse/TIKA-2262|TIKA-2262 issue on Jira]] or [[https://github.com/apache/tika/pull/180|pull request on Github]] to see the related conversations. Currently, Tika utilizes an implementation based on the paper [[https://arxiv.org/abs/1411.4555|Show and Tell: A Neural Image Caption Generator]] for captioning images. This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and can be used to generate natural sentences describing an image. Continue reading to get Tika up and running for image captioning.
- == Tika and Tensorflow Image Recognition ==
+ == Tika and Tensorflow Image Captioning Using a REST Server ==
+ We are going to start a Python Flask-based REST API server and tell Tika to connect to it. All the dependencies and setup complexities are isolated in the docker image.
- Tika has two different ways of binding to Tensorflow:
-  1. Using Commandline Invocation -- Recommended for quick testing, not for production use
-  2. Using REST API -- Recommended for production use
- === 1. Tensorflow Using Commandline Invocation ===
- '''Pros of this approach:''' This parser is easy to set up and test.
- '''Cons:''' Very inefficient/slow, as it loads and unloads the model for every parse call.
- ==== Step 1. Install the dependencies ====
- To install tensorflow, follow the instructions on [[https://www.tensorflow.org/install/|the official site here]] for your environment. Unless you know what you are doing, the pip installation is recommended.
- Then clone the repository [[https://github.com/tensorflow/models|tensorflow/models]] or download the [[https://github.com/tensorflow/models/archive/master.zip|zip file]]:
- {{{git clone https://github.com/tensorflow/models.git}}}
- Add the 'models/slim' folder to the environment variable PYTHONPATH:
- {{{$ export PYTHONPATH="$PYTHONPATH:/path/to/models/slim"}}}
- To test the readiness of your environment:
- {{{$ python -c 'import tensorflow, numpy, datasets; print("OK")'}}}
- If the above command prints the message "OK", then the requirements are satisfied.
- ==== Step 2. Create a Tika-Config XML to enable the Tensorflow parser ====
- A sample config can be found in the Tika source code at [[https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml|tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml]]
- '''Here is an example:'''
- {{{#!highlight xml
- <properties>
-     <parsers>
-         <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
-             <mime>image/jpeg</mime>
-             <params>
-                 <param name="topN" type="int">2</param>
-                 <param name="minConfidence" type="double">0.015</param>
-                 <param name="class" type="string">org.apache.tika.parser.recognition.tf.TensorflowImageRecParser</param>
-             </params>
-         </parser>
-     </parsers>
- </properties>
- }}}
- '''Description of parameters:'''
- {{{#!csv
- Param Name, Type, Meaning, Range, Example
- topN, int, Number of object names to output, a non-zero positive integer, 1 to receive the top 1 object name
- minConfidence, double, Minimum confidence required to output the name of detected objects, [0.0 to 1.0] inclusive, 0.9 to output object names only if at least 90% confident
- class, string, Class that implements the object recognition functionality, constant string, org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- }}}
- ==== Step 3: Demo ====
- To use the vision capability via Tensorflow, just supply the above configuration to Tika.
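As an aside, the interaction between the `topN` and `minConfidence` parameters described above can be sketched in a few lines of plain Python. This is illustrative only, not Tika's actual implementation, and the sample scores are invented (loosely echoing the demo output elsewhere on this page):

```python
# Illustrative sketch (not Tika's code) of how topN and minConfidence
# combine: keep only detections at or above the confidence threshold,
# then emit at most topN of them, highest confidence first.
def filter_detections(detections, top_n=2, min_confidence=0.015):
    """detections: list of (label, confidence) pairs."""
    confident = [d for d in detections if d[1] >= min_confidence]
    confident.sort(key=lambda d: d[1], reverse=True)
    return confident[:top_n]

# Invented example scores for demonstration purposes.
sample = [
    ("German shepherd", 0.78435),
    ("military uniform", 0.06694),
    ("muzzle", 0.00900),  # below the 0.015 threshold, dropped
]
print(filter_detections(sample))
# -> [('German shepherd', 0.78435), ('military uniform', 0.06694)]
```

With `topN=2` and `minConfidence=0.015`, the low-confidence third label is discarded before the top-2 cut is applied.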
- For example, to use it in the Tika App (assuming you have the ''tika-app'' JAR ready to run):
- {{{#!highlight bash
- $ java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar \
-     --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml \
-     https://upload.wikimedia.org/wikipedia/commons/f/f6/Working_Dogs%2C_Handlers_Share_Special_Bond_DVIDS124942.jpg
- }}}
- The input image is:
- {{https://upload.wikimedia.org/wikipedia/commons/f/f6/Working_Dogs%2C_Handlers_Share_Special_Bond_DVIDS124942.jpg|German Shepherd with Military}}
- And the top 2 detections are:
- {{{#!highlight xml
- ...
- <meta name="OBJECT" content="German shepherd, German shepherd dog, German police dog, alsatian (0.78435)"/>
- <meta name="OBJECT" content="military uniform (0.06694)"/>
- ...
- }}}
- === 2. Tensorflow Using REST Server ===
- This is the recommended way of utilizing the visual recognition capability of Tika.
- This approach uses Tensorflow over a REST API.
- To get this working, we are going to start a Python Flask-based REST API server and tell Tika to connect to it.
- All these dependencies and setup complexities are isolated in the docker image.

Requirements: Docker -- Visit [[https://www.docker.com/|Docker.com]] and install the latest version of Docker. (Note: tested on docker v17.03.1)

==== Step 1. Setup REST Server ====
- You can either start the REST server in an isolated docker container or natively on the host that runs tensorflow.
+ You can either start the REST server in an isolated docker container or natively on the host that runs tensorflow v1.0.

===== a. Using docker (Recommended) =====
{{{#!highlight bash
- cd tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/
+ cd tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/
- # alternatively, if you do not have tika's source code, you may simply wget the 'InceptionRestDockerfile' from the github link
+ # alternatively, if you do not have tika's source code, you may simply wget the 'Im2txtRestDockerfile' from the github link
- docker build -f InceptionRestDockerfile -t inception-rest-tika .
+ docker build -f Im2txtRestDockerfile -t im2txt-rest-tika .
- docker run -p 8764:8764 -it inception-rest-tika
+ docker run -p 8764:8764 -it im2txt-rest-tika
}}}

- Once it is done, test the setup by visiting [[http://localhost:8764/inception/v4/classify?topk=2&url=https://upload.wikimedia.org/wikipedia/commons/f/f6/Working_Dogs%2C_Handlers_Share_Special_Bond_DVIDS124942.jpg]] in your web browser.
+ Once it is done, test the setup by visiting [[http://localhost:8764/inception/v3/captions?beam_size=3&max_caption_length=15&url=https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/Marcus_Thames_Tigers_2007.jpg/1200px-Marcus_Thames_Tigers_2007.jpg]] in your web browser.

'''Sample output from API:'''
{{{#!json
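Beyond the browser test above, the captioning endpoint can also be exercised from code. The sketch below only constructs the request URL; the host/port and the `/inception/v3/captions` path with its `beam_size`, `max_caption_length`, and `url` query parameters are taken from the test link above, and actually fetching the URL requires the docker container to be running:

```python
from urllib.parse import urlencode

# Base endpoint of the im2txt REST server started in step 1a above.
BASE = "http://localhost:8764/inception/v3/captions"

def caption_url(image_url, beam_size=3, max_caption_length=15):
    """Return the GET URL that asks the captioning REST server to
    describe the image at image_url."""
    query = urlencode({
        "beam_size": beam_size,
        "max_caption_length": max_caption_length,
        "url": image_url,
    })
    return BASE + "?" + query

# Prints a URL equivalent to the browser test link; open it in a
# browser or fetch it with urllib/requests once the server is up.
print(caption_url("https://upload.wikimedia.org/wikipedia/commons/thumb/"
                  "1/1d/Marcus_Thames_Tigers_2007.jpg/"
                  "1200px-Marcus_Thames_Tigers_2007.jpg"))
```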
