[jira] [Commented] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

ASF GitHub Bot (JIRA) Tue, 28 Mar 2017 22:15:07 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946534#comment-15946534
 ]


ASF GitHub Bot commented on TIKA-2262:
--------------------------------------

GitHub user KranthiGV opened a pull request:

    https://github.com/apache/tika/pull/163

    Tika-2306: Update Inception v3 to Inception v4 in Object recognition parser 

    ## Summary:
    The Object Recognition Parser currently uses the Inception V3 model for the object 
classification task. Google has released a newer Inception V4 model [1][2].
    It reports an improved Top-1 accuracy of 80.2% and Top-5 accuracy of 95.2% [3].
    
    ## Quick setup and Test:
    - Install TensorFlow using pip: 
[https://www.tensorflow.org/install/](https://www.tensorflow.org/install/)
    - Install TF-slim:
    
    ```
    git clone https://github.com/tensorflow/models/   
    # Replace /models/slim with the path to your own clone of the models repository
    export PYTHONPATH="$PYTHONPATH:/models/slim"
    
    sudo apt-get install libtcmalloc-minimal4   
    export LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4"   
    ```
    - NOTE: The last two lines work around a TensorFlow issue 
(https://github.com/tensorflow/tensorflow/issues/6968). They can be removed 
once that issue is fixed.
    - The dependency on an external tensorflow/models checkout could be avoided by 
integrating the needed parts of that code into our repository. It is 
Apache-licensed, so this is permissible.
     
    - Check out the test configuration 
`tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml`
    
    ## Demos:
    `java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar 
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml
 tika-parsers/src/test/resources/test-documents/testJPEG.jpg
    `
    - The output should include:
    ```
    <meta name="OBJECT" content="Egyptian cat (0.31143)"/>  
    <meta name="OBJECT" content="tabby, tabby cat (0.07072)"/>  
    ```
    - NOTE: Only the JPEG format is supported. I plan to work on support for other 
formats during GSoC 
([https://issues.apache.org/jira/browse/TIKA-2262](https://issues.apache.org/jira/browse/TIKA-2262)).
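
For readers who want to consume the `OBJECT` metadata values programmatically, here is a minimal Python sketch. It assumes the values always follow the `label (confidence)` layout seen in the sample output above. The helper name `parse_object_meta` is hypothetical and is not part of Tika.

```python
import re

# Assumed layout of an OBJECT value, taken from the sample output above:
# "<label> (<confidence>)", e.g. "Egyptian cat (0.31143)".
OBJECT_PATTERN = re.compile(r"^(?P<label>.+) \((?P<confidence>[0-9]*\.?[0-9]+)\)$")


def parse_object_meta(content):
    """Split an OBJECT metadata value into a (label, confidence) pair."""
    match = OBJECT_PATTERN.match(content.strip())
    if match is None:
        raise ValueError("unexpected OBJECT value: %r" % content)
    return match.group("label"), float(match.group("confidence"))
```

Labels that contain commas, such as `tabby, tabby cat`, are kept whole because only the trailing parenthesised number is split off.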
    
    # REST API
    ## Start the Inception service on port 8764:
    The API service code is added at 
`tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py`
    
    A Dockerfile is also added to set up the environment quickly:
    
    ```
    cd tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/
    docker build -f InceptionRestDockerfile -t inception-rest-tika .
    docker run -p 8764:8764 -it inception-rest-tika
    ```
    - Use the config at 
`tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml`
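
Before pointing Tika at the REST config, it can help to confirm that something is listening on the published port. The following is a rough sketch using only the Python standard library. It assumes the container is reachable on `localhost:8764`, as in the `docker run` command above, and the function name `is_port_open` is hypothetical. It only checks TCP reachability, not that the Inception service itself is healthy.

```python
import socket


def is_port_open(host="localhost", port=8764, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with the defaults should return `True` once the container from the previous step is running.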
    
    # Tests and build
    ## Build status
    ```
    [INFO] Reactor Summary:
    [INFO] 
    [INFO] Apache Tika parent ................................ SUCCESS [0.693s]
    [INFO] Apache Tika core .................................. SUCCESS [19.393s]
    [INFO] Apache Tika parsers ............................... SUCCESS 
[1:02.685s]
    [INFO] Apache Tika XMP ................................... SUCCESS [0.851s]
    [INFO] Apache Tika serialization ......................... SUCCESS [0.924s]
    [INFO] Apache Tika batch ................................. SUCCESS 
[1:53.792s]
    [INFO] Apache Tika language detection .................... SUCCESS [2.210s]
    [INFO] Apache Tika application ........................... SUCCESS [23.620s]
    [INFO] Apache Tika OSGi bundle ........................... SUCCESS [11.271s]
    [INFO] Apache Tika translate ............................. SUCCESS [1.161s]
    [INFO] Apache Tika server ................................ SUCCESS [26.655s]
    [INFO] Apache Tika examples .............................. SUCCESS [3.562s]
    [INFO] Apache Tika Java-7 Components ..................... SUCCESS [1.040s]
    [INFO] Apache Tika eval .................................. SUCCESS [13.477s]
    [INFO] Apache Tika ....................................... SUCCESS [0.037s]
    [INFO] 
------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] 
------------------------------------------------------------------------
    [INFO] Total time: 4:41.751s
    [INFO] Finished at: Wed Mar 29 10:38:00 IST 2017
    [INFO] Final Memory: 158M/1535M
    [INFO] 
------------------------------------------------------------------------
    ```
    ## Script based implementation tests
    ```
    timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar 
tika-app/target/tika-app-1.15-SNAPSHOT.jar 
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml
 tika-parsers/src/test/resources/test-documents/testJPEG.jpg
    WARN  JBIG2ImageReader not loaded. jbig2 files will be ignored
    INFO  minConfidence = 0.015, topN=2
    INFO  Recogniser = 
org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
    INFO  Recogniser Available = true
    <?xml version="1.0" encoding="UTF-8"?><html 
xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="Egyptian cat" content="0.31143"/>
    <meta name="tabby, tabby cat" content="0.07072"/>
    <meta name="tiger cat" content="0.04990"/>
    <meta name="Siamese cat, Siamese" content="0.02097"/>
    <meta name="Border collie" content="0.01930"/>
    <title/>
    </head>
    <body><p/>
    </body></html><html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="org.apache.tika.parser.recognition.object.rec.impl" 
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
    <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
    <meta name="X-Parsed-By" 
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
    <meta name="resourceName" content="testJPEG.jpg"/>
    <meta name="Content-Length" content="7686"/>
    <meta name="OBJECT" content="Egyptian cat (0.31143)"/>
    <meta name="OBJECT" content="tabby, tabby cat (0.07072)"/>
    <meta name="Content-Type" content="image/jpeg"/>
    <title/>
    </head>
    <body><ol id="objects">     <li id="Egyptian cat"> Egyptian cat 
[eng](confidence = 0.311430 )</li>
        <li id="tabby, tabby cat"> tabby, tabby cat [eng](confidence = 0.070720 
)</li>
    </ol>
    
    timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar 
tika-app/p-1.15-SNAPSHOT.jar  
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml
 
https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/US_Navy_100714-N-4965F-174_Chief_Mass_Communication_Specialist_Paula_Ludwick%2C_assigned_to_Fleet_Combat_Camera_Group_Pacific%2C_shoots_at_a_target_during_a_Navy_Rifle_Qualification_Course.jpg/220px-thumbnail.jpg
    WARN  JBIG2ImageReader not loaded. jbig2 files will be ignored
    INFO  minConfidence = 0.015, topN=2
    INFO  Recogniser = 
org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
    INFO  Recogniser Available = true
    <?xml version="1.0" encoding="UTF-8"?><html 
xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="military uniform" content="0.00355"/>
    <meta name="bulletproof vest" content="0.00397"/>
    <meta name="revolver, six-gun, six-shooter" content="0.00176"/>
    <meta name="assault rifle, assault gun" content="0.84119"/>
    <meta name="rifle" content="0.08642"/>
    <title/>
    </head>
    <body><p/>
    </body></html><html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="org.apache.tika.parser.recognition.object.rec.impl" 
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
    <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
    <meta name="X-Parsed-By" 
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
    <meta name="resourceName" content="220px-thumbnail.jpg"/>
    <meta name="Content-Length" content="9535"/>
    <meta name="OBJECT" content="assault rifle, assault gun (0.84119)"/>
    <meta name="OBJECT" content="rifle (0.08642)"/>
    <meta name="Content-Type" content="image/jpeg"/>
    <title/>
    </head>
    <body><ol id="objects">     <li id="assault rifle, assault gun"> assault 
rifle, assault gun [eng](confidence = 0.841190 )</li>
        <li id="rifle"> rifle [eng](confidence = 0.086420 )</li>
    </ol>
    </body></html>
    
    timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar 
tika-app/p-1.15-SNAPSHOT.jar  
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml
 https://upload.wikimedia.org/wikipedia/commons/8/8d/Glock17.jpg
    WARN  JBIG2ImageReader not loaded. jbig2 files will be ignored
    INFO  minConfidence = 0.015, topN=2
    INFO  Recogniser = 
org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
    INFO  Recogniser Available = true
    <?xml version="1.0" encoding="UTF-8"?><html 
xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="hatchet" content="0.00087"/>
    <meta name="revolver, six-gun, six-shooter" content="0.89842"/>
    <meta name="holster" content="0.02361"/>
    <meta name="assault rifle, assault gun" content="0.01820"/>
    <meta name="rifle" content="0.00943"/>
    <title/>
    </head>
    <body><p/>
    </body></html><html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="org.apache.tika.parser.recognition.object.rec.impl" 
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
    <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
    <meta name="X-Parsed-By" 
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
    <meta name="resourceName" content="Glock17.jpg"/>
    <meta name="Content-Length" content="368025"/>
    <meta name="OBJECT" content="revolver, six-gun, six-shooter (0.89842)"/>
    <meta name="OBJECT" content="holster (0.02361)"/>
    <meta name="Content-Type" content="image/jpeg"/>
    <title/>
    </head>
    <body><ol id="objects">     <li id="revolver, six-gun, six-shooter"> 
revolver, six-gun, six-shooter [eng](confidence = 0.898420 )</li>
        <li id="holster"> holster [eng](confidence = 0.023610 )</li>
    </ol>
    </body></html>
    
    timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar 
tika-app/p-1.15-SNAPSHOT.jar  
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml
 
http://www.trbimg.com/img-57226a08/turbine/ct-tesla-model-3-unveiling-20160404/650/650x366
    WARN  JBIG2ImageReader not loaded. jbig2 files will be ignored
    INFO  minConfidence = 0.015, topN=2
    INFO  Recogniser = 
org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
    INFO  Recogniser Available = true
    <?xml version="1.0" encoding="UTF-8"?><html 
xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="car wheel" content="0.07900"/>
    <meta name="convertible" content="0.04067"/>
    <meta name="sports car, sport car" content="0.75443"/>
    <meta name="beach wagon, station wagon, wagon, estate car, beach waggon, 
station waggon, waggon" content="0.00955"/>
    <meta name="grille, radiator grille" content="0.01366"/>
    <title/>
    </head>
    <body><p/>
    </body></html><html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="org.apache.tika.parser.recognition.object.rec.impl" 
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
    <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
    <meta name="X-Parsed-By" 
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
    <meta name="resourceName" content="650x366"/>
    <meta name="Content-Length" content="42377"/>
    <meta name="OBJECT" content="sports car, sport car (0.75443)"/>
    <meta name="OBJECT" content="car wheel (0.07900)"/>
    <meta name="Content-Type" content="image/jpeg"/>
    <title/>
    </head>
    <body><ol id="objects">     <li id="sports car, sport car"> sports car, 
sport car [eng](confidence = 0.754430 )</li>
        <li id="car wheel"> car wheel [eng](confidence = 0.079000 )</li>
    </ol>
    </body></html>
    ```
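
The logs above show five raw predictions per image, but only two `OBJECT` entries, alongside `minConfidence = 0.015, topN=2`. The sketch below shows one way such a selection could work. It is an illustration under assumptions, not the parser's actual code: the function name `select_objects` is hypothetical, and whether the real threshold comparison is inclusive is not stated in the logs.

```python
def select_objects(predictions, min_confidence=0.015, top_n=2):
    """Keep the top_n predictions at or above min_confidence, highest first."""
    kept = [(label, score) for label, score in predictions if score >= min_confidence]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


# Raw scores copied from the second test run above.
raw = [
    ("military uniform", 0.00355),
    ("bulletproof vest", 0.00397),
    ("revolver, six-gun, six-shooter", 0.00176),
    ("assault rifle, assault gun", 0.84119),
    ("rifle", 0.08642),
]
print(select_objects(raw))
# prints [('assault rifle, assault gun', 0.84119), ('rifle', 0.08642)]
```

With these inputs the result matches the two `OBJECT` entries reported for that image.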
    ## REST API based implementation tests
    ```
    timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar 
tika-app/target/tika-app-1.15-SNAPSHOT.jar  
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml
 
http://www.trbimg.com/img-57226a08/turbine/ct-tesla-model-3-unveiling-20160404/650/650x366
    WARN  JBIG2ImageReader not loaded. jbig2 files will be ignored
    INFO  Available = true, API Status = HTTP/1.0 200 OK
    INFO  minConfidence = 0.015, topN=2
    INFO  Recogniser = 
org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
    INFO  Recogniser Available = true
    <?xml version="1.0" encoding="UTF-8"?><html 
xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="org.apache.tika.parser.recognition.object.rec.impl" 
content="org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser"/>
    <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
    <meta name="X-Parsed-By" 
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
    <meta name="resourceName" content="650x366"/>
    <meta name="Content-Length" content="42377"/>
    <meta name="OBJECT" content="sports car, sport car (0.75443)"/>
    <meta name="OBJECT" content="car wheel (0.07900)"/>
    <meta name="Content-Type" content="image/jpeg"/>
    <title/>
    </head>
    <body><ol id="objects">     <li id="sports car, sport car"> sports car, 
sport car [en](confidence = 0.754434 )</li>
        <li id="car wheel"> car wheel [en](confidence = 0.079000 )</li>
    </ol>
    </body></html>
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KranthiGV/tika TIKA-2306

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #163
    
----
commit 236db96393d94756dbc2e3f40b318f8f93b95dff
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T20:22:02Z

    fix for TIKA-2306 contributed by kranthigv

commit 0c0bd4bec2312355d2bc48426f8ec94306d0e4a0
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T20:28:52Z

    fix for TIKA-2306 contributed by kranthigv

commit cb8f8f5e7ea2b4e13853e6dfc2165127521d9c64
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T20:51:36Z

    fix the image

commit c7f27b561ac1a44a35d3f7fd7881daf5dae8b835
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T22:17:59Z

    inceptionapi.py file added for REST API feature

commit 1fc82e84cc27f60cc64c7844e36bdab2d3c85e7c
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T22:43:41Z

    fix the destination directory

commit 900e4cfff9c5036bceba6a2f6cda1a9c942d3fa7
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T23:04:06Z

    fix no variables to save

commit 0341a5d25dececf799746d6906963496a5256f11
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T23:17:02Z

    unexpected argument

commit b9f496c68b27e64f1eddca212db88e3444051cc5
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T23:19:59Z

    undefined variable

commit f8c51bab139f0b7c8d9ea070ae40c87bbaf87689
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T23:25:10Z

    undefined variable

commit d199692b650edaaf743ca6cfc5c34954baf8831d
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-26T23:29:28Z

    undefined variable

commit 0eedec8c62cf5e6ddee4f14ca4b4fa59d2930be5
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-27T02:07:19Z

    Working inceptionapi.py without comments

commit 09cb2df973f20e3a877ca1309b67384264650be0
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-29T03:51:31Z

    fix for TIKA-2306 contributed by kranthigv

commit f92809ac19d5bef903ef1ac393092e6a13884fc0
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-29T03:55:01Z

    fix for TIKA-2306 contributed by kranthigv

commit be773cacaf3c344c11fff9b85ebaf1d0dc8b5174
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-29T04:11:48Z

    fix for TIKA-2306 contributed by kranthigv

commit 75a2ae12d170fc99b4bf9ab266c6169859c23dda
Author: Kranthi Kiran GV <[email protected]>
Date:   2017-03-29T05:09:22Z

    Changed models repo to a forked repo for future compatibility

----


> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> An image caption is a short piece of text, usually one line, added to the 
> metadata of an image to give a brief summary of the scene it shows. 
> Generating captions automatically is a challenging and interesting problem in 
> computer vision. Tika already supports image recognition via the [Object Recognition 
> Parser, TIKA-1993| https://issues.apache.org/jira/browse/TIKA-1993], which 
> uses an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow. 
> Captioning an image is a very useful feature since it helps text-based 
> Information Retrieval (IR) systems to "understand" the scenery in images.
> h2. Technical details and references:
> * Google open-sourced its 'show and tell' neural network and its model for 
> auto-generating captions some time ago. [Source Code| 
> https://github.com/tensorflow/models/tree/master/im2txt], [Research blog| 
> https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser:
> ** Create a RESTful API service [similar to this| 
> https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server] 
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: Java and Python, plus the Maven build system
> * RESTful APIs 
> * TensorFlow/Keras
> * Deep learning
> ----
> Alternatively, a somewhat harder path for experienced contributors:
> [Import Keras/TensorFlow models into 
> deeplearning4j|https://deeplearning4j.org/model-import-keras] and run them 
> natively inside the JVM.
> h4. Benefits
> * No RESTful integration is required, and thus no external dependencies
> * Easy to distribute on Hadoop/Spark clusters
> h4. Hurdles:
> * This is a work-in-progress feature in deeplearning4j, so expect plenty of 
> trouble along the way! 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
