[jira] [Commented] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

ASF GitHub Bot (JIRA) Sat, 08 Jul 2017 23:10:23 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079470#comment-16079470
 ]


ASF GitHub Bot commented on TIKA-2262:
--------------------------------------

chrismattmann commented on issue #189: Fix for TIKA-2262: Supporting 
Image-to-Text (Image Captioning) in Tika
URL: https://github.com/apache/tika/pull/189#issuecomment-313901418
 
 
   OK, I was able to test this out. I fixed the unit tests to properly use 
junit.Assume, and I also fixed a Locale issue with forbidden APIs. I got it 
working both in the unit tests and from the command line via java; both runs 
are shown below. I will commit this to 1.17-master tomorrow. GREAT WORK 
@ThejanW and @thammegowda 
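
For readers unfamiliar with this class of Locale issue, here is a minimal sketch. The comment does not say which call was flagged, so this is only an assumption about the general pattern: forbidden-APIs checks typically reject `String.format(String, Object...)` because it depends on the JVM default locale. The class `CaptionFormat` and the helper `describe` are hypothetical names for illustration and are not taken from the Tika source.

```java
import java.util.Locale;

public class CaptionFormat {

    // Hypothetical helper (not Tika code): renders a caption and its
    // confidence in the "text (0.00017)" shape seen in the output.
    public static String describe(String caption, double confidence) {
        // Passing Locale.ROOT keeps '.' as the decimal separator on every
        // machine. The overload without a Locale uses the JVM default,
        // which is the kind of call a forbidden-APIs check rejects.
        return String.format(Locale.ROOT, "%s (%.5f)", caption, confidence);
    }

    public static void main(String[] args) {
        // Simulate a JVM whose default locale uses ',' as decimal separator.
        Locale.setDefault(Locale.GERMANY);

        // Locale-independent result.
        System.out.println(describe("a man standing next to a dog on a leash .", 0.000167));

        // Default-locale formatting, shown only for contrast.
        System.out.println(String.format("%.5f", 0.000167));
    }
}
```

With the default locale set to `Locale.GERMANY`, the first line still ends in `(0.00017)`, while the contrast line prints `0,00017`.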
   
   ## Unit tests
   
   ```
   [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ tika-parsers 
---
   [INFO] Surefire report directory: 
/Users/mattmann/src/tika/tika-parsers/target/surefire-reports
   
   -------------------------------------------------------
    T E S T S
   -------------------------------------------------------
   Running org.apache.tika.parser.recognition.ObjectRecognitionParserTest
   log4j: reset attribute= "false".
   log4j: Threshold ="null".
   log4j: Retreiving an instance of org.apache.log4j.Logger.
   log4j: Setting [ProgressAppender] additivity to [false].
   log4j: Level value for ProgressAppender is  [INFO].
   log4j: ProgressAppender level set to INFO
   log4j: Class name: [org.apache.log4j.ConsoleAppender]
   log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
   log4j: Setting property [conversionPattern] to [%m].
   log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
   log4j: Retreiving an instance of org.apache.log4j.Logger.
   log4j: Setting [ProgressDone] additivity to [false].
   log4j: Level value for ProgressDone is  [INFO].
   log4j: ProgressDone level set to INFO
   log4j: Class name: [org.apache.log4j.ConsoleAppender]
   log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
   log4j: Setting property [conversionPattern] to [%m%n].
   log4j: Adding appender named [eolAppender] to category [ProgressDone].
   log4j: Level value for root is  [INFO].
   log4j: root level set to INFO
   log4j: Class name: [org.apache.log4j.ConsoleAppender]
   log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
   log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p 
%c{1} - %m%n].
   log4j: Adding appender named [consoleAppender] to category [root].
   09 Jul 2017 06:03:59  INFO TensorflowRESTCaptioner - Available = true, API 
Status = HTTP/1.0 200 OK
   09 Jul 2017 06:03:59  INFO TensorflowRESTCaptioner - Captions = 5, 
MaxCaptionLength = 15
   09 Jul 2017 06:03:59  INFO ObjectRecognitionParser - Recogniser = 
org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
   09 Jul 2017 06:03:59  INFO ObjectRecognitionParser - Recogniser Available = 
true
   09 Jul 2017 06:03:59  INFO ObjectRecognitionParser - minConfidence = 0.05, 
topN=2
   09 Jul 2017 06:04:01  INFO ObjectRecognitionParser - Time taken 1477ms
   09 Jul 2017 06:04:01  INFO ObjectRecognitionParserTest - MetaValues = a 
baseball player holding a bat on a field . (0.00168) a baseball player holding 
a baseball bat on a field . (0.00163) a man in a baseball uniform throwing a 
baseball . (0.00090) a man in a baseball uniform holding a bat . (0.00068) a 
man in a baseball uniform holding a baseball bat . (0.00042)
   09 Jul 2017 06:04:01  INFO TensorflowRESTCaptioner - Available = true, API 
Status = HTTP/1.0 200 OK
   09 Jul 2017 06:04:01  INFO TensorflowRESTCaptioner - Captions = 5, 
MaxCaptionLength = 15
   09 Jul 2017 06:04:01  INFO ObjectRecognitionParser - Recogniser = 
org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
   09 Jul 2017 06:04:01  INFO ObjectRecognitionParser - Recogniser Available = 
true
   09 Jul 2017 06:04:01  INFO ObjectRecognitionParser - minConfidence = 0.05, 
topN=2
   09 Jul 2017 06:04:02  INFO ObjectRecognitionParser - Time taken 1389ms
   09 Jul 2017 06:04:02  INFO ObjectRecognitionParserTest - MetaValues = a 
baseball player pitching a ball on top of a field . (0.00367) a baseball player 
holding a bat on top of a field . (0.00272) a baseball player swinging a bat at 
a ball (0.00220) a baseball player holding a bat on a field . (0.00193) a 
baseball player is getting ready to hit a ball . (0.00146)
   09 Jul 2017 06:04:02  INFO TensorflowRESTCaptioner - Available = true, API 
Status = HTTP/1.0 200 OK
   09 Jul 2017 06:04:02  INFO TensorflowRESTCaptioner - Captions = 5, 
MaxCaptionLength = 15
   09 Jul 2017 06:04:02  INFO ObjectRecognitionParser - Recogniser = 
org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
   09 Jul 2017 06:04:02  INFO ObjectRecognitionParser - Recogniser Available = 
true
   09 Jul 2017 06:04:02  INFO ObjectRecognitionParser - minConfidence = 0.05, 
topN=2
   09 Jul 2017 06:04:03  INFO ObjectRecognitionParser - Time taken 1500ms
   09 Jul 2017 06:04:03  INFO ObjectRecognitionParserTest - MetaValues = a 
baseball player holding a bat on a field . (0.00150) a baseball player holding 
a baseball bat on a field . (0.00138) a baseball player holding a bat on top of 
a field . (0.00090) a man in a baseball uniform holding a bat . (0.00063) a man 
in a baseball uniform holding a baseball bat . (0.00037)
   Tests run: 5, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 5.384 sec - 
in org.apache.tika.parser.recognition.ObjectRecognitionParserTest
   
   Results :
   
   Tests run: 5, Failures: 0, Errors: 0, Skipped: 2
   
   [INFO] 
------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] 
------------------------------------------------------------------------
   [INFO] Total time: 9.027 s
   [INFO] Finished at: 2017-07-08T23:04:03-07:00
   [INFO] Final Memory: 33M/968M
   [INFO] 
------------------------------------------------------------------------
   LMC-053601:tika-parsers mattmann$ 
   ```
   
   ## Java
   
   ```
   LMC-053601:tika mattmann$ java -jar 
tika-app/target/tika-app-1.17-SNAPSHOT.jar 
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-im2txt-rest.xml
 
https://upload.wikimedia.org/wikipedia/commons/f/f6/Working_Dogs%2C_Handlers_Share_Special_Bond_DVIDS124942.jpg
   INFO  Available = true, API Status = HTTP/1.0 200 OK
   INFO  Captions = 5, MaxCaptionLength = 15
   INFO  Recogniser = 
org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
   INFO  Recogniser Available = true
   INFO  minConfidence = 0.05, topN=2
   INFO  Time taken 1779ms
   <?xml version="1.0" encoding="UTF-8"?><html 
xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <meta name="org.apache.tika.parser.recognition.object.rec.impl" 
content="org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner"/>
   <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
   <meta name="X-Parsed-By" 
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
   <meta name="resourceName" 
content="Working_Dogs%2C_Handlers_Share_Special_Bond_DVIDS124942.jpg"/>
   <meta name="Content-Length" content="295937"/>
   <meta name="CAPTION" content="a man standing next to a dog on a leash . 
(0.00017)"/>
   <meta name="CAPTION" content="a man standing next to a dog on a bench . 
(0.00017)"/>
   <meta name="CAPTION" content="a man and a dog are sitting on a bench . 
(0.00014)"/>
   <meta name="CAPTION" content="a man and a dog sitting on a bench . 
(0.00013)"/>
   <meta name="CAPTION" content="a man and a dog are sitting on a bench 
(0.00009)"/>
   <meta name="Content-Type" content="image/jpeg"/>
   <title/>
   </head>
   <body><ol id="captions">     <li id="0"> a man standing next to a dog on a 
leash . [en](confidence = 0.000167)</li>
        <li id="1"> a man standing next to a dog on a bench . [en](confidence = 
0.000167)</li>
        <li id="2"> a man and a dog are sitting on a bench . [en](confidence = 
0.000138)</li>
        <li id="3"> a man and a dog sitting on a bench . [en](confidence = 
0.000131)</li>
        <li id="4"> a man and a dog are sitting on a bench [en](confidence = 
0.000092)</li>
   </ol>
   </body></html>LMC-053601:tika mattmann$ 
   ```
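
Each `CAPTION` metadata value in the output above carries the caption text followed by its confidence in parentheses. A minimal sketch of how a caller might split such a value and pick the highest-confidence caption follows. The class `CaptionParser`, its nested `Caption` type, and the regular expression are assumptions based only on the strings shown here; they are not part of Tika's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptionParser {

    /** Hypothetical value type: one caption and its reported confidence. */
    public static final class Caption {
        public final String text;
        public final double confidence;

        Caption(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Caption text, optional whitespace, then a decimal number in parentheses.
    private static final Pattern ENTRY =
            Pattern.compile("^(.*\\S)\\s*\\((\\d*\\.?\\d+)\\)$");

    /** Splits one CAPTION value such as "a dog on a bench . (0.00017)". */
    public static Caption parse(String metaValue) {
        Matcher m = ENTRY.matcher(metaValue.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected CAPTION value: " + metaValue);
        }
        return new Caption(m.group(1), Double.parseDouble(m.group(2)));
    }

    /** Returns the highest-confidence caption; on a tie the earlier one wins. */
    public static Caption best(List<String> metaValues) {
        Caption best = null;
        for (String value : metaValues) {
            Caption c = parse(value);
            if (best == null || c.confidence > best.confidence) {
                best = c;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No CAPTION values given");
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "a man standing next to a dog on a leash . (0.00017)",
                "a man and a dog are sitting on a bench (0.00009)");
        Caption top = best(values);
        System.out.println(top.text + " -> " + top.confidence);
    }
}
```

The values would normally come from the parsed metadata rather than a hard-coded list; the list in `main` just reuses two strings from the run above.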
   
   Documentation is here: https://wiki.apache.org/tika/ImageCaption
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>            Assignee: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> An image caption is a short piece of text, usually one line, added to the 
> metadata of an image to briefly summarize the scene it shows. 
> Generating captions is a challenging and interesting problem in computer 
> vision. 
> Tika already supports image recognition via the [Object Recognition 
> Parser, TIKA-1993| https://issues.apache.org/jira/browse/TIKA-1993], which 
> uses an InceptionV3 model pre-trained on the ImageNet dataset using tensorflow. 
> Captioning an image is a very useful feature since it helps text-based 
> Information Retrieval (IR) systems to "understand" the scenery in images.
> h2. Technical details and references:
> * Some time ago Google open sourced its 'show and tell' neural network and 
> its model for automatically generating captions. [Source Code| 
> https://github.com/tensorflow/models/tree/master/im2txt], [Research blog| 
> https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser
> ** Create a RESTful API Service [similar to this| 
> https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server] 
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: java AND python, and maven build system
> * RESTful APIs 
> * tensorflow/keras
> * deeplearning
> ----
> Alternatively, a somewhat harder path for experienced contributors:
> [Import keras/tensorflow model to 
> deeplearning4j|https://deeplearning4j.org/model-import-keras ] and run it 
> natively inside the JVM.
> h4. Benefits
> * No RESTful integration required, thus no external dependencies
> * Easy to distribute on hadoop/spark clusters
> h4. Hurdles:
> * This is a work-in-progress feature in deeplearning4j, so expect plenty of 
> trouble along the way! 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
