[
https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946534#comment-15946534
]
ASF GitHub Bot commented on TIKA-2262:
--------------------------------------
GitHub user KranthiGV opened a pull request:
https://github.com/apache/tika/pull/163
Tika-2306: Update Inception v3 to Inception v4 in Object recognition parser
## Summary:
The Object Recognition Parser currently uses the Inception V3 model for the object
classification task. Google has released a newer Inception V4 model [1][2].
It has an improved Top-1 accuracy of 80.2% and Top-5 accuracy of 95.2% [3].
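For readers unfamiliar with the metric, here is a minimal sketch of how Top-k accuracy is computed: a prediction counts as correct when the true label appears among the model's k highest-ranked labels. The helper name `top_k_accuracy` and the toy data are made up for illustration; they are not part of this pull request or of the published benchmark.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data (illustrative only): each row lists a model's labels, best first.
ranked = [
    ["Egyptian cat", "tabby", "tiger cat"],
    ["rifle", "assault rifle", "revolver"],
    ["convertible", "sports car", "car wheel"],
    ["holster", "hatchet", "revolver"],
]
truth = ["Egyptian cat", "assault rifle", "car wheel", "revolver"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
print(top1, top3)
```

Only the first sample is right at rank 1, while every true label appears somewhere in the first three ranks, which is why Top-5 figures are always at least as high as Top-1 figures.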
## Quick setup and Test:
- Install TensorFlow using pip
  - [https://www.tensorflow.org/install/](https://www.tensorflow.org/install/)
- Install TF-slim
```
git clone https://github.com/tensorflow/models/
# Replace /models/slim with the slim directory of your own clone
export PYTHONPATH="$PYTHONPATH:/models/slim"
sudo apt-get install libtcmalloc-minimal4
export LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4"
```
- NOTE: The last two lines work around a TensorFlow issue
[tensorflow/tensorflow#6968](https://github.com/tensorflow/tensorflow/issues/6968). They can be removed
once it is fixed.
- This setup can be avoided by integrating parts of the tensorflow/models code into our
repository. That code is under the Apache license, so this is possible.
- Check out the test case configuration:
`tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml`
## Demos:
`java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar
--config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml
tika-parsers/src/test/resources/test-documents/testJPEG.jpg
`
- The output should include:
```
<meta name="OBJECT" content="Egyptian cat (0.31143)"/>
<meta name="OBJECT" content="tabby, tabby cat (0.07072)"/>
```
- NOTE: Only the JPEG format is supported. I plan to work on support for other
formats during GSoC
([https://issues.apache.org/jira/browse/TIKA-2262](https://issues.apache.org/jira/browse/TIKA-2262)).
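Each `OBJECT` metadata value in the demo output packs a label and a confidence into one string. A minimal sketch of splitting such a value back into its parts, assuming the `label (score)` layout seen in the output; the helper name `parse_object` is hypothetical and not part of Tika:

```python
import re

# Matches e.g. 'Egyptian cat (0.31143)': a label, then a parenthesised score.
OBJECT_PATTERN = re.compile(r"^(?P<label>.+) \((?P<score>[0-9.]+)\)$")


def parse_object(content):
    """Split an OBJECT metadata value into (label, confidence)."""
    match = OBJECT_PATTERN.match(content)
    if match is None:
        raise ValueError("unexpected OBJECT value: %r" % content)
    return match.group("label"), float(match.group("score"))


# Values copied from the demo output.
values = ["Egyptian cat (0.31143)", "tabby, tabby cat (0.07072)"]
parsed = [parse_object(v) for v in values]
print(parsed)
```

Labels may contain commas (as in `tabby, tabby cat`), so the sketch anchors on the trailing parenthesised number instead of splitting on punctuation.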
# REST API
## Start the Inception service on port 8764:
The API service code is added at
`tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py`
A Dockerfile is also added to set up the environment quickly:
```
cd tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/
docker build -f InceptionRestDockerfile -t inception-rest-tika .
docker run -p 8764:8764 -it inception-rest-tika
```
- Use the config at
`tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml`
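Before running Tika with the REST config, it can help to confirm that the container is accepting connections on port 8764. The sketch below uses only the Python standard library; the helper name `is_port_open` is hypothetical, and it checks TCP reachability only, not whether the service behind the port is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8764 is the port published by `docker run -p 8764:8764`.
    print("inception service reachable:", is_port_open("localhost", 8764))
```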
# Tests and build
## Build status
```
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent ................................ SUCCESS [0.693s]
[INFO] Apache Tika core .................................. SUCCESS [19.393s]
[INFO] Apache Tika parsers ............................... SUCCESS [1:02.685s]
[INFO] Apache Tika XMP ................................... SUCCESS [0.851s]
[INFO] Apache Tika serialization ......................... SUCCESS [0.924s]
[INFO] Apache Tika batch ................................. SUCCESS [1:53.792s]
[INFO] Apache Tika language detection .................... SUCCESS [2.210s]
[INFO] Apache Tika application ........................... SUCCESS [23.620s]
[INFO] Apache Tika OSGi bundle ........................... SUCCESS [11.271s]
[INFO] Apache Tika translate ............................. SUCCESS [1.161s]
[INFO] Apache Tika server ................................ SUCCESS [26.655s]
[INFO] Apache Tika examples .............................. SUCCESS [3.562s]
[INFO] Apache Tika Java-7 Components ..................... SUCCESS [1.040s]
[INFO] Apache Tika eval .................................. SUCCESS [13.477s]
[INFO] Apache Tika ....................................... SUCCESS [0.037s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4:41.751s
[INFO] Finished at: Wed Mar 29 10:38:00 IST 2017
[INFO] Final Memory: 158M/1535M
[INFO] ------------------------------------------------------------------------
```
## Script based implementation tests
```
timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml tika-parsers/src/test/resources/test-documents/testJPEG.jpg
WARN JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO minConfidence = 0.015, topN=2
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
INFO Recogniser Available = true
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Egyptian cat" content="0.31143"/>
<meta name="tabby, tabby cat" content="0.07072"/>
<meta name="tiger cat" content="0.04990"/>
<meta name="Siamese cat, Siamese" content="0.02097"/>
<meta name="Border collie" content="0.01930"/>
<title/>
</head>
<body><p/>
</body></html><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl"
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By"
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="testJPEG.jpg"/>
<meta name="Content-Length" content="7686"/>
<meta name="OBJECT" content="Egyptian cat (0.31143)"/>
<meta name="OBJECT" content="tabby, tabby cat (0.07072)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects"> <li id="Egyptian cat"> Egyptian cat
[eng](confidence = 0.311430 )</li>
<li id="tabby, tabby cat"> tabby, tabby cat [eng](confidence = 0.070720
)</li>
</ol>
timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar tika-app/p-1.15-SNAPSHOT.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/US_Navy_100714-N-4965F-174_Chief_Mass_Communication_Specialist_Paula_Ludwick%2C_assigned_to_Fleet_Combat_Camera_Group_Pacific%2C_shoots_at_a_target_during_a_Navy_Rifle_Qualification_Course.jpg/220px-thumbnail.jpg
WARN JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO minConfidence = 0.015, topN=2
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
INFO Recogniser Available = true
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="military uniform" content="0.00355"/>
<meta name="bulletproof vest" content="0.00397"/>
<meta name="revolver, six-gun, six-shooter" content="0.00176"/>
<meta name="assault rifle, assault gun" content="0.84119"/>
<meta name="rifle" content="0.08642"/>
<title/>
</head>
<body><p/>
</body></html><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl"
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By"
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="220px-thumbnail.jpg"/>
<meta name="Content-Length" content="9535"/>
<meta name="OBJECT" content="assault rifle, assault gun (0.84119)"/>
<meta name="OBJECT" content="rifle (0.08642)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects"> <li id="assault rifle, assault gun"> assault
rifle, assault gun [eng](confidence = 0.841190 )</li>
<li id="rifle"> rifle [eng](confidence = 0.086420 )</li>
</ol>
</body></html>
timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar tika-app/p-1.15-SNAPSHOT.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml https://upload.wikimedia.org/wikipedia/commons/8/8d/Glock17.jpg
WARN JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO minConfidence = 0.015, topN=2
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
INFO Recogniser Available = true
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="hatchet" content="0.00087"/>
<meta name="revolver, six-gun, six-shooter" content="0.89842"/>
<meta name="holster" content="0.02361"/>
<meta name="assault rifle, assault gun" content="0.01820"/>
<meta name="rifle" content="0.00943"/>
<title/>
</head>
<body><p/>
</body></html><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl"
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By"
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="Glock17.jpg"/>
<meta name="Content-Length" content="368025"/>
<meta name="OBJECT" content="revolver, six-gun, six-shooter (0.89842)"/>
<meta name="OBJECT" content="holster (0.02361)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects"> <li id="revolver, six-gun, six-shooter">
revolver, six-gun, six-shooter [eng](confidence = 0.898420 )</li>
<li id="holster"> holster [eng](confidence = 0.023610 )</li>
</ol>
</body></html>
timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar tika-app/p-1.15-SNAPSHOT.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow.xml http://www.trbimg.com/img-57226a08/turbine/ct-tesla-model-3-unveiling-20160404/650/650x366
WARN JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO minConfidence = 0.015, topN=2
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
INFO Recogniser Available = true
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="car wheel" content="0.07900"/>
<meta name="convertible" content="0.04067"/>
<meta name="sports car, sport car" content="0.75443"/>
<meta name="beach wagon, station wagon, wagon, estate car, beach waggon,
station waggon, waggon" content="0.00955"/>
<meta name="grille, radiator grille" content="0.01366"/>
<title/>
</head>
<body><p/>
</body></html><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl"
content="org.apache.tika.parser.recognition.tf.TensorflowImageRecParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By"
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="650x366"/>
<meta name="Content-Length" content="42377"/>
<meta name="OBJECT" content="sports car, sport car (0.75443)"/>
<meta name="OBJECT" content="car wheel (0.07900)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects"> <li id="sports car, sport car"> sports car,
sport car [eng](confidence = 0.754430 )</li>
<li id="car wheel"> car wheel [eng](confidence = 0.079000 )</li>
</ol>
</body></html>
```
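In each script-based run, the log reports `minConfidence = 0.015, topN=2`, and the `OBJECT` entries are the two highest-scoring labels from the raw score list. The sketch below reproduces that selection as it appears from the output. It is inferred from the logs, not taken from the parser source; the function name `select_objects` and the inclusive threshold are assumptions.

```python
def select_objects(scores, min_confidence=0.015, top_n=2):
    """Keep labels scoring at least min_confidence, best first, at most top_n."""
    kept = [(label, s) for label, s in scores.items() if s >= min_confidence]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


# Raw scores copied from the Tesla image run.
raw = {
    "car wheel": 0.07900,
    "convertible": 0.04067,
    "sports car, sport car": 0.75443,
    "beach wagon, station wagon, wagon, estate car, beach waggon, "
    "station waggon, waggon": 0.00955,
    "grille, radiator grille": 0.01366,
}
selected = select_objects(raw)
print(selected)
```

With these inputs the selection yields `sports car, sport car` followed by `car wheel`, matching the two `OBJECT` entries in that run.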
## REST API based implementation tests
```
timberners@galileo:~/Desktop/gsoc/issues/tika$ java -jar tika-app/target/tika-app-1.15-SNAPSHOT.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml http://www.trbimg.com/img-57226a08/turbine/ct-tesla-model-3-unveiling-20160404/650/650x366
WARN JBIG2ImageReader not loaded. jbig2 files will be ignored
INFO Available = true, API Status = HTTP/1.0 200 OK
INFO minConfidence = 0.015, topN=2
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
INFO Recogniser Available = true
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl"
content="org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By"
content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="650x366"/>
<meta name="Content-Length" content="42377"/>
<meta name="OBJECT" content="sports car, sport car (0.75443)"/>
<meta name="OBJECT" content="car wheel (0.07900)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects"> <li id="sports car, sport car"> sports car,
sport car [en](confidence = 0.754434 )</li>
<li id="car wheel"> car wheel [en](confidence = 0.079000 )</li>
</ol>
</body></html>
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/KranthiGV/tika TIKA-2306
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tika/pull/163.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #163
----
commit 236db96393d94756dbc2e3f40b318f8f93b95dff
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T20:22:02Z
fix for TIKA-2306 contributed by kranthigv
commit 0c0bd4bec2312355d2bc48426f8ec94306d0e4a0
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T20:28:52Z
fix for TIKA-2306 contributed by kranthigv
commit cb8f8f5e7ea2b4e13853e6dfc2165127521d9c64
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T20:51:36Z
fix the image
commit c7f27b561ac1a44a35d3f7fd7881daf5dae8b835
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T22:17:59Z
inceptionapi.py file added for REST API feature
commit 1fc82e84cc27f60cc64c7844e36bdab2d3c85e7c
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T22:43:41Z
fix the destination directory
commit 900e4cfff9c5036bceba6a2f6cda1a9c942d3fa7
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T23:04:06Z
fix no variables to save
commit 0341a5d25dececf799746d6906963496a5256f11
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T23:17:02Z
unexpected argument
commit b9f496c68b27e64f1eddca212db88e3444051cc5
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T23:19:59Z
undefined variable
commit f8c51bab139f0b7c8d9ea070ae40c87bbaf87689
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T23:25:10Z
undefined variable
commit d199692b650edaaf743ca6cfc5c34954baf8831d
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-26T23:29:28Z
undefined variable
commit 0eedec8c62cf5e6ddee4f14ca4b4fa59d2930be5
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-27T02:07:19Z
Working inceptionapi.py without comments
commit 09cb2df973f20e3a877ca1309b67384264650be0
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-29T03:51:31Z
fix for TIKA-2306 contributed by kranthigv
commit f92809ac19d5bef903ef1ac393092e6a13884fc0
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-29T03:55:01Z
fix for TIKA-2306 contributed by kranthigv
commit be773cacaf3c344c11fff9b85ebaf1d0dc8b5174
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-29T04:11:48Z
fix for TIKA-2306 contributed by kranthigv
commit 75a2ae12d170fc99b4bf9ab266c6169859c23dda
Author: Kranthi Kiran GV <[email protected]>
Date: 2017-03-29T05:09:22Z
Changed models repo to a forked repo for future compatibility
----
> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
> Key: TIKA-2262
> URL: https://issues.apache.org/jira/browse/TIKA-2262
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Thamme Gowda
> Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> An image caption is a short piece of text, usually one line, added to the
> metadata of an image to provide a brief summary of the scenery in the image.
> Generating captions is a challenging and interesting problem in computer vision.
> Tika already has support for image recognition via [Object Recognition
> Parser, TIKA-1993| https://issues.apache.org/jira/browse/TIKA-1993] which
> uses an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow.
> Captioning an image is a very useful feature since it helps text-based
> Information Retrieval (IR) systems to "understand" the scenery in images.
> h2. Technical details and references:
> * Google has long since open-sourced its 'show and tell' neural network and
> its model for auto-generating captions. [Source Code|
> https://github.com/tensorflow/models/tree/master/im2txt], [Research blog|
> https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser
> ** Create a RESTful API Service [similar to this|
> https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server]
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: Java and Python, and the Maven build system
> * RESTful APIs
> * TensorFlow/Keras
> * deep learning
> ----
> Alternatively, a somewhat harder path for experienced contributors:
> [Import keras/tensorflow model to
> deeplearning4j|https://deeplearning4j.org/model-import-keras ] and run them
> natively inside JVM.
> h4. Benefits
> * no RESTful integration required, thus no external dependencies
> * easy to distribute on hadoop/spark clusters
> h4. Hurdles:
> * This is a work-in-progress feature in deeplearning4j, so expect plenty of
> trouble along the way!
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)