[
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075728#comment-16075728
]
ASF GitHub Bot commented on TIKA-2298:
--------------------------------------
chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313265776
OK, I got it working, great job @asmehra95! I am good to merge this into
1.16. Let me double check there are no objections (if so we can back it out).
## Build passes
```
[INFO] Loading classes to check...
[INFO] Scanning classes for violations...
[INFO] Scanned 2 (and 230 related) class file(s) for forbidden API invocations (in 0.04s), 0 error(s).
[INFO]
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
[INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
[INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
[INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:48 min
[INFO] Finished at: 2017-07-05T17:24:47-07:00
[INFO] Final Memory: 129M/1177M
[INFO] ------------------------------------------------------------------------
LMC-053601:tika-dl mattmann$
```
## Running Lion Image Recognition Test
```bash
$ cat test.sh
java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
```
```bash
LMC-053601:tika1.15 mattmann$ sh test.sh
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files.
This may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Loaded [CpuBackend] backend
INFO Number of threads used for NativeOps: 4
INFO Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 values
INFO Number of threads used for BLAS: 4
INFO Backend used: [CPU]; OS: [Mac OS X]
INFO Cores: [8]; Memory: [2.7GB];
INFO Blas vendor: [OPENBLAS]
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3DAudio.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libAppleScriptEngine.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3D.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/usr/lib/java/libjdns_sd.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libmlib_jai.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3DUtils.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:237)
at org.reflections.Reflections.scan(Reflections.java:204)
at org.reflections.Reflections.<init>(Reflections.java:129)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
INFO Reflections took 1529 ms to scan 12 urls, producing 3728 keys and 16714 values
INFO Preprocessed Model Loaded from /Users/mattmann/.dl4j/trainedmodels/tikaPreprocessed/vgg16.zip
INFO minConfidence = 0.015, topN=3
INFO Recogniser = org.apache.tika.dl.imagerec.DL4JVGG16Net
INFO Recogniser Available = true
INFO Reflections took 134 ms to scan 1 urls, producing 371 keys and 1443 values
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl" content="org.apache.tika.dl.imagerec.DL4JVGG16Net"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="lion.jpg"/>
<meta name="Content-Length" content="44441"/>
<meta name="OBJECT" content="lion (0.99999)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects"> <li id="lion"> lion [eng](confidence = 0.999988 )</li>
</ol>
</body></html>
```
Yay! Works!
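For anyone who wants to consume this output programmatically, here is a minimal sketch (not part of the PR) that pulls the recognised labels out of the XHTML that TikaCLI prints. It relies only on the JDK's DOM parser; the class name `ObjectMetaExtractor` and the method `extractObjects` are hypothetical names made up for this illustration, and the sample document in `main` is a shortened copy of the output above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Tika class: reads <meta name="OBJECT"> entries
// from the XHTML document that TikaCLI writes to stdout.
public class ObjectMetaExtractor {

    /** Returns the content attribute of every meta element named OBJECT. */
    public static List<String> extractObjects(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList metas = doc.getElementsByTagName("meta");
        List<String> objects = new ArrayList<>();
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if ("OBJECT".equals(meta.getAttribute("name"))) {
                objects.add(meta.getAttribute("content"));
            }
        }
        return objects;
    }

    public static void main(String[] args) throws Exception {
        // Shortened copy of the XHTML shown in the run above.
        String xhtml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<meta name=\"OBJECT\" content=\"lion (0.99999)\"/>"
                + "<meta name=\"Content-Type\" content=\"image/jpeg\"/>"
                + "<title/></head><body/></html>";
        System.out.println(extractObjects(xhtml)); // prints [lion (0.99999)]
    }
}
```

With `topN=3` the parser may emit more than one `OBJECT` entry, which is why the sketch returns a list rather than a single string.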
> To improve object recognition parser so that it may work without external
> RESTful service setup
> -----------------------------------------------------------------------------------------------
>
> Key: TIKA-2298
> URL: https://issues.apache.org/jira/browse/TIKA-2298
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.14
> Reporter: Avtar Singh
> Labels: ObjectRecognitionParser
> Fix For: 1.16
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't
> good support for Java frameworks. All the popular neural network toolkits
> were written in C++ or Python. Since nothing ran within the JVM, we tried
> several ways to glue them to Tika (such as CLI, JNI, gRPC, and REST).
> However, this is slowly changing. Deeplearning4j, the most famous
> neural network library for the JVM, now supports importing models that are
> pre-trained in Python/C++ based kits [5].
> *Improvement:*
> It would be nice to have an implementation of ObjectRecogniser that
> doesn't require any external setup (like installing native libraries or
> starting REST services). Reasons: it is easier to distribute and it cuts
> the I/O time.