[ 
https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075718#comment-16075718
 ] 

ASF GitHub Bot commented on TIKA-2298:
--------------------------------------

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by 
asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313265776
 
 
   OK, I got it working, great job @asmehra95! I am good to merge this into 
1.16. Let me double check there are no objections (if so we can back it out).
   
   h2. Build passes
   
   ```
   [INFO] Loading classes to check...
   [INFO] Scanning classes for violations...
   [INFO] Scanned 2 (and 230 related) class file(s) for forbidden API invocations (in 0.04s), 0 error(s).
   [INFO] 
   [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
   [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
   [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
   [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time: 03:48 min
   [INFO] Finished at: 2017-07-05T17:24:47-07:00
   [INFO] Final Memory: 129M/1177M
   [INFO] ------------------------------------------------------------------------
   LMC-053601:tika-dl mattmann$ 
   ```
   
   h2. Running Lion Image Recognition Test
   ```bash
   $ cat test.sh
   java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
   ```
   
   ```bash
   LMC-053601:tika1.15 mattmann$ sh test.sh
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to image files.
   This may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
   Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   INFO  Loaded [CpuBackend] backend
   INFO  Number of threads used for NativeOps: 4
   INFO  Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 values
   INFO  Number of threads used for BLAS: 4
   INFO  Backend used: [CPU]; OS: [Mac OS X]
   INFO  Cores: [8]; Memory: [2.7GB];
   INFO  Blas vendor: [OPENBLAS]
   WARN  could not create Vfs.Dir from url. ignoring the exception and continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3DAudio.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
        at org.reflections.Reflections.scan(Reflections.java:237)
        at org.reflections.Reflections.scan(Reflections.java:204)
        at org.reflections.Reflections.<init>(Reflections.java:129)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
        at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
        at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
        at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
        at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
   WARN  could not create Vfs.Dir from url. ignoring the exception and continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libAppleScriptEngine.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
        at org.reflections.Reflections.scan(Reflections.java:237)
        at org.reflections.Reflections.scan(Reflections.java:204)
        at org.reflections.Reflections.<init>(Reflections.java:129)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
        at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
        at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
        at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
        at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
   WARN  could not create Vfs.Dir from url. ignoring the exception and continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3D.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
        at org.reflections.Reflections.scan(Reflections.java:237)
        at org.reflections.Reflections.scan(Reflections.java:204)
        at org.reflections.Reflections.<init>(Reflections.java:129)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
        at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
        at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
        at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
        at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
   WARN  could not create Vfs.Dir from url. ignoring the exception and continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/usr/lib/java/libjdns_sd.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
        at org.reflections.Reflections.scan(Reflections.java:237)
        at org.reflections.Reflections.scan(Reflections.java:204)
        at org.reflections.Reflections.<init>(Reflections.java:129)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
        at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
        at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
        at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
        at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
   WARN  could not create Vfs.Dir from url. ignoring the exception and continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libmlib_jai.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
        at org.reflections.Reflections.scan(Reflections.java:237)
        at org.reflections.Reflections.scan(Reflections.java:204)
        at org.reflections.Reflections.<init>(Reflections.java:129)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
        at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
        at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
        at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
        at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
   WARN  could not create Vfs.Dir from url. ignoring the exception and continuing
   org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3DUtils.jnilib]
   either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
        at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
        at org.reflections.Reflections.scan(Reflections.java:237)
        at org.reflections.Reflections.scan(Reflections.java:204)
        at org.reflections.Reflections.<init>(Reflections.java:129)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
        at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
        at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
        at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
        at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
        at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
        at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
        at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
   INFO  Reflections took 1529 ms to scan 12 urls, producing 3728 keys and 16714 values
   INFO  Preprocessed Model Loaded from /Users/mattmann/.dl4j/trainedmodels/tikaPreprocessed/vgg16.zip
   INFO  minConfidence = 0.015, topN=3
   INFO  Recogniser = org.apache.tika.dl.imagerec.DL4JVGG16Net
   INFO  Recogniser Available = true
   INFO  Reflections took 134 ms to scan 1 urls, producing 371 keys and 1443 values
   <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <meta name="org.apache.tika.parser.recognition.object.rec.impl" content="org.apache.tika.dl.imagerec.DL4JVGG16Net"/>
   <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
   <meta name="X-Parsed-By" content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
   <meta name="resourceName" content="lion.jpg"/>
   <meta name="Content-Length" content="44441"/>
   <meta name="OBJECT" content="lion (0.99999)"/>
   <meta name="Content-Type" content="image/jpeg"/>
   <title/>
   </head>
   <body><ol id="objects">      <li id="lion"> lion [eng](confidence = 0.999988 )</li>
   </ol>
   </body></html>
   ```
   
   Yay! Works!
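   
   For anyone consuming this output downstream, here is a minimal sketch of pulling the label and confidence back out of the `OBJECT` meta value shown above (`lion (0.99999)`). It assumes the value always has the shape `label (confidence)`, which is only what this one run shows. The class `ObjectMetaValue` and its methods are hypothetical helpers for illustration, not part of Tika.
   
   ```java
   import java.util.Optional;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;
   
   // Hypothetical helper (not a Tika class): splits an OBJECT meta value
   // such as "lion (0.99999)" into its label and confidence.
   public class ObjectMetaValue {
       // Assumed shape: "<label> (<decimal confidence>)".
       private static final Pattern VALUE =
               Pattern.compile("^(.+?)\\s*\\(([0-9]*\\.?[0-9]+)\\)$");
   
       public final String label;
       public final double confidence;
   
       private ObjectMetaValue(String label, double confidence) {
           this.label = label;
           this.confidence = confidence;
       }
   
       // Returns empty when the value does not match the assumed shape.
       public static Optional<ObjectMetaValue> parse(String raw) {
           if (raw == null) {
               return Optional.empty();
           }
           Matcher m = VALUE.matcher(raw.trim());
           if (!m.matches()) {
               return Optional.empty();
           }
           return Optional.of(
                   new ObjectMetaValue(m.group(1), Double.parseDouble(m.group(2))));
       }
   
       // True when the confidence meets a caller-chosen threshold,
       // for example the minConfidence value reported in the log.
       public boolean atLeast(double minConfidence) {
           return confidence >= minConfidence;
       }
   
       public static void main(String[] args) {
           parse("lion (0.99999)").ifPresent(
                   v -> System.out.println(v.label + " -> " + v.confidence));
       }
   }
   ```
   
   Returning an `Optional` keeps callers from relying on a format that may differ for other recognisers or Tika versions.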
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> To improve object recognition parser so that it may work without external 
> RESTful service setup
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2298
>                 URL: https://issues.apache.org/jira/browse/TIKA-2298
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Avtar Singh
>              Labels: ObjectRecognitionParser
>             Fix For: 1.16
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't
> good support for neural networks in Java frameworks. All the popular neural
> network libraries were in C++ or Python. Since nothing ran within the JVM, we
> tried several ways to glue them to Tika (such as CLI, JNI, gRPC, REST).
> However, this is slowly changing. Deeplearning4j, the most famous
> neural network library for the JVM, now supports importing models that are
> pre-trained in Python/C++ based kits [5].
> *Improvement:*
> It would be nice to have an implementation of ObjectRecogniser that
> doesn't require any external setup (like installing native libraries or
> starting REST services). Reasons: it is easier to distribute and it cuts the
> IO time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
