[ https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075718#comment-16075718 ]
ASF GitHub Bot commented on TIKA-2298:
--------------------------------------

chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
URL: https://github.com/apache/tika/pull/182#issuecomment-313265776

OK, I got it working, great job @asmehra95! I am good to merge this into 1.16. Let me double check there are no objections (if so we can back it out).

h2. Build passes

```
[INFO] Loading classes to check...
[INFO] Scanning classes for violations...
[INFO] Scanned 2 (and 230 related) class file(s) for forbidden API invocations (in 0.04s), 0 error(s).
[INFO]
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
[INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
[INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
[INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:48 min
[INFO] Finished at: 2017-07-05T17:24:47-07:00
[INFO] Final Memory: 129M/1177M
[INFO] ------------------------------------------------------------------------
LMC-053601:tika-dl mattmann$
```

h2. Running Lion Image Recognition Test

```bash
$ cat test.sh
java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
```

```bash
LMC-053601:tika1.15 mattmann$ sh test.sh
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.

Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files.
This may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Loaded [CpuBackend] backend
INFO Number of threads used for NativeOps: 4
INFO Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 values
INFO Number of threads used for BLAS: 4
INFO Backend used: [CPU]; OS: [Mac OS X]
INFO Cores: [8]; Memory: [2.7GB];
INFO Blas vendor: [OPENBLAS]
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3DAudio.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
	at org.reflections.Reflections.scan(Reflections.java:237)
	at org.reflections.Reflections.scan(Reflections.java:204)
	at org.reflections.Reflections.<init>(Reflections.java:129)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
	at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
	at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
	at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libAppleScriptEngine.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
	at org.reflections.Reflections.scan(Reflections.java:237)
	at org.reflections.Reflections.scan(Reflections.java:204)
	at org.reflections.Reflections.<init>(Reflections.java:129)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
	at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
	at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
	at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3D.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
	at org.reflections.Reflections.scan(Reflections.java:237)
	at org.reflections.Reflections.scan(Reflections.java:204)
	at org.reflections.Reflections.<init>(Reflections.java:129)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
	at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
	at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
	at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/usr/lib/java/libjdns_sd.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
	at org.reflections.Reflections.scan(Reflections.java:237)
	at org.reflections.Reflections.scan(Reflections.java:204)
	at org.reflections.Reflections.<init>(Reflections.java:129)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
	at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
	at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
	at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libmlib_jai.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
	at org.reflections.Reflections.scan(Reflections.java:237)
	at org.reflections.Reflections.scan(Reflections.java:204)
	at org.reflections.Reflections.<init>(Reflections.java:129)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
	at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
	at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
	at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
WARN could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found [file:/System/Library/Java/Extensions/libJ3DUtils.jnilib]
either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
	at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
	at org.reflections.Reflections.scan(Reflections.java:237)
	at org.reflections.Reflections.scan(Reflections.java:204)
	at org.reflections.Reflections.<init>(Reflections.java:129)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
	at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
	at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
	at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
	at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
	at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
	at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
INFO Reflections took 1529 ms to scan 12 urls, producing 3728 keys and 16714 values
INFO Preprocessed Model Loaded from /Users/mattmann/.dl4j/trainedmodels/tikaPreprocessed/vgg16.zip
INFO minConfidence = 0.015, topN=3
INFO Recogniser = org.apache.tika.dl.imagerec.DL4JVGG16Net
INFO Recogniser Available = true
INFO Reflections took 134 ms to scan 1 urls, producing 371 keys and 1443 values
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="org.apache.tika.parser.recognition.object.rec.impl" content="org.apache.tika.dl.imagerec.DL4JVGG16Net"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
<meta name="resourceName" content="lion.jpg"/>
<meta name="Content-Length" content="44441"/>
<meta name="OBJECT" content="lion (0.99999)"/>
<meta name="Content-Type" content="image/jpeg"/>
<title/>
</head>
<body><ol id="objects">	<li id="lion">	lion [eng](confidence = 0.999988 )</li>
</ol>
</body></html>
```

Yay! Works!

> To improve object recognition parser so that it may work without external RESTful service setup
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2298
>                 URL: https://issues.apache.org/jira/browse/TIKA-2298
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Avtar Singh
>              Labels: ObjectRecognitionParser
>             Fix For: 1.16
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> When ObjectRecognitionParser was built to do image recognition, there wasn't good support for it in Java frameworks. All the popular neural network toolkits were written in C++ or Python. Since nothing of the kind ran within the JVM, we tried several ways to glue them to Tika (like CLI, JNI, gRPC, REST).
> However, this is now slowly changing. Deeplearning4j, the best-known neural network library for the JVM, now supports importing models that were pre-trained with Python/C++-based toolkits [5].
> *Improvement:*
> It would be nice to have an implementation of ObjectRecogniser that doesn't require any external setup (such as installing native libraries or starting REST services). Reasons: it is easier to distribute, and it cuts IO time.
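The run above logs `minConfidence = 0.015, topN=3` before reporting `lion (0.99999)`. The sketch below shows, under stated assumptions, how two such thresholds could be applied to a classifier's per-label probabilities. It drops every label below `minConfidence`, sorts the rest by descending confidence, and keeps the first `topN`. The class `TopNRecognitions`, its `Recognition` type, and the sample probabilities are hypothetical. This is not the code in Tika's `ObjectRecognitionParser` or `DL4JVGG16Net`, and the exact way those classes apply the two parameters is assumed here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch (not Tika's implementation): filter a classifier's
 * per-label probabilities by a minimum confidence, then keep the topN best.
 */
public class TopNRecognitions {

    /** One recognised label and its confidence (illustrative type). */
    static final class Recognition {
        final String label;
        final double confidence;

        Recognition(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            // Five decimals, resembling the "lion (0.99999)" OBJECT value above
            return String.format(Locale.ROOT, "%s (%.5f)", label, confidence);
        }
    }

    /** Keeps labels with prob >= minConfidence, sorted descending, at most topN. */
    static List<Recognition> select(String[] labels, double[] probs,
                                    double minConfidence, int topN) {
        if (labels.length != probs.length) {
            throw new IllegalArgumentException("labels and probs must have the same length");
        }
        List<Recognition> kept = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (probs[i] >= minConfidence) {
                kept.add(new Recognition(labels[i], probs[i]));
            }
        }
        // Highest confidence first
        kept.sort(Comparator.comparingDouble((Recognition r) -> r.confidence).reversed());
        int n = Math.max(0, Math.min(topN, kept.size()));
        return new ArrayList<>(kept.subList(0, n));
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only
        String[] labels = {"lion", "cougar", "tabby", "chow"};
        double[] probs = {0.97, 0.02, 0.004, 0.006};
        // Same threshold values as the log line "minConfidence = 0.015, topN=3"
        for (Recognition r : select(labels, probs, 0.015, 3)) {
            System.out.println(r);
        }
    }
}
```

Running `main` prints `lion (0.97000)` and then `cougar (0.02000)`. `tabby` and `chow` fall below the 0.015 threshold, so fewer than `topN` labels survive. Filtering before sorting keeps the sort small, because a VGG16 classifier trained on ImageNet scores 1,000 classes and most of those scores are near zero.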