GitHub user thammegowda opened a pull request:
https://github.com/apache/tika/pull/165
[TIKA-DL] Image recognition powered by deeplearning4j and InceptionV3
## Summary
+ added `tika-dl` module, which depends on the `deeplearning4j` library. This
module produces an add-on jar bundling all the DL4J dependencies and their native
libraries; users can optionally add it to the classpath to make use of
it
+ By default, the build system includes native libs for all major
platforms (such as Linux, Windows, OSX, Android/ARM)
+ Unnecessary native libs can be easily excluded by setting the target
platform as `-Djavacpp.platform=<target>` during the build
+ Permissible target values = {`android-arm`, `linux-x86_64`,
`macosx-x86_64`, `windows-x86_64`, etc.}
+ added `DL4JInceptionV3Net.java` which provides Image recognition features
using InceptionV3.
+ Similar to the VGG-16 model in #159, but much lighter: VGG-16 is huge (over
500MB to download) and requires plenty of RAM (~3GB) to run, whereas the
Inception-V3 model is just a 90MB download and requires ~400MB to run
+ No setup required. This implementation is configured to download the
model the first time it runs. It downloads from the [USCDataScience
repo](https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example/data)
+ It is flexible: plenty of settings can be changed. Look for the
`@Field` annotations in the code
+ added a test case for the above implementation
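
The download-on-first-run behaviour described above could be sketched roughly as follows. This is not the code in `DL4JInceptionV3Net.java`; `ModelCache`, `fetchIfAbsent`, and the `.part` temporary suffix are hypothetical names chosen for illustration, and only JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of this pull request.
public class ModelCache {

    /**
     * Returns a local copy of the resource at {@code source}. The resource is
     * downloaded into {@code cacheDir} only when no cached copy exists yet.
     */
    public static Path fetchIfAbsent(URI source, Path cacheDir) throws IOException {
        String path = source.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Path target = cacheDir.resolve(fileName);
        if (Files.exists(target)) {
            return target; // already cached by an earlier run
        }
        Files.createDirectories(cacheDir);
        // Download to a temporary name first so that an interrupted download
        // is never mistaken for a complete model file.
        Path partial = cacheDir.resolve(fileName + ".part");
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

The first call pays the download cost; later calls return the cached path without touching the network.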
## How to Test
1. Build the code: `mvn package`, `mvn package -DskipTests`, or `mvn
package -DskipTests -Djavacpp.platform=<target>`
2. Run:
```bash
java -Xmx400m \
  -cp ./tika-dl/target/tika-dl-1.15-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.15-SNAPSHOT.jar \
  org.apache.tika.cli.TikaCLI \
  --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-inception3-config.xml \
  dog.jpg
```
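
If you are unsure which value to pass as `-Djavacpp.platform` in the build step, a small helper like the one below can guess it from the JVM's `os.name` and `os.arch` properties. It is only a sketch: `PlatformGuess` is a hypothetical class, the mapping is an assumption that covers just the three desktop targets listed in the summary, and it does not call JavaCPP itself.

```java
import java.util.Locale;

// Hypothetical helper, not part of this pull request.
public class PlatformGuess {

    /**
     * Maps os.name / os.arch style strings to one of the desktop platform
     * names listed in the summary, or returns null when no mapping is known.
     */
    public static String guess(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        String osPart;
        if (os.startsWith("mac")) {
            osPart = "macosx";
        } else if (os.startsWith("windows")) {
            osPart = "windows";
        } else if (os.startsWith("linux")) {
            osPart = "linux";
        } else {
            return null; // unknown OS: choose the platform by hand
        }
        // JVMs report 64-bit x86 as either "amd64" or "x86_64".
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            return osPart + "-x86_64";
        }
        return null; // other architectures are not covered by this sketch
    }

    public static void main(String[] args) {
        System.out.println(guess(System.getProperty("os.name"),
                                 System.getProperty("os.arch")));
    }
}
```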
## Note:
Tested on the `macosx-x86_64` platform only; it still needs to be tested on
`linux-x86_64` and `windows-x86_64` before it gets merged.
Feedback/Critiques are welcome.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/thammegowda/tika tika-dl
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tika/pull/165.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #165
----
commit 1472a4e275ed276b69f11dea6d663bc2136566d0
Author: Thamme Gowda <[email protected]>
Date: 2017-04-02T20:52:04Z
[TIKA-DL] Added tika-dl module to the build system
commit ce28a6f545780144736c0d8c84995d218ab6ffbb
Author: Thamme Gowda <[email protected]>
Date: 2017-04-02T21:06:50Z
Fix scheme value for file URIs
commit 3cbf36800b01e5255a4bc1b87d737896a82d3c0f
Author: Thamme Gowda <[email protected]>
Date: 2017-04-02T21:25:59Z
[TIKA-DL] build jar with dependencies by default
commit d1c951396bd5a6a849273f590743931cd89d493e
Author: Thamme Gowda <[email protected]>
Date: 2017-04-03T02:42:56Z
[TIKA-DL] add license headers
commit 81b3f32103a497eaa99511af09eb253275c67cd9
Author: Thamme Gowda <[email protected]>
Date: 2017-04-03T02:51:07Z
Fix typos and unnecessary spaces
commit 5834afeff5de4d1076180de3ddece8e7b807b7f3
Author: Thamme Gowda <[email protected]>
Date: 2017-04-03T03:03:11Z
Fix XML format
----