[
https://issues.apache.org/jira/browse/TIKA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077580#comment-16077580
]
ASF GitHub Bot commented on TIKA-1988:
--------------------------------------
chrismattmann commented on issue #186: fix for TIKA-1988 contributed by
[email protected]
URL: https://github.com/apache/tika/pull/186#issuecomment-313587299
Finally got this working!
```
LMC-053601:tika-parsers mattmann$ java -cp \
    ../tika-app/target/tika-app-1.16-SNAPSHOT.jar:./model \
    org.apache.tika.cli.TikaCLI \
    --config=src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml \
    -m test.txt
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The
ImageParser will skip jbig2 images
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to
image files.
This may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on
via TikaConfig.
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The
ImageParser will skip jbig2 images
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to
image files.
This may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on
via TikaConfig.
Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Running Spark version 2.0.0
WARN Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
INFO Changing view acls to: mattmann
INFO Changing modify acls to: mattmann
INFO Changing view acls groups to:
INFO Changing modify acls groups to:
INFO SecurityManager: authentication disabled; ui acls disabled; users
with view permissions: Set(mattmann); groups with view permissions: Set();
users with modify permissions: Set(mattmann); groups with modify permissions:
Set()
INFO Successfully started service 'sparkDriver' on port 51510.
INFO Registering MapOutputTracker
INFO Registering BlockManagerMaster
INFO Created local directory at
/private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/blockmgr-bd30e8b2-1f38-49f9-b170-c3a95a7e312b
INFO MemoryStore started with capacity 2004.6 MB
INFO Registering OutputCommitCoordinator
INFO Logging initialized @1597ms
INFO jetty-9.2.z-SNAPSHOT
INFO Started o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,AVAILABLE}
INFO Started o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,AVAILABLE}
INFO Started o.s.j.s.ServletContextHandler@14c01636{/storage,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@75ed9710{/environment,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@435871cb{/executors,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,AVAILABLE}
INFO Started o.s.j.s.ServletContextHandler@192d74fb{/static,null,AVAILABLE}
INFO Started o.s.j.s.ServletContextHandler@4bef0fe3{/,null,AVAILABLE}
INFO Started o.s.j.s.ServletContextHandler@62ea3440{/api,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,AVAILABLE}
INFO Started ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
INFO Started @1705ms
INFO Successfully started service 'SparkUI' on port 4040.
INFO Bound SparkUI to 0.0.0.0, and started at http://192.168.1.65:4040
INFO Starting executor ID driver on host localhost
INFO Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 51511.
INFO Server created on 192.168.1.65:51511
INFO Registering BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
INFO Registering block manager 192.168.1.65:51511 with 2004.6 MB RAM,
BlockManagerId(driver, 192.168.1.65, 51511)
INFO Registered BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
INFO Started
o.s.j.s.ServletContextHandler@5305c37d{/metrics/json,null,AVAILABLE}
WARN Use an existing SparkContext, some configuration may not take effect.
INFO Started o.s.j.s.ServletContextHandler@3c1e3314{/SQL,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@78e16155{/SQL/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@50b0bc4c{/SQL/execution,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@13c612bd{/SQL/execution/json,null,AVAILABLE}
INFO Started
o.s.j.s.ServletContextHandler@28fa700e{/static/sql,null,AVAILABLE}
INFO Warehouse path is
'file:/Users/mattmann/tmp/tika1.15/tika-parsers/spark-warehouse'.
INFO Block broadcast_0 stored as values in memory (estimated size 6.1 MB,
free 1998.5 MB)
INFO Block broadcast_0_piece0 stored as bytes in memory (estimated size
488.5 KB, free 1998.0 MB)
INFO Added broadcast_0_piece0 in memory on 192.168.1.65:51511 (size: 488.5
KB, free: 2004.1 MB)
INFO Created broadcast 0 from broadcast at CountVectorizer.scala:243
INFO Code generated in 1407.24616 ms
INFO Starting job: first at AgePredicterLocal.java:114
INFO Got job 0 (first at AgePredicterLocal.java:114) with 1 output
partitions
INFO Final stage: ResultStage 0 (first at AgePredicterLocal.java:114)
INFO Parents of final stage: List()
INFO Missing parents: List()
INFO Submitting ResultStage 0 (MapPartitionsRDD[3] at javaRDD at
AgePredicterLocal.java:112), which has no missing parents
INFO Block broadcast_1 stored as values in memory (estimated size 10.5 KB,
free 1998.0 MB)
INFO Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3
KB, free 1998.0 MB)
INFO Added broadcast_1_piece0 in memory on 192.168.1.65:51511 (size: 5.3
KB, free: 2004.1 MB)
INFO Created broadcast 1 from broadcast at DAGScheduler.scala:1012
INFO Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at
javaRDD at AgePredicterLocal.java:112)
INFO Adding task set 0.0 with 1 tasks
INFO Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,
PROCESS_LOCAL, 6477 bytes)
INFO Running task 0.0 in stage 0.0 (TID 0)
INFO Code generated in 16.846256 ms
INFO Finished task 0.0 in stage 0.0 (TID 0). 3228 bytes result sent to
driver
INFO Finished task 0.0 in stage 0.0 (TID 0) in 90 ms on localhost (1/1)
INFO Removed TaskSet 0.0, whose tasks have all completed, from pool
INFO ResultStage 0 (first at AgePredicterLocal.java:114) finished in 0.103 s
INFO Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s
Content-Length: 17
Content-Type: text/plain
Estimated-Author-Age: 32.29913797083779
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser
resourceName: test.txt
INFO Invoking stop() from shutdown hook
INFO Stopped ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
INFO Stopped
o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,UNAVAILABLE}
INFO Stopped o.s.j.s.ServletContextHandler@62ea3440{/api,null,UNAVAILABLE}
INFO Stopped o.s.j.s.ServletContextHandler@4bef0fe3{/,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@192d74fb{/static,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@435871cb{/executors,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@75ed9710{/environment,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@14c01636{/storage,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,UNAVAILABLE}
INFO Stopped
o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,UNAVAILABLE}
INFO Stopped o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,UNAVAILABLE}
INFO Stopped Spark web UI at http://192.168.1.65:4040
INFO MapOutputTrackerMasterEndpoint stopped!
INFO MemoryStore cleared
INFO BlockManager stopped
INFO BlockManagerMaster stopped
INFO OutputCommitCoordinator stopped!
INFO Successfully stopped SparkContext
INFO Shutdown hook called
INFO Deleting directory
/private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/spark-fa52d6bc-863e-4ee1-98da-8352c0c5c84e
LMC-053601:tika-parsers mattmann$
```
Will commit now!
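For anyone who wants to check the result from a script, here is a minimal sketch of pulling `Estimated-Author-Age` out of the `-m` metadata lines shown in the log. It assumes the `Name: value` lines have already been separated from the Spark and Tika log output. `MetadataLines` and its methods are hypothetical names for illustration and are not part of Tika.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical helper (not part of Tika): reads "Name: value" metadata lines. */
public class MetadataLines {

    /** Groups values by field name; a name such as X-Parsed-By may repeat. */
    public static Map<String, List<String>> parse(String output) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int sep = line.indexOf(": ");
            if (sep <= 0) {
                continue; // not a "Name: value" line
            }
            String name = line.substring(0, sep);
            String value = line.substring(sep + 2).trim();
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }

    /** Returns the first Estimated-Author-Age value as a double, if present. */
    public static OptionalDouble estimatedAge(String output) {
        List<String> values = parse(output).get("Estimated-Author-Age");
        if (values == null || values.isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(values.get(0)));
    }
}
```

Fed the six metadata lines from the run above, `estimatedAge` should yield the value on the `Estimated-Author-Age` line, and `parse` keeps both `X-Parsed-By` entries.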
> Age Detection Tika Recogniser
> -----------------------------
>
> Key: TIKA-1988
> URL: https://issues.apache.org/jira/browse/TIKA-1988
> Project: Tika
> Issue Type: New Feature
> Reporter: Madhav Sharan
>
> Author age can be the first feature, and more can be added later
> --
> Integrating work done on age classification. More details about the classifier are in the
> repo below:
> https://github.com/USCDataScience/Age-Predictor
> The Git repo has a Java client which can be integrated into Tika