[ 
https://issues.apache.org/jira/browse/TIKA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077580#comment-16077580
 ] 

ASF GitHub Bot commented on TIKA-1988:
--------------------------------------

chrismattmann commented on issue #186: fix for TIKA-1988 contributed by [email protected]
URL: https://github.com/apache/tika/pull/186#issuecomment-313587299
 
 
   Finally got this working!
   
   ```
   LMC-053601:tika-parsers mattmann$ java -cp ../tika-app/target/tika-app-1.16-SNAPSHOT.jar:./model org.apache.tika.cli.TikaCLI --config=src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml -m test.txt
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The 
ImageParser will skip jbig2 images
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   TIFFImageWriter not loaded. tiff files will not be processed
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to 
image files.
   This may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The 
ImageParser will skip jbig2 images
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   TIFFImageWriter not loaded. tiff files will not be processed
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to 
image files.
   This may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.
   Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   INFO  Running Spark version 2.0.0
   WARN  Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
   INFO  Changing view acls to: mattmann
   INFO  Changing modify acls to: mattmann
   INFO  Changing view acls groups to: 
   INFO  Changing modify acls groups to: 
   INFO  SecurityManager: authentication disabled; ui acls disabled; users  
with view permissions: Set(mattmann); groups with view permissions: Set(); 
users  with modify permissions: Set(mattmann); groups with modify permissions: 
Set()
   INFO  Successfully started service 'sparkDriver' on port 51510.
   INFO  Registering MapOutputTracker
   INFO  Registering BlockManagerMaster
   INFO  Created local directory at 
/private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/blockmgr-bd30e8b2-1f38-49f9-b170-c3a95a7e312b
   INFO  MemoryStore started with capacity 2004.6 MB
   INFO  Registering OutputCommitCoordinator
   INFO  Logging initialized @1597ms
   INFO  jetty-9.2.z-SNAPSHOT
   INFO  Started o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,AVAILABLE}
   INFO  Started o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,AVAILABLE}
   INFO  Started o.s.j.s.ServletContextHandler@14c01636{/storage,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@75ed9710{/environment,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@435871cb{/executors,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,AVAILABLE}
   INFO  Started o.s.j.s.ServletContextHandler@192d74fb{/static,null,AVAILABLE}
   INFO  Started o.s.j.s.ServletContextHandler@4bef0fe3{/,null,AVAILABLE}
   INFO  Started o.s.j.s.ServletContextHandler@62ea3440{/api,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,AVAILABLE}
   INFO  Started ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
   INFO  Started @1705ms
   INFO  Successfully started service 'SparkUI' on port 4040.
   INFO  Bound SparkUI to 0.0.0.0, and started at http://192.168.1.65:4040
   INFO  Starting executor ID driver on host localhost
   INFO  Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 51511.
   INFO  Server created on 192.168.1.65:51511
   INFO  Registering BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
   INFO  Registering block manager 192.168.1.65:51511 with 2004.6 MB RAM, 
BlockManagerId(driver, 192.168.1.65, 51511)
   INFO  Registered BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
   INFO  Started 
o.s.j.s.ServletContextHandler@5305c37d{/metrics/json,null,AVAILABLE}
   WARN  Use an existing SparkContext, some configuration may not take effect.
   INFO  Started o.s.j.s.ServletContextHandler@3c1e3314{/SQL,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@78e16155{/SQL/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@50b0bc4c{/SQL/execution,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@13c612bd{/SQL/execution/json,null,AVAILABLE}
   INFO  Started 
o.s.j.s.ServletContextHandler@28fa700e{/static/sql,null,AVAILABLE}
   INFO  Warehouse path is 
'file:/Users/mattmann/tmp/tika1.15/tika-parsers/spark-warehouse'.
   INFO  Block broadcast_0 stored as values in memory (estimated size 6.1 MB, 
free 1998.5 MB)
   INFO  Block broadcast_0_piece0 stored as bytes in memory (estimated size 
488.5 KB, free 1998.0 MB)
   INFO  Added broadcast_0_piece0 in memory on 192.168.1.65:51511 (size: 488.5 
KB, free: 2004.1 MB)
   INFO  Created broadcast 0 from broadcast at CountVectorizer.scala:243
   INFO  Code generated in 1407.24616 ms
   INFO  Starting job: first at AgePredicterLocal.java:114
   INFO  Got job 0 (first at AgePredicterLocal.java:114) with 1 output 
partitions
   INFO  Final stage: ResultStage 0 (first at AgePredicterLocal.java:114)
   INFO  Parents of final stage: List()
   INFO  Missing parents: List()
   INFO  Submitting ResultStage 0 (MapPartitionsRDD[3] at javaRDD at 
AgePredicterLocal.java:112), which has no missing parents
   INFO  Block broadcast_1 stored as values in memory (estimated size 10.5 KB, 
free 1998.0 MB)
   INFO  Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 
KB, free 1998.0 MB)
   INFO  Added broadcast_1_piece0 in memory on 192.168.1.65:51511 (size: 5.3 
KB, free: 2004.1 MB)
   INFO  Created broadcast 1 from broadcast at DAGScheduler.scala:1012
   INFO  Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at 
javaRDD at AgePredicterLocal.java:112)
   INFO  Adding task set 0.0 with 1 tasks
   INFO  Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, 
PROCESS_LOCAL, 6477 bytes)
   INFO  Running task 0.0 in stage 0.0 (TID 0)
   INFO  Code generated in 16.846256 ms
   INFO  Finished task 0.0 in stage 0.0 (TID 0). 3228 bytes result sent to 
driver
   INFO  Finished task 0.0 in stage 0.0 (TID 0) in 90 ms on localhost (1/1)
   INFO  Removed TaskSet 0.0, whose tasks have all completed, from pool 
   INFO  ResultStage 0 (first at AgePredicterLocal.java:114) finished in 0.103 s
   INFO  Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s
   Content-Length: 17
   Content-Type: text/plain
   Estimated-Author-Age: 32.29913797083779
   X-Parsed-By: org.apache.tika.parser.CompositeParser
   X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser
   resourceName: test.txt
   INFO  Invoking stop() from shutdown hook
   INFO  Stopped ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,UNAVAILABLE}
   INFO  Stopped o.s.j.s.ServletContextHandler@62ea3440{/api,null,UNAVAILABLE}
   INFO  Stopped o.s.j.s.ServletContextHandler@4bef0fe3{/,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@192d74fb{/static,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@435871cb{/executors,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@75ed9710{/environment,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@14c01636{/storage,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,UNAVAILABLE}
   INFO  Stopped 
o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,UNAVAILABLE}
   INFO  Stopped o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,UNAVAILABLE}
   INFO  Stopped Spark web UI at http://192.168.1.65:4040
   INFO  MapOutputTrackerMasterEndpoint stopped!
   INFO  MemoryStore cleared
   INFO  BlockManager stopped
   INFO  BlockManagerMaster stopped
   INFO  OutputCommitCoordinator stopped!
   INFO  Successfully stopped SparkContext
   INFO  Shutdown hook called
   INFO  Deleting directory 
/private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/spark-fa52d6bc-863e-4ee1-98da-8352c0c5c84e
   LMC-053601:tika-parsers mattmann$ 
   ```
   
   Will commit now!
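
For anyone who wants to pick the age estimate out of a run like the one above, here is a minimal sketch. It assumes the `-m` output has been captured as text and that metadata appears as `Key: value` lines mixed in with `INFO`/`WARN` log lines, as in the paste. The function name `parse_tika_metadata` and the filtering rules are hypothetical helpers for illustration, not part of Tika.

```python
def parse_tika_metadata(output: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines from captured `TikaCLI -m` output.

    Hypothetical helper: log lines are skipped by prefix, and keys are
    assumed to contain no spaces (true for the metadata shown above).
    """
    log_prefixes = ("INFO", "WARN", "WARNING", "ERROR")
    metadata: dict[str, list[str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith(log_prefixes):
            continue
        key, sep, value = line.partition(": ")
        if not sep or " " in key:
            continue  # timestamps, wrapped log text, URLs, etc.
        # Keys such as X-Parsed-By repeat, so keep every value in order.
        metadata.setdefault(key, []).append(value)
    return metadata


# A few lines copied from the run above.
sample = """\
INFO  Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s
Content-Length: 17
Content-Type: text/plain
Estimated-Author-Age: 32.29913797083779
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser
resourceName: test.txt
INFO  Invoking stop() from shutdown hook
"""

meta = parse_tika_metadata(sample)
age = float(meta["Estimated-Author-Age"][0])
print(f"Estimated author age: {age:.1f}")
```

Values are kept as lists because a key can appear more than once, as `X-Parsed-By` does here.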
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Age Detection Tika Recogniser
> -----------------------------
>
>                 Key: TIKA-1988
>                 URL: https://issues.apache.org/jira/browse/TIKA-1988
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Madhav Sharan
>
> Author age can be the first feature; more can be added later.
> --
> Integrating work done on age classification. More details about the classifier
> are in the repo below:
> https://github.com/USCDataScience/Age-Predictor
> The Git repo has a Java client that can be integrated into Tika.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
