lewismc edited a comment on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-848865170


   @philipsoutham I can replicate the build and test results:
   ```
   404c9ade89429296fb846060d2d3f13105b6f9b1bf8d96e9998fae304470c863
   Image: apache/tika:1.26 - Passed
   1.26
   1.26
   d0bb05c60afff50ed8d6c84995984dc7d8ecd0cedce8e044f9f60470bcc4aac9
   Image: apache/tika:1.26-full - Passed
   1.26-full
   1.26-full
   ```
   I experienced NO issues with the Docker Compose files 
`docker-compose-tika-customocr.yml`, `docker-compose-tika-grobid.yml`, or 
`docker-compose-tika-ner.yml`. 
   
   I was **UNABLE** to reproduce the issue you describe above regarding the NER 
example. Can you provide more detail so I can reproduce it?
   
   I did experience an issue with `docker-compose-tika-vision.yml`:
   ```
   Attaching to inception-caption_1, inception-rest_1, inception-video_1, tika_1
   inception-caption_1  | INFO:tensorflow:Building model.
   inception-video_1    | 2021-05-26 15:32:50.721748: I 
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 
AVX AVX2 FMA
   inception-caption_1  | INFO:tensorflow:Initializing vocabulary from file: 
/usr/share/apache-tika/models/dl/image/caption/1M_iters_ckpt/word_counts.txt
   inception-caption_1  | INFO:tensorflow:Created vocabulary with 11520 words
   inception-caption_1  | 2021-05-26 15:32:51.113950: I 
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 
AVX AVX2 FMA
   inception-caption_1  | INFO:tensorflow:Loading model from checkpoint: 
/usr/share/apache-tika/models/dl/image/caption/1M_iters_ckpt/model.ckpt-1000000
   inception-caption_1  | INFO:tensorflow:Restoring parameters from 
/usr/share/apache-tika/models/dl/image/caption/1M_iters_ckpt/model.ckpt-1000000
   inception-video_1    | ('cv2.__version__', '3.2.0')
   inception-video_1    | Serving on port 8764
   inception-video_1    |  * Serving Flask app "inceptionapi" (lazy loading)
   inception-video_1    |  * Environment: production
   inception-video_1    |    WARNING: Do not use the development server in a 
production environment.
   inception-video_1    |    Use a production WSGI server instead.
   inception-video_1    |  * Debug mode: off
   inception-video_1    |  * Running on http://0.0.0.0:8764/ (Press CTRL+C to 
quit)
   inception-caption_1  | INFO:tensorflow:Successfully loaded checkpoint: 
model.ckpt-1000000
   inception-caption_1  | Serving on port 8764
   inception-caption_1  |  * Serving Flask app "im2txtapi" (lazy loading)
   inception-caption_1  |  * Environment: production
   inception-caption_1  |    WARNING: Do not use the development server in a 
production environment.
   inception-caption_1  |    Use a production WSGI server instead.
   inception-caption_1  |  * Debug mode: off
   inception-caption_1  |  * Running on http://0.0.0.0:8764/ (Press CTRL+C to 
quit)
   tika_1               | May 26, 2021 3:32:53 PM 
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   tika_1               | WARNING: J2KImageReader not loaded. JPEG2000 files 
will not be processed.
   tika_1               | See 
https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   tika_1               | for optional dependencies.
   tika_1               |
   tika_1               | May 26, 2021 3:32:53 PM 
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   tika_1               | WARNING: Tesseract OCR is installed and will be 
automatically applied to image files unless
   tika_1               | you've excluded the TesseractOCRParser from the 
default parser.
   tika_1               | Tesseract may dramatically slow down content 
extraction (TIKA-2359).
   tika_1               | As of Tika 1.15 (and prior versions), Tesseract is 
automatically called.
   tika_1               | In future versions of Tika, users may need to turn 
the TesseractOCRParser on via TikaConfig.
   tika_1               | May 26, 2021 3:32:53 PM 
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
   tika_1               | WARNING: org.xerial's sqlite-jdbc is not loaded.
   tika_1               | Please provide the jar on your classpath to parse 
sqlite files.
   tika_1               | See tika-parsers/pom.xml for the correct version.
   tika_1               | INFO  Starting Apache Tika 1.25 server
   tika_1               | INFO  Using custom config: /tika-config.xml
   inception-caption_1  | 172.23.0.5 - - [26/May/2021 15:32:53] "GET 
/inception/v3/ping HTTP/1.1" 200 -
   tika_1               | INFO  Available = true, API Status = HTTP/1.0 200 OK
   tika_1               | INFO  Captions = 5, MaxCaptionLength = 15
   tika_1               | INFO  Recogniser = 
org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
   tika_1               | INFO  Recogniser Available = true
   tika_1               | INFO  Setting the server's publish address to be 
http://0.0.0.0:9998/
   tika_1               | INFO  Logging initialized @1803ms to 
org.eclipse.jetty.util.log.Slf4jLog
   tika_1               | INFO  jetty-9.4.33.v20201020; built: 
2020-10-20T23:39:24.803Z; git: 1be68755656cef678b79a2ef1c2ebbca99e25420; jvm 
14.0.2+12-Ubuntu-120.04
   tika_1               | INFO  Started ServerConnector@1d8e2eea{HTTP/1.1, 
(http/1.1)}{0.0.0.0:9998}
   tika_1               | INFO  Started @1870ms
   tika_1               | WARN  Empty contextPath
   tika_1               | INFO  Started 
o.e.j.s.h.ContextHandler@48b0e701{/,null,AVAILABLE}
   tika_1               | INFO  Started Apache Tika server at 
http://0.0.0.0:9998/
   inception-rest_1     | 2021-05-26 15:32:54.352046: I 
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 
AVX AVX2 FMA
   inception-rest_1     | Can't import video libraries, No video functionality 
is available
   inception-rest_1     | Serving on port 8764
   inception-rest_1     |  * Serving Flask app "inceptionapi" (lazy loading)
   inception-rest_1     |  * Environment: production
   inception-rest_1     |    WARNING: Do not use the development server in a 
production environment.
   inception-rest_1     |    Use a production WSGI server instead.
   inception-rest_1     |  * Debug mode: off
   inception-rest_1     |  * Running on http://0.0.0.0:8764/ (Press CTRL+C to 
quit)
   tika_1               | WARN  Both 
org.apache.tika.server.resource.TikaResource#getHTML and 
org.apache.tika.server.resource.TikaResource#getText are equal candidates for 
handling the current request which can lead to unpredictable results
   tika_1               | WARN  Both 
org.apache.tika.server.resource.TikaResource#getXML and 
org.apache.tika.server.resource.TikaResource#getText are equal candidates for 
handling the current request which can lead to unpredictable results
   tika_1               | INFO  tika (application/pdf)
   tika_1               | WARN  The directory /tmp has very little usable 
temporary space.  Operations requiring temporary files may fail.
   ```
   When I attempt an extraction, I get nothing back:
   `curl -X PUT --data-binary @/Users/lmcgibbn/Desktop/applsci-10-01127.pdf 
http://localhost:9998/tika --header "Content-type: application/pdf"`
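
To narrow down whether "nothing" means an empty 200 response or an error status, the same request can be sketched in Python so that the status code and body length are both visible. This is only a minimal sketch under assumptions: the helper name `put_document` is hypothetical, the default URL is taken from the curl command above, and the explicit `Accept: text/plain` header is my guess at a way to sidestep the `getHTML`/`getText` ambiguity the server warns about; it is not a confirmed fix.

```python
import urllib.error
import urllib.request


def put_document(data, url="http://localhost:9998/tika",
                 content_type="application/pdf", timeout=60.0):
    """PUT raw bytes to a Tika server endpoint.

    Hypothetical helper for illustration. Returns (status_code, body_text)
    so that an empty 200 response can be told apart from an HTTP error.
    """
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "Content-Type": content_type,
            # Assumption: asking for plain text explicitly avoids the
            # "equal candidates" ambiguity reported in the server log.
            "Accept": "text/plain",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; return the status and any body
        # instead of letting the exception hide them.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `put_document(open(path, "rb").read())` and printing the returned status together with `len(body)` should show whether the server answered 200 with an empty body or rejected the request outright.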
   



