lewismc edited a comment on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-848865170
@philipsoutham I can replicate the build and test results
```
404c9ade89429296fb846060d2d3f13105b6f9b1bf8d96e9998fae304470c863
Image: apache/tika:1.26 - Passed
1.26
1.26
d0bb05c60afff50ed8d6c84995984dc7d8ecd0cedce8e044f9f60470bcc4aac9
Image: apache/tika:1.26-full - Passed
1.26-full
1.26-full
```
I experienced NO issues with the Docker Compose files
`docker-compose-tika-customocr.yml`, `docker-compose-tika-grobid.yml` or
`docker-compose-tika-ner.yml`.
I was **UNABLE** to reproduce the issue you describe above regarding the NER
example. Can you provide more detail so that I can reproduce it?
I did experience an issue with `docker-compose-tika-vision.yml`:
```
Attaching to inception-caption_1, inception-rest_1, inception-video_1, tika_1
inception-caption_1 | INFO:tensorflow:Building model.
inception-video_1 | 2021-05-26 15:32:50.721748: I
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
AVX AVX2 FMA
inception-caption_1 | INFO:tensorflow:Initializing vocabulary from file:
/usr/share/apache-tika/models/dl/image/caption/1M_iters_ckpt/word_counts.txt
inception-caption_1 | INFO:tensorflow:Created vocabulary with 11520 words
inception-caption_1 | 2021-05-26 15:32:51.113950: I
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
AVX AVX2 FMA
inception-caption_1 | INFO:tensorflow:Loading model from checkpoint:
/usr/share/apache-tika/models/dl/image/caption/1M_iters_ckpt/model.ckpt-1000000
inception-caption_1 | INFO:tensorflow:Restoring parameters from
/usr/share/apache-tika/models/dl/image/caption/1M_iters_ckpt/model.ckpt-1000000
inception-video_1 | ('cv2.__version__', '3.2.0')
inception-video_1 | Serving on port 8764
inception-video_1 | * Serving Flask app "inceptionapi" (lazy loading)
inception-video_1 | * Environment: production
inception-video_1 | WARNING: Do not use the development server in a
production environment.
inception-video_1 | Use a production WSGI server instead.
inception-video_1 | * Debug mode: off
inception-video_1 | * Running on http://0.0.0.0:8764/ (Press CTRL+C to
quit)
inception-caption_1 | INFO:tensorflow:Successfully loaded checkpoint:
model.ckpt-1000000
inception-caption_1 | Serving on port 8764
inception-caption_1 | * Serving Flask app "im2txtapi" (lazy loading)
inception-caption_1 | * Environment: production
inception-caption_1 | WARNING: Do not use the development server in a
production environment.
inception-caption_1 | Use a production WSGI server instead.
inception-caption_1 | * Debug mode: off
inception-caption_1 | * Running on http://0.0.0.0:8764/ (Press CTRL+C to
quit)
tika_1 | May 26, 2021 3:32:53 PM
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
tika_1 | WARNING: J2KImageReader not loaded. JPEG2000 files
will not be processed.
tika_1 | See
https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
tika_1 | for optional dependencies.
tika_1 |
tika_1 | May 26, 2021 3:32:53 PM
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
tika_1 | WARNING: Tesseract OCR is installed and will be
automatically applied to image files unless
tika_1 | you've excluded the TesseractOCRParser from the
default parser.
tika_1 | Tesseract may dramatically slow down content
extraction (TIKA-2359).
tika_1 | As of Tika 1.15 (and prior versions), Tesseract is
automatically called.
tika_1 | In future versions of Tika, users may need to turn
the TesseractOCRParser on via TikaConfig.
tika_1 | May 26, 2021 3:32:53 PM
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
tika_1 | WARNING: org.xerial's sqlite-jdbc is not loaded.
tika_1 | Please provide the jar on your classpath to parse
sqlite files.
tika_1 | See tika-parsers/pom.xml for the correct version.
tika_1 | INFO Starting Apache Tika 1.25 server
tika_1 | INFO Using custom config: /tika-config.xml
inception-caption_1 | 172.23.0.5 - - [26/May/2021 15:32:53] "GET
/inception/v3/ping HTTP/1.1" 200 -
tika_1 | INFO Available = true, API Status = HTTP/1.0 200 OK
tika_1 | INFO Captions = 5, MaxCaptionLength = 15
tika_1 | INFO Recogniser =
org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
tika_1 | INFO Recogniser Available = true
tika_1 | INFO Setting the server's publish address to be
http://0.0.0.0:9998/
tika_1 | INFO Logging initialized @1803ms to
org.eclipse.jetty.util.log.Slf4jLog
tika_1 | INFO jetty-9.4.33.v20201020; built:
2020-10-20T23:39:24.803Z; git: 1be68755656cef678b79a2ef1c2ebbca99e25420; jvm
14.0.2+12-Ubuntu-120.04
tika_1 | INFO Started ServerConnector@1d8e2eea{HTTP/1.1,
(http/1.1)}{0.0.0.0:9998}
tika_1 | INFO Started @1870ms
tika_1 | WARN Empty contextPath
tika_1 | INFO Started
o.e.j.s.h.ContextHandler@48b0e701{/,null,AVAILABLE}
tika_1 | INFO Started Apache Tika server at
http://0.0.0.0:9998/
inception-rest_1 | 2021-05-26 15:32:54.352046: I
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
AVX AVX2 FMA
inception-rest_1 | Can't import video libraries, No video functionality
is available
inception-rest_1 | Serving on port 8764
inception-rest_1 | * Serving Flask app "inceptionapi" (lazy loading)
inception-rest_1 | * Environment: production
inception-rest_1 | WARNING: Do not use the development server in a
production environment.
inception-rest_1 | Use a production WSGI server instead.
inception-rest_1 | * Debug mode: off
inception-rest_1 | * Running on http://0.0.0.0:8764/ (Press CTRL+C to
quit)
tika_1 | WARN Both
org.apache.tika.server.resource.TikaResource#getHTML and
org.apache.tika.server.resource.TikaResource#getText are equal candidates for
handling the current request which can lead to unpredictable results
tika_1 | WARN Both
org.apache.tika.server.resource.TikaResource#getXML and
org.apache.tika.server.resource.TikaResource#getText are equal candidates for
handling the current request which can lead to unpredictable results
tika_1 | INFO tika (application/pdf)
tika_1 | WARN The directory /tmp has very little usable
temporary space. Operations requiring temporary files may fail.
```
When I attempt an extraction, I get nothing back:
`curl -X PUT --data-binary @/Users/lmcgibbn/Desktop/applsci-10-01127.pdf http://localhost:9998/tika --header "Content-type: application/pdf"`
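
To tell an empty successful response apart from an error, it may help to repeat the request with an explicit `Accept` header and inspect the status code. Below is a minimal Python sketch of the same request as the `curl` call above, assuming the server is still listening on `localhost:9998`. `put_to_tika` is a hypothetical helper name, and sending `Accept: text/plain` is an assumption on my part: given the "equal candidates" warnings in the log, it might pin the request to the plain-text handler.

```python
import urllib.error
import urllib.request


def put_to_tika(path, url="http://localhost:9998/tika",
                content_type="application/pdf", accept="text/plain"):
    """PUT a file to a Tika server endpoint and return (status, body_text).

    Hypothetical helper for debugging; the default URL and the Accept
    header value are assumptions, not taken from the compose file.
    """
    with open(path, "rb") as fh:
        data = fh.read()

    req = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type, "Accept": accept},
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and possibly a body.
        return err.code, err.read().decode("utf-8", errors="replace")


# Example usage (path taken from the curl command above):
# status, text = put_to_tika("/Users/lmcgibbn/Desktop/applsci-10-01127.pdf")
# print("HTTP status:", status, "characters returned:", len(text))
```

A `200` with zero characters would point at the parser or config returning no content, whereas a 4xx/5xx status would point at the request or the server itself.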