More info, here’s what -J produces:

LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -J tika-parsers/src/test/resources/test-documents/testJPEG.jpg
INFO Available = true, API Status = HTTP/1.0 200 OK
INFO minConfidence = 0.015, topN=7
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
INFO Recogniser Available = true
Exception in thread "main" org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared
    at org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:62)
    at org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getQName(ToXMLContentHandler.java:68)
    at org.apache.tika.sax.ToXMLContentHandler.startElement(ToXMLContentHandler.java:148)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
    at org.apache.tika.sax.XHTMLContentHandler.lazyStartHead(XHTMLContentHandler.java:140)
    at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:158)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:247)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:291)
    at org.apache.tika.parser.recognition.ObjectRecognitionParser.parse(ObjectRecognitionParser.java:125)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
    at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:500)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:475)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
LMC-053601:tika1.14 mattmann$
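
For anyone reading the trace above: the message comes from a check in ToXMLContentHandler that refuses to start an element whose namespace URI has no prefix mapping yet. One possible reading, which is an assumption and not something confirmed in this thread, is that the XHTML namespace mapping was never started before the first startElement call reached the XML serializer. The sketch below is not Tika code; `NamespaceCheckSketch`, `StrictHandler`, and `tryStart` are hypothetical names, and it only uses the JDK's org.xml.sax classes to show how a handler with that kind of check behaves with and without a prior startPrefixMapping call.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceCheckSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for a serializer that insists on declared namespaces.
    static class StrictHandler extends DefaultHandler {
        // Maps namespace URI -> prefix, filled in by startPrefixMapping.
        private final Map<String, String> prefixes = new HashMap<>();

        @Override
        public void startPrefixMapping(String prefix, String uri) {
            prefixes.put(uri, prefix);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            // Reject elements in a namespace that was never mapped to a prefix.
            if (uri != null && !uri.isEmpty() && !prefixes.containsKey(uri)) {
                throw new SAXException("Namespace " + uri + " not declared");
            }
        }
    }

    // Returns "ok" if the element could be started, else the exception message.
    static String tryStart(boolean declareFirst) {
        StrictHandler handler = new StrictHandler();
        try {
            handler.startDocument();
            if (declareFirst) {
                handler.startPrefixMapping("", XHTML);
            }
            handler.startElement(XHTML, "html", "html", new AttributesImpl());
            return "ok";
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryStart(false)); // mapping missing
        System.out.println(tryStart(true));  // mapping declared first
    }
}
```

If that reading holds, it would also fit with -m working, since the metadata output does not go through an XML-serializing handler.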
On 8/14/16, 10:15 AM, "Chris Mattmann" <[email protected]> wrote:

Hi Devs,

Here’s what I’m seeing in TIKA-1993 and 1508, which I would love to finish today.

1. Tensorflow python script works great.
2. Tensorflow REST service – Docker container works (had to upgrade Docker to latest)
3. Tensorflow REST service – Tika parser metadata works great.
4. Tensorflow REST service – Tika XHTML won’t print or work.

I can’t get the XHTML to print with the tika app -x flag (though -m produces the following):

LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -m tika-parsers/src/test/resources/test-documents/testJPEG.jpg
INFO Available = true, API Status = HTTP/1.0 200 OK
INFO minConfidence = 0.015, topN=7
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
INFO Recogniser Available = true
Content-Length: 7686
Content-Type: image/jpeg
OBJECT: Egyptian cat (0.09168)
OBJECT: Border collie (0.07553)
OBJECT: bluetick (0.06043)
OBJECT: collie (0.02982)
OBJECT: English foxhound (0.02759)
OBJECT: Siamese cat, Siamese (0.02053)
OBJECT: tabby, tabby cat (0.01826)
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.recognition.ObjectRecognitionParser
org.apache.tika.parser.recognition.object.rec.impl: org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
resourceName: testJPEG.jpg
LMC-053601:tika1.14 mattmann$

Thoughts? @Thamme?

Cheers,
Chris
