Fixed! Finally fixed it! Two issues:

1. The handler needed startDocument and endDocument calls. That fixed the JSON, which in turn fixed the REST and script-based Tensorflow calls.
2. The problem that often comes up (but is still undocumented, we need to fix that!): you can't concurrently modify the metadata object whilst doing the ContentHandler stuff. You have to have an ImmutableMetadata object by the time you do the ContentHandler stuff.

I'm going to do a few more tests, then get this committed! Great work @thammegowda. Overall this is an amazing contribution; it will be awesome for Tika users!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 8/14/16, 10:15 AM, "Chris Mattmann" <[email protected]> wrote:

Hi Devs,

Here's what I'm seeing in TIKA-1993 and 1508, which I would love to finish today.

1. Tensorflow python script: works great.
2. Tensorflow REST service, Docker container: works (had to upgrade Docker to latest).
3. Tensorflow REST service, Tika parser metadata: works great.
4. Tensorflow REST service, Tika XHTML: won't print or work.
I can't get the XHTML to print with the tika app -x flag (though -m produces the following):

LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -m tika-parsers/src/test/resources/test-documents/testJPEG.jpg
INFO Available = true, API Status = HTTP/1.0 200 OK
INFO minConfidence = 0.015, topN=7
INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
INFO Recogniser Available = true
Content-Length: 7686
Content-Type: image/jpeg
OBJECT: Egyptian cat (0.09168)
OBJECT: Border collie (0.07553)
OBJECT: bluetick (0.06043)
OBJECT: collie (0.02982)
OBJECT: English foxhound (0.02759)
OBJECT: Siamese cat, Siamese (0.02053)
OBJECT: tabby, tabby cat (0.01826)
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.recognition.ObjectRecognitionParser
org.apache.tika.parser.recognition.object.rec.impl: org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
resourceName: testJPEG.jpg
LMC-053601:tika1.14 mattmann$

Thoughts? @Thamme?

Cheers,
Chris
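
For anyone hitting the same thing, here is a rough sketch of the pattern the fix at the top of this thread describes: finish building the recognition results first, freeze them, and only then drive the ContentHandler, bracketing the SAX events with startDocument and endDocument. This is not the committed Tika code. It uses only the JDK's SAX and JAXP classes so it runs without Tika on the classpath; the class name RecognitionXhtmlSketch and the helpers emit and render are made up for illustration, and a plain read-only Map stands in for the metadata object. In a real parser the events would presumably go through Tika's XHTMLContentHandler instead.

```java
import java.io.StringWriter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration only; not part of Tika.
class RecognitionXhtmlSketch {

    // Emits one <li> per recognised object. The whole event stream is
    // bracketed by startDocument()/endDocument().
    static void emit(Map<String, String> objects, ContentHandler handler)
            throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "ol", "ol", noAttrs);
        for (Map.Entry<String, String> e : objects.entrySet()) {
            String text = e.getKey() + " (" + e.getValue() + ")";
            handler.startElement("", "li", "li", noAttrs);
            handler.characters(text.toCharArray(), 0, text.length());
            handler.endElement("", "li", "li");
        }
        handler.endElement("", "ol", "ol");
        handler.endDocument();
    }

    // Serialises the events to a string using the JDK identity transformer.
    static String render(Map<String, String> objects) throws Exception {
        // Freeze a copy first: nothing mutates the results once SAX events start.
        Map<String, String> frozen =
                Collections.unmodifiableMap(new LinkedHashMap<>(objects));

        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        emit(frozen, handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample values taken from the -m output quoted above.
        Map<String, String> objects = new LinkedHashMap<>();
        objects.put("Egyptian cat", "0.09168");
        objects.put("Border collie", "0.07553");
        System.out.println(render(objects));
    }
}
```

The point of the read-only copy is that any late attempt to modify the results during handler work fails loudly instead of silently racing with the output.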
