Got it, Prof. Lessons learned for the next parsers.
Thanks,
~ Thamme
--
*Thamme Gowda*
Grad Student at USC <http://usc.edu>
@thammegowda <https://twitter.com/thammegowda> | 213-536-3552
http://scf.usc.edu/~tnarayan/

2016-08-14 11:29 GMT-07:00 Mattmann, Chris A (3980) <[email protected]>:

> Fixed!
>
> Finally fixed it! 2 issues:
>
> 1. Needed startDocument and endDocument in the handler. That fixed the JSON, and in turn ended up fixing the REST and script-based Tensorflow calls.
> 2. The often-encountered (but still undocumented; we need to fix that!) problem that you can't concurrently mess with the metadata object whilst doing the ContentHandler stuff. You have to have an ImmutableMetadata object by the time you do ContentHandler stuff.
>
> I'm going to do a few more tests, then get this committed! Great work, @thammegowda. Overall, this is an amazing contribution; it will be awesome for Tika users!
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect, Instrument Software and Science Data Systems Section (398)
> Manager, Open Source Projects Formulation and Development Office (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> On 8/14/16, 10:15 AM, "Chris Mattmann" <[email protected]> wrote:
>
> > Hi Devs,
> >
> > Here's what I'm seeing in TIKA-1993 and 1508, which I would love to finish today.
> >
> > 1. Tensorflow Python script works great.
> > 2. Tensorflow REST service: Docker container works (had to upgrade Docker to the latest version).
> > 3. Tensorflow REST service: Tika parser metadata works great.
> > 4. Tensorflow REST service: Tika XHTML won't print or work.
> >
> > I can't get the XHTML to print with the tika-app -x flag (though -m produces the following):
> >
> > LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -m tika-parsers/src/test/resources/test-documents/testJPEG.jpg
> > INFO Available = true, API Status = HTTP/1.0 200 OK
> > INFO minConfidence = 0.015, topN=7
> > INFO Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
> > INFO Recogniser Available = true
> > Content-Length: 7686
> > Content-Type: image/jpeg
> > OBJECT: Egyptian cat (0.09168)
> > OBJECT: Border collie (0.07553)
> > OBJECT: bluetick (0.06043)
> > OBJECT: collie (0.02982)
> > OBJECT: English foxhound (0.02759)
> > OBJECT: Siamese cat, Siamese (0.02053)
> > OBJECT: tabby, tabby cat (0.01826)
> > X-Parsed-By: org.apache.tika.parser.CompositeParser
> > X-Parsed-By: org.apache.tika.parser.recognition.ObjectRecognitionParser
> > org.apache.tika.parser.recognition.object.rec.impl: org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
> > resourceName: testJPEG.jpg
> > LMC-053601:tika1.14 mattmann$
> >
> > Thoughts? @Thamme?
> >
> > Cheers,
> > Chris
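For the next parsers, here is a minimal, JDK-only sketch of the two rules from the fix: emit every SAX event between startDocument() and endDocument(), and finish (then freeze) the metadata before any content events are written. This is not the actual ObjectRecognitionParser code. The class RecognitionOutputSketch, the RecognisedObject holder, and the plain Map standing in for Tika's Metadata are all hypothetical, and the XHTML is serialised with the JDK's identity TransformerHandler rather than Tika's own handlers.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch, not Tika code: a plain Map stands in for Tika's Metadata.
final class RecognitionOutputSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical recogniser result: a label plus its confidence.
    static final class RecognisedObject {
        final String label;
        final double confidence;

        RecognisedObject(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    // Rule 2: compute every metadata value up front, then return a read-only
    // view, so nothing can change it while SAX events are being written.
    static Map<String, List<String>> buildMetadata(List<RecognisedObject> objects) {
        List<String> values = new ArrayList<>();
        for (RecognisedObject o : objects) {
            values.add(String.format(Locale.ROOT, "%s (%.5f)", o.label, o.confidence));
        }
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("OBJECT", Collections.unmodifiableList(values));
        return Collections.unmodifiableMap(md);
    }

    // Rule 1: bracket all content events with startDocument()/endDocument().
    static void writeXhtml(List<RecognisedObject> objects, ContentHandler handler)
            throws SAXException {
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "body", "body", none);
        handler.startElement(XHTML, "ol", "ol", none);
        for (RecognisedObject o : objects) {
            char[] text = o.label.toCharArray();
            handler.startElement(XHTML, "li", "li", none);
            handler.characters(text, 0, text.length);
            handler.endElement(XHTML, "li", "li");
        }
        handler.endElement(XHTML, "ol", "ol");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    // Serialises the events to a string with the JDK's identity TransformerHandler.
    static String render(List<RecognisedObject> objects) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            // Configure output properties first, then attach the result.
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));
            writeXhtml(objects, handler);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialise XHTML", e);
        }
    }
}
```

Calling buildMetadata(...) first and render(...) second mirrors the ordering described in the thread: by the time the handler sees its first event, the metadata can no longer change.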
