Fixed!

Finally fixed it! There were 2 issues:

1. The handler needed startDocument and endDocument calls. That fixed the 
JSON, which in turn fixed the REST- and script-based Tensorflow calls.
2. The problem that comes up often (but is still undocumented, which we need 
to fix!): you can't concurrently mess with the metadata object whilst doing 
the ContentHandler stuff. You have to have an ImmutableMetadata object by the 
time you do ContentHandler stuff.

I'm going to do a few more tests and then get this committed! Great work, 
@thammegowda. Overall this is an amazing contribution, and it will be awesome 
for Tika users!
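
For anyone hitting the same thing, here is a minimal sketch of the two points 
above. It is not the change being committed, just an illustration using only 
the JDK's SAX classes. The class name RecognitionXhtmlSketch, the methods emit 
and render, and the use of a plain list as a stand-in for the metadata are all 
assumptions made for the example; a real Tika parser would work with Tika's 
own Metadata and handler classes instead.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration class; not part of Tika.
final class RecognitionXhtmlSketch {

    // Emits the labels as a list, bracketed by startDocument/endDocument.
    static void emit(List<String> labels, ContentHandler handler) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();                       // issue 1: open the document
        handler.startElement("", "ul", "ul", noAttrs);
        for (String label : labels) {
            handler.startElement("", "li", "li", noAttrs);
            char[] text = label.toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement("", "li", "li");
        }
        handler.endElement("", "ul", "ul");
        handler.endDocument();                         // issue 1: close the document
    }

    // Collects results first, freezes them, and only then drives the handler.
    static String render(List<String> recognised) throws Exception {
        // Issue 2: take a read-only snapshot (stand-in for finished metadata)
        // before any ContentHandler call is made.
        List<String> frozen = Collections.unmodifiableList(new ArrayList<>(recognised));

        // JDK identity serializer that turns SAX events into text.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        emit(frozen, serializer);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(Arrays.asList(
                "Egyptian cat (0.09168)", "Border collie (0.07553)")));
    }
}
```

The ordering is the point: everything that would touch the metadata happens 
before startDocument, and nothing mutates it between startDocument and 
endDocument.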

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


On 8/14/16, 10:15 AM, "Chris Mattmann" <[email protected]> wrote:

    Hi Devs,
    
    Here’s what I’m seeing in TIKA-1993 and 1508, which I would love to finish today.
    
    1. Tensorflow python script works great.
    2. Tensorflow REST service – Docker container works (had to upgrade Docker to latest)
    3. Tensorflow REST service – Tika parser metadata works great.
    4. Tensorflow REST service – Tika XHTML won’t print or work.
    
    I can’t get the XHTML to print with the tika app -x flag (though -m produces the following):
    
    LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -m tika-parsers/src/test/resources/test-documents/testJPEG.jpg
    INFO  Available = true, API Status = HTTP/1.0 200 OK
    INFO  minConfidence = 0.015, topN=7
    INFO  Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
    INFO  Recogniser Available = true
    Content-Length: 7686
    Content-Type: image/jpeg
    OBJECT: Egyptian cat (0.09168)
    OBJECT: Border collie (0.07553)
    OBJECT: bluetick (0.06043)
    OBJECT: collie (0.02982)
    OBJECT: English foxhound (0.02759)
    OBJECT: Siamese cat, Siamese (0.02053)
    OBJECT: tabby, tabby cat (0.01826)
    X-Parsed-By: org.apache.tika.parser.CompositeParser
    X-Parsed-By: org.apache.tika.parser.recognition.ObjectRecognitionParser
    org.apache.tika.parser.recognition.object.rec.impl: org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
    resourceName: testJPEG.jpg
    LMC-053601:tika1.14 mattmann$ 
    
    Thoughts? @Thamme?
    
    Cheers,
    Chris
    
    
    
    
