Chris A. Mattmann created TIKA-2369:
---------------------------------------

             Summary: Define a clean Recogniser interface: for objects from 
binary data; and for text classification
                 Key: TIKA-2369
                 URL: https://issues.apache.org/jira/browse/TIKA-2369
             Project: Tika
          Issue Type: Bug
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
             Fix For: 1.16


As described in TIKA-2360, we should refactor the ObjectRecogniser interface. I 
propose creating two interfaces:

1. TextRecogniser: per [~thammegowda], INPUT is text and OUTPUT is a set of 
metadata key/value pairs.
2. ObjectRecogniser: also per Thamme, this covers ObjectRecogniser, 
VideoLabeller, OCR, and Caption. INPUT is raw bytes and OUTPUT is a set of 
metadata key/value pairs.
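
To make the proposed split concrete, here is a rough Java sketch of what the two 
interfaces could look like. Everything in it is an assumption for illustration: 
the method name recognise, the use of Map<String, String> for the metadata 
key/value pairs, and the two toy implementations (LengthTextRecogniser and 
PngSignatureRecogniser) are hypothetical and are not taken from the Tika 
codebase or from TIKA-2360.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: names and signatures are assumptions,
// not the actual Tika API.

/** INPUT: text. OUTPUT: metadata key/value pairs. */
interface TextRecogniser {
    Map<String, String> recognise(String text);
}

/** INPUT: raw bytes. OUTPUT: metadata key/value pairs. */
interface ObjectRecogniser {
    Map<String, String> recognise(byte[] data);
}

/** Toy text recogniser that reports the character count of its input. */
class LengthTextRecogniser implements TextRecogniser {
    @Override
    public Map<String, String> recognise(String text) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("text.length", Integer.toString(text.length()));
        return metadata;
    }
}

/** Toy object recogniser that checks for the first four bytes of a PNG signature. */
class PngSignatureRecogniser implements ObjectRecogniser {
    @Override
    public Map<String, String> recognise(byte[] data) {
        boolean isPng = data.length >= 4
                && data[0] == (byte) 0x89
                && data[1] == 'P'
                && data[2] == 'N'
                && data[3] == 'G';
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("object.isPng", Boolean.toString(isPng));
        return metadata;
    }
}
```

Both interfaces return the same output type, so a caller could merge results 
from text and binary recognisers into one metadata set. A plain String-to-String 
map is a simplification here; multi-valued metadata would need a richer type.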

We should, of course, reconcile this with Tika-DL and how that folds in. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
