Chris A. Mattmann created TIKA-2369:
---------------------------------------
Summary: Define a clean Recogniser interface: for objects from
binary data; and for text classification
Key: TIKA-2369
URL: https://issues.apache.org/jira/browse/TIKA-2369
Project: Tika
Issue Type: Bug
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 1.16
As described in TIKA-2360, we should refactor the ObjectRecogniser interface. I
propose creating:
1. TextRecogniser (per [~thammegowda]: INPUT is text, OUTPUT is a set of
metadata key/value pairs)
2. ObjectRecogniser (also per Thamme, covering ObjectRecogniser, VideoLabeller,
OCR, and Caption: INPUT is raw bytes, OUTPUT is a set of metadata key/value
pairs)
We should of course reconcile this with Tika-DL and how that folds in.
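
For illustration only, the two proposed interfaces might look roughly like the sketch below. It is not the existing Tika API: the method name recognise, the use of a plain Map for the metadata key/value pairs, the RecogniserSketch helper class, and the metadata keys "text.length" and "bytes.length" are all assumptions made for this example. Only the interface names and the input/output split come from the proposal above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: text in, metadata key/value pairs out. */
interface TextRecogniser {
    Map<String, String> recognise(String text);
}

/** Hypothetical sketch: raw bytes in, metadata key/value pairs out. */
interface ObjectRecogniser {
    Map<String, String> recognise(byte[] data);
}

/** Toy implementations, present only to show how the two shapes differ. */
final class RecogniserSketch {
    private RecogniserSketch() {}

    /** Toy text recogniser: reports the character count under an assumed key. */
    static TextRecogniser charCounter() {
        return text -> {
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("text.length", Integer.toString(text.length()));
            return metadata;
        };
    }

    /** Toy byte recogniser: reports the byte count under an assumed key. */
    static ObjectRecogniser byteCounter() {
        return data -> {
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("bytes.length", Integer.toString(data.length));
            return metadata;
        };
    }
}
```

Keeping the two interfaces separate means a text classifier never has to decode bytes and an image, video, or OCR recogniser never has to pretend its input is text. Both return the same kind of key/value metadata, so callers can merge the results uniformly.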