Tyler Palsulich created TIKA-1328:
-------------------------------------
Summary: Translate Metadata and Content
Key: TIKA-1328
URL: https://issues.apache.org/jira/browse/TIKA-1328
Project: Tika
Issue Type: New Feature
Reporter: Tyler Palsulich
Fix For: 1.7
Right now, translation is only done on Strings. Ideally, users would be able to
"turn on" translation while parsing. I can think of a couple of options:
- Make a TranslateAutoDetectParser. Automatically detect the file type, parse
it, then translate the content.
- Make a Context switch. When true, translate the content regardless of the
parser used. I'm not sure of the best way to implement this, but I prefer it
over adding another Parser.
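
To make the second option concrete, here is a rough sketch of the idea as a
wrapper that consults a flag before translating. It deliberately avoids Tika's
real classes: TextExtractor, TRANSLATE_KEY, and withTranslation are
hypothetical names made up for illustration, and the "context" is just a Map
standing in for whatever per-parse settings object would actually be used.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch only: none of these names exist in Tika.
final class TranslationSwitchSketch {

    private TranslationSwitchSketch() {}

    /** Stand-in for a parser: turns some raw input into extracted text. */
    interface TextExtractor {
        String extract(String input);
    }

    /** Assumed name of the on/off switch inside the per-parse context. */
    static final String TRANSLATE_KEY = "translate";

    /**
     * Wraps any extractor so that its output is translated only when the
     * context maps TRANSLATE_KEY to true. The wrapped extractor is unchanged,
     * so the switch works regardless of which parser produced the text.
     */
    static TextExtractor withTranslation(TextExtractor delegate,
                                         Map<String, Boolean> context,
                                         UnaryOperator<String> translator) {
        return input -> {
            String text = delegate.extract(input);
            boolean enabled = Boolean.TRUE.equals(context.get(TRANSLATE_KEY));
            return enabled ? translator.apply(text) : text;
        };
    }
}
```

The point of the wrapper shape is that no individual parser needs to know
about translation; a missing or false flag leaves the output untouched.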
Either way, we need a blacklist or whitelist of fields for translation. I think
a blacklist would be the way to go: a list of the fields that should not be
translated (dates, versions, ...). Any ideas? Also, somewhat unrelated, does
anyone know of any other open source translation libraries? Ideally, it
wouldn't depend on an online service.
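
As a starting point for discussion, the blacklist idea might look roughly like
the sketch below. It is an assumption-laden illustration, not a proposed patch:
metadata is modeled as a plain Map rather than Tika's Metadata class, the
translator is any String-to-String function, and the class and parameter names
(MetadataTranslationSketch, doNotTranslate) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch only: not part of Tika's API.
final class MetadataTranslationSketch {

    private MetadataTranslationSketch() {}

    /**
     * Returns a copy of the metadata in which every value has been passed
     * through the translator, except for fields named in doNotTranslate
     * (for example dates or version numbers), which are copied unchanged.
     * The input map is not modified.
     */
    static Map<String, String> translate(Map<String, String> metadata,
                                         Set<String> doNotTranslate,
                                         UnaryOperator<String> translator) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String value = doNotTranslate.contains(entry.getKey())
                    ? entry.getValue()                    // blacklisted: keep as-is
                    : translator.apply(entry.getValue()); // everything else: translate
            result.put(entry.getKey(), value);
        }
        return result;
    }
}
```

With a blacklist, any field not explicitly listed gets translated by default,
which is the behavior I would expect; a whitelist would simply invert the
contains() check.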