Tyler Palsulich created TIKA-1328:
-------------------------------------

             Summary: Translate Metadata and Content
                 Key: TIKA-1328
                 URL: https://issues.apache.org/jira/browse/TIKA-1328
             Project: Tika
          Issue Type: New Feature
            Reporter: Tyler Palsulich
             Fix For: 1.7


Right now, translation is only done on Strings. Ideally, users would be able to
"turn on" translation while parsing. I can think of a couple of options:

- Make a TranslateAutoDetectParser. Automatically detect the file type, parse 
it, then translate the content.
- Make a Context switch. When true, translate the content regardless of the
parser used. I'm not sure of the best way to implement this, but I prefer it
over adding another Parser.
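A rough sketch of the second option, the context switch, may help the discussion. Everything here is hypothetical: `Translator`, `TranslateSettings`, `TextParser` and `parseAndMaybeTranslate` are illustrative names, not Tika API. The small `Context` class is only loosely modeled on the class-keyed `get`/`set` style of Tika's `ParseContext`. The point is that any parser runs unchanged, and translation happens only when the caller has put a settings object into the context.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a "context switch" for translation; none of these names are Tika API. */
public class ContextSwitchSketch {

    /** Hypothetical stand-in for a translation backend. */
    interface Translator {
        String translate(String text, String targetLanguage);
    }

    /** Hypothetical settings object; its presence in the context turns translation on. */
    static final class TranslateSettings {
        final Translator translator;
        final String targetLanguage;

        TranslateSettings(Translator translator, String targetLanguage) {
            this.translator = translator;
            this.targetLanguage = targetLanguage;
        }
    }

    /** Minimal class-keyed context, loosely modeled on ParseContext's get/set style. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) { entries.put(key, value); }

        <T> T get(Class<T> key) { return key.cast(entries.get(key)); }
    }

    /** Hypothetical parser abstraction: turns raw input into plain text. */
    interface TextParser {
        String parse(String rawInput);
    }

    /** Runs any parser; translates its output only if TranslateSettings is in the context. */
    static String parseAndMaybeTranslate(TextParser parser, String rawInput, Context context) {
        String text = parser.parse(rawInput);
        TranslateSettings settings = context.get(TranslateSettings.class);
        if (settings == null) {
            return text; // switch is off: behave exactly like the plain parser
        }
        return settings.translator.translate(text, settings.targetLanguage);
    }

    public static void main(String[] args) {
        TextParser trimParser = raw -> raw.trim();                  // stand-in "parser"
        Translator fakeTranslator = (text, lang) -> "[" + lang + "] " + text; // offline fake
        Context off = new Context();
        Context on = new Context();
        on.set(TranslateSettings.class, new TranslateSettings(fakeTranslator, "en"));
        System.out.println(parseAndMaybeTranslate(trimParser, "  hola mundo ", off)); // hola mundo
        System.out.println(parseAndMaybeTranslate(trimParser, "  hola mundo ", on));  // [en] hola mundo
    }
}
```

Compared with a dedicated TranslateAutoDetectParser, this keeps the parser hierarchy untouched: the flag travels with the per-call context, so any parser can be used with or without translation.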

Regardless, we need a blacklist or whitelist for translation. I think a
blacklist would be the way to go: a list of fields that should not be translated
(dates, versions, ...). Any ideas? Also, somewhat unrelated, does anyone know of
any other open source translation libraries? If we were really lucky, it
wouldn't depend on an online service.
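The blacklist idea could look roughly like the following. This is a sketch under assumptions: metadata is modeled as a plain `Map<String, String>` rather than Tika's `Metadata` class, the field names in `DO_NOT_TRANSLATE` are illustrative only, and `translateMetadata` is a hypothetical helper. Values of blacklisted fields are copied through untouched; every other value goes through the supplied translate function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of blacklist-filtered metadata translation; not Tika API. */
public class MetadataBlacklistSketch {

    // Illustrative blacklist: fields whose values should never be translated.
    static final Set<String> DO_NOT_TRANSLATE = Set.of("date", "modified", "version", "Content-Type");

    /** Returns a copy of the metadata with every non-blacklisted value translated. */
    static Map<String, String> translateMetadata(Map<String, String> metadata,
                                                 Set<String> blacklist,
                                                 UnaryOperator<String> translate) {
        Map<String, String> result = new LinkedHashMap<>(); // preserves field order
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String value = blacklist.contains(entry.getKey())
                    ? entry.getValue()                    // blacklisted: keep as-is
                    : translate.apply(entry.getValue());  // everything else: translate
            result.put(entry.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "hola mundo");
        metadata.put("date", "2014-06-01");
        // Stand-in "translator" that only tags the text, so the example runs offline.
        Map<String, String> translated = translateMetadata(metadata, DO_NOT_TRANSLATE, s -> "[en] " + s);
        System.out.println(translated); // {title=[en] hola mundo, date=2014-06-01}
    }
}
```

A whitelist would simply invert the `contains` check; the blacklist is safer when new, unknown fields appear, at the cost of occasionally translating something that should have been left alone.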



--
This message was sent by Atlassian JIRA
(v6.2#6252)
