Tyler Palsulich created TIKA-1328:
-------------------------------------

             Summary: Translate Metadata and Content
                 Key: TIKA-1328
                 URL: https://issues.apache.org/jira/browse/TIKA-1328
             Project: Tika
          Issue Type: New Feature
            Reporter: Tyler Palsulich
             Fix For: 1.7

Right now, translation is only done on Strings. Ideally, users would be able to "turn on" translation while parsing. I can think of a couple of options:

- Make a TranslateAutoDetectParser. Automatically detect the file type, parse it, then translate the content.
- Make a Context switch. When true, translate the content regardless of the parser used. I'm not sure of the best way to go about this approach, but I prefer it over another Parser.

Regardless, we need a black list or a white list for translation. I think a black list would be the way to go: which fields should not be translated (dates, versions, ...).

Any ideas?

Also, somewhat unrelated, does anyone know of any other open source translation libraries? If we were really lucky, it wouldn't depend on an online service.
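
To make the black list idea concrete, here is a minimal sketch in plain Java. It does not use any Tika classes: the class name MetadataTranslationSketch, the DO_NOT_TRANSLATE set, the field names in it, and the translateFields helper are all hypothetical, and the translator is passed in as a plain function so no online service is assumed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not part of Tika: translate metadata values
// except for fields named in a black list.
class MetadataTranslationSketch {

    // Assumed black list: fields whose values should be copied untouched.
    static final Set<String> DO_NOT_TRANSLATE = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("date", "version")));

    // Returns a new map; black-listed fields keep their original value,
    // every other value is passed through the supplied translator.
    static Map<String, String> translateFields(Map<String, String> metadata,
                                               Set<String> blackList,
                                               UnaryOperator<String> translator) {
        Map<String, String> translated = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : metadata.entrySet()) {
            String value = blackList.contains(field.getKey())
                    ? field.getValue()
                    : translator.apply(field.getValue());
            translated.put(field.getKey(), value);
        }
        return translated;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "hola");
        metadata.put("date", "2014-06-10");

        // Stand-in translator that only tags the text; a real one would
        // call whatever translation backend is configured.
        UnaryOperator<String> fakeTranslator = text -> "[en] " + text;

        System.out.println(translateFields(metadata, DO_NOT_TRANSLATE, fakeTranslator));
    }
}
```

A white list would be the same loop with the condition inverted; the black list just seems like the shorter list to maintain, since most fields are free text.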