Tommaso Teofili created JOSHUA-340:
--------------------------------------

Summary: Revamp Tokenization and Normalization
Key: JOSHUA-340
URL: https://issues.apache.org/jira/browse/JOSHUA-340
Project: Joshua
Issue Type: Task
Components: core, pipeline
Reporter: Tommaso Teofili
As part of preprocessing, Joshua tokenizes input sentences, for example by splitting punctuation off from words. This is currently done by a set of Perl preprocessing scripts [1]; it would be better to move this step into the decoder itself.

[1] https://github.com/apache/incubator-joshua/tree/master/scripts/preparation

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
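A minimal sketch of what in-decoder tokenization could look like in Java, assuming a simple rule: every maximal run of letters or digits is one token, and every other non-whitespace character (punctuation, symbols) becomes its own token. The class name SimpleTokenizer, its methods, and the use of Unicode NFKC as the "normalization" step are all hypothetical illustrations, not Joshua's API, and this does not reproduce the exact behavior of the Perl scripts in [1].

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Joshua: splits punctuation off from words.
final class SimpleTokenizer {
    // A token is either a run of Unicode letters/digits, or a single
    // character that is neither a letter, a digit, nor whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    private SimpleTokenizer() {}

    // Assumed normalization step: Unicode NFKC (e.g. folds full-width forms).
    static String normalize(String sentence) {
        return Normalizer.normalize(sentence, Normalizer.Form.NFKC);
    }

    // Normalizes the sentence, then collects tokens left to right.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(normalize(sentence));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the tokens joined by single spaces.
        System.out.println(String.join(" ", tokenize("Hello, world! It costs $5.")));
    }
}
```

This prints `Hello , world ! It costs $ 5 .`. The rule is deliberately naive: it splits contractions ("don't" becomes `don ' t`) and decimal numbers ("3.14" becomes `3 . 14`), which is the kind of language-specific handling a real replacement for the Perl scripts would need to cover.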