Repository: opennlp
Updated Branches:
  refs/heads/master db12fcb72 -> 09ffea037

Updated README for 1.8.1 release

Project: http://git-wip-us.apache.org/repos/asf/opennlp/repo
Commit: http://git-wip-us.apache.org/repos/asf/opennlp/commit/09ffea03
Tree: http://git-wip-us.apache.org/repos/asf/opennlp/tree/09ffea03
Diff: http://git-wip-us.apache.org/repos/asf/opennlp/diff/09ffea03

Branch: refs/heads/master
Commit: 09ffea037a46a5b578cd4181541966b73f6e491f
Parents: db12fcb
Author: smarthi <[email protected]>
Authored: Sat Jul 1 09:18:15 2017 -0400
Committer: smarthi <[email protected]>
Committed: Sat Jul 1 09:18:15 2017 -0400

----------------------------------------------------------------------
 opennlp-distr/README | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/opennlp/blob/09ffea03/opennlp-distr/README
----------------------------------------------------------------------
diff --git a/opennlp-distr/README b/opennlp-distr/README
index 7f9bc4d..0db6463 100644
--- a/opennlp-distr/README
+++ b/opennlp-distr/README
@@ -24,21 +24,17 @@ removed. Java 1.8 is required.

 Additionally the release contains the following noteworthy changes:

-- POS Tagger context generator now supports feature generation XML
-- Add a Name Finder feature generator that adds POS Tag features
-- Add CONLL-U format support
-- Improve default Name Finder settings
-- TokenNameFinderEvaluator CLI now support nameTypes argument
-- Stupid backoff is now the default in NGramLanguageModel
-- Language codes now are ISO 639-3 compliant
-- Add many unit tests
-- Distribution package now includes example parameters file
-- Now prefix and suffix feature generators are configurable
-- Remove API in Document Categorizer for user specified tokenizer
-- Learnable lemmatizer now returns all possible lemmas for a given word and pos tag
-- Lemmatizer API backward compatibility break: no need to encode/decode lemmas anymore, now LemmatizerME lemmatize method returns the actual lemma
-- Add stemmer, detokenizer and sentence detection abbreviations for Irish
-- Chunker SequenceValidator signature changed to allow access to both token and POS tag
+- A new Language Detection Component
+- Support for Irish Sentence Bank formats
+- Support to train the sentence detector and tokenizer on the UD corpus
+- Evaluation tests now support ISO-639-3 language codes
+- Convenience methods to load models from a path
+- Refactored the Data Indexer Code
+- Optimized NGram creation loop to better leverage CPU cache
+- Refactored BratNameSampleStream
+- Remove deprecated code from util package
+- Redesigned web site - https://opennlp.apache.org
+- New logo for the project

 A detailed list of the issues related to this release can be found in the release notes.
