This is an automated email from the ASF dual-hosted git repository.

mawiesne pushed a change to branch OPENNLP-1719-Add-additional-ITs-for-verification-of-UD-POS-models
in repository https://gitbox.apache.org/repos/asf/opennlp.git

 discard 21c4c7d0 - adds three tagged sentences for the Polish language, tagged and community provided by: @alsmolarczyk, native speaker of the Polish language
 discard 69b59e69 adds POSTaggerMEIT with a sample sentences for - EN (dev-manual) - CA (by @kinow) - DE (ud data) - PT (ud data)
     add 1d722007 OPENNLP-1660: Switch to pre-trained UD models in Dev Manual (#702)
     add 8b84a3e7 OPENNLP-1663 Add test for FileToByteArraySampleStream (#706)
     add f4de6c23 OPENNLP-1662: Wrap thread-safe classes in try-with resources in Eval test (#705)
     add e91ceb17 OPENNLP-1661: Fix custom models being wiped from OpenNLP user.home directory (#704)
     add e6ef2b50 [maven-release-plugin] prepare release opennlp-2.5.1
     add 1562f749 [maven-release-plugin] prepare for next development iteration
     add 5310565f OPENNLP-1667: Add thread-safe version of ChunkerME (#708)
     add a3a5e0fb OPENNLP-1668: Avoid multiple DecimalFormat instances in AbstractModel (#710)
     add 7d4b450f OPENNLP-1669: Improve JavaDoc of QN related classes (#709)
     add 9ba57ce6 OPENNLP-1670: Disable releases for apache.snapshots repo
     add fb3b3f7d OPENNLP-1671: Convert while loops with duplicated code to do-while loops (#712)
     add 3fd4fd1c OPENNLP-1673: Re-use static conversion methods in ArrayMath (#714)
     add 34ca5182 OPENNLP-1674: Make use of enhanced switch expression introduced in Java 14 (#715)
     add b2d954a3 OPENNLP-1672: Flip misordered assertEquals arguments in several tests (#713)
     add 2f2f631c OPENNLP-1675: Address ShellCheck warnings for shell scripts (#716)
     add 49678c37 OPENNLP-1677: Extend JavaDoc of POSTaggerME (#717)
     add 5b846a30 OPENNLP-1679: Extend JavaDoc of SgmlParser (#719)
     add 74486145 OPENNLP-1678: Add thread-safe version of LanguageDetectorME (#718)
     add 6b7d87b1 OPENNLP-1681: Update log4j2 to 2.24.3 (#723)
     add 23dd6cee OPENNLP-1682: Update JUnit to 5.11.4 (#726)
     add 3fd914f9 OPENNLP-1683: Update Uimaj to 3.6.0 (#725)
     add ed2682cc OPENNLP-1447: Reenable Cmdline Tool execution tests (#720)
     add 818a333f OPENNLP-1680: Update several Maven plugins to recent versions (#729)
     add 1a50db3c OpenNLP 2.5.2 (#730)
     add b9f07123 OPENNLP-1684: Reduce creation of String instances in BrownBigramFeatureGenerator (#731)
     add b690315b OPENNLP-1685: Adapt bin.xml assembly descriptor to include generated JavaDoc (#732)
     add a81c162f OPENNLP-1687: Remove quotes around $HEAP in opennlp tools shell script (#734)
     add 784018f4 OPENNLP-1686: Adjust GH CI config to build with Java 24-ea
     add 75e1df61 OPENNLP-1689 - Update GH actions with ASF #builds security recommendations
     add 4a5b7af2 [maven-release-plugin] prepare release opennlp-2.5.3
     add e86b47fa [maven-release-plugin] prepare for next development iteration
     add d3921d68 OPENNLP-1696: Update logcaptor to 2.10.1 (#738)
     add 2690b88d OPENNLP-1694: Enhance JavaDoc in util.featuregen package (#739)
     add 297dc9b2 OPENNLP-1697 - GH action fail because of 403 returned by sourceforge
     add 6d0d8c2c OPENNLP-1688 - Add GH action to test binaries (*nix + win) in GH actions
     add c5cc1f00 OPENNLP-1695: Add more tests for classes in formats package (#742)
     add c9440e68 OPENNLP-1702: BratDocumentStream should process files in bratCorpusDir deterministically - fix by sorting all candidate files from dir lexicographically - extracts constants where applicable
     add 4839a21a OPENNLP-1521: Add documentation to describe how to re-generate snowball stemmer code (#744)
     add 6daacd31 OPENNLP-1701: Re-generates Snowball Stemmer Code (#743)
     add dff06c80 OPENNLP-1704: Auto-generate NOTICE for OpenNLP Core Project (#746)
     add ac73a5cc Minor: Regenerated NOTICE File for dff06c80b45e45c9b5f09a65426f52e1b283d013
     add 9d1dfa96 OPENNLP-385: Add unit tests for OpenNLP UIMA component (#748)
     add 59df7a7d OPENNLP-1705: Update JUnit to 5.12.0 (#749)
     add 4fae1f55 Bump slf4j.version from 2.0.16 to 2.0.17 (#750)
     add aa2d4811 Minor: Regenerated NOTICE File for 4fae1f557a14d7096fe7c762b88c9601ba6ee485
     add 1016b178 OPENNLP-1707: Update ONNX Runtime to 1.21.0 (#752)
     add d7e097de Minor: Regenerated NOTICE File for 1016b178213c437fd63bf12cc7134a95739cff20
     add 7338b1b7 OPENNLP-1705: Update JUnit to 5.12.1 (#755)
     add 4e2e7f26 OPENNLP-287: Extend POS Tagger documentation with more information about the tag dictionary (#754)
     add 9470034a adds POSTaggerMEIT with a sample sentences for - EN (dev-manual) - CA (by @kinow) - DE (ud data) - PT (ud data)
     add 235f5035 - adds three tagged sentences for the Polish language, tagged and community provided by: @alsmolarczyk, native speaker of the Polish language

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (21c4c7d0)
            \
             N -- N -- N   refs/heads/OPENNLP-1719-Add-additional-ITs-for-verification-of-UD-POS-models (235f5035)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B.

Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .github/workflows/license.yml | 70 +++++
 .github/workflows/maven.yml | 4 +-
 .github/workflows/publish-snapshots.yml | 2 +
 .github/workflows/shell-tests.yml | 179 +++++++++++
 NOTICE | 69 ++---
 README.md | 4 +
 dev/Snowball-Stemmer.md | 82 ++++++
 opennlp-distr/pom.xml | 2 +-
 opennlp-distr/src/main/assembly/bin.xml | 10 +-
 opennlp-distr/src/main/bin/opennlp | 20 +-
 opennlp-distr/src/test/ps/test_opennlp.Tests.ps1 | 66 +++++
 opennlp-distr/src/test/sh/test_opennlp.bats | 52 ++++
 opennlp-dl-gpu/pom.xml | 2 +-
 opennlp-dl/pom.xml | 2 +-
 opennlp-docs/pom.xml | 2 +-
 opennlp-docs/src/docbkx/langdetect.xml | 4 +-
 opennlp-docs/src/docbkx/lemmatizer.xml | 116 ++++----
 opennlp-docs/src/docbkx/postagger.xml | 132 ++++++++-
 opennlp-docs/src/docbkx/sentdetect.xml | 17 +-
 opennlp-docs/src/docbkx/tokenizer.xml | 22 +-
 opennlp-morfologik-addon/pom.xml | 2 +-
 .../src/main/bin/morfologik-addon | 20 +-
 opennlp-tools-models/pom.xml | 2 +-
 opennlp-tools/pom.xml | 17 +-
 .../opennlp/tools/chunker/ChunkSampleStream.java | 12 +-
 .../ThreadSafeChunkerME.java} | 57 ++--
 .../cmdline/tokenizer/TokenizerTrainerTool.java | 12 +-
 .../tools/formats/AbstractSampleStreamFactory.java | 48 +++
 .../tools/formats/BioNLP2004NameSampleStream.java | 38 ++-
 .../formats/BioNLP2004NameSampleStreamFactory.java | 30 +-
 .../tools/formats/ChunkerSampleStreamFactory.java | 28 +-
 .../tools/formats/Conll02NameSampleStream.java | 23 +-
 .../formats/Conll02NameSampleStreamFactory.java | 13 +-
 .../formats/Conll03NameSampleStreamFactory.java | 12 +-
 .../formats/ConllXPOSSampleStreamFactory.java | 26 +-
 .../formats/ConllXSentenceSampleStreamFactory.java | 16 +-
 .../formats/ConllXTokenSampleStreamFactory.java | 14 +-
 .../tools/formats/DocumentSampleStreamFactory.java | 27 +-
 .../tools/formats/EvalitaNameSampleStream.java | 74 +++--
 .../formats/EvalitaNameSampleStreamFactory.java | 13 +-
 .../LanguageDetectorSampleStreamFactory.java | 31 +-
 .../formats/LemmatizerSampleStreamFactory.java | 28 +-
 .../tools/formats/NameSampleDataStreamFactory.java | 29 +-
 .../tools/formats/ParseSampleStreamFactory.java | 26 +-
 .../tools/formats/SentenceSampleStreamFactory.java | 28 +-
 .../tools/formats/TokenSampleStreamFactory.java | 28 +-
 .../tools/formats/TwentyNewsgroupSampleStream.java | 9 +
 .../TwentyNewsgroupSampleStreamFactory.java | 54 ++--
 .../tools/formats/WordTagSampleStreamFactory.java | 27 +-
 .../tools/formats/ad/ADChunkSampleStream.java | 4 +-
 .../formats/ad/ADChunkSampleStreamFactory.java | 32 +-
 .../tools/formats/ad/ADNameSampleStream.java | 4 +-
 .../formats/ad/ADNameSampleStreamFactory.java | 32 +-
 .../tools/formats/ad/ADPOSSampleStream.java | 2 +-
 .../tools/formats/ad/ADPOSSampleStreamFactory.java | 33 +--
 .../tools/formats/ad/ADSentenceSampleStream.java | 15 +-
 .../formats/ad/ADSentenceSampleStreamFactory.java | 34 +--
 .../opennlp/tools/formats/ad/ADSentenceStream.java | 47 +--
 .../formats/ad/ADTokenSampleStreamFactory.java | 12 +-
 .../formats/ad/PortugueseContractionUtility.java | 9 +-
 .../formats/brat/AnnotationConfiguration.java | 114 ++++---
 .../tools/formats/brat/BratAnnotationStream.java | 11 +-
 .../opennlp/tools/formats/brat/BratDocument.java | 35 +--
 .../tools/formats/brat/BratDocumentStream.java | 29 +-
 .../tools/formats/brat/BratNameSampleStream.java | 44 ++-
 .../formats/brat/BratNameSampleStreamFactory.java | 47 ++-
 .../tools/formats/brat/SegmenterObjectStream.java | 1 +
 .../conllu/ConlluLemmaSampleStreamFactory.java | 32 +-
 .../conllu/ConlluPOSSampleStreamFactory.java | 33 +--
 .../tools/formats/conllu/ConlluSentence.java | 4 +-
 .../conllu/ConlluSentenceSampleStreamFactory.java | 16 +-
 .../conllu/ConlluTokenSampleStreamFactory.java | 17 +-
 .../tools/formats/conllu/ConlluWordLine.java | 13 +-
 .../convert/NameToSentenceSampleStreamFactory.java | 9 +-
 .../convert/NameToTokenSampleStreamFactory.java | 11 +-
 .../convert/POSToSentenceSampleStreamFactory.java | 11 +-
 .../convert/POSToTokenSampleStreamFactory.java | 11 +-
 .../convert/ParseToPOSSampleStreamFactory.java | 19 +-
 .../ParseToSentenceSampleStreamFactory.java | 18 +-
 .../convert/ParseToTokenSampleStreamFactory.java | 17 +-
 .../frenchtreebank/ConstitDocumentHandler.java | 3 +-
 .../ConstitParseSampleStreamFactory.java | 19 +-
 .../IrishSentenceBankSentenceStreamFactory.java | 21 +-
 .../IrishSentenceBankTokenSampleStreamFactory.java | 21 +-
 .../LeipzigLanguageSampleStreamFactory.java | 32 +-
 .../tools/formats/leipzig/SampleShuffleStream.java | 17 +-
 .../tools/formats/leipzig/SampleSkipStream.java | 14 +-
 .../letsmt/LetsmtSentenceStreamFactory.java | 22 +-
 .../formats/masc/{package-info.java => Masc.java} | 10 +-
 .../opennlp/tools/formats/masc/MascDocument.java | 1 -
 .../masc/MascNamedEntitySampleStreamFactory.java | 53 ++--
 .../formats/masc/MascPOSSampleStreamFactory.java | 52 ++--
 .../opennlp/tools/formats/masc/MascSentence.java | 4 +-
 .../masc/MascSentenceSampleStreamFactory.java | 53 ++--
 .../java/opennlp/tools/formats/masc/MascToken.java | 3 +
 .../formats/masc/MascTokenSampleStreamFactory.java | 54 ++--
 .../java/opennlp/tools/formats/masc/MascWord.java | 3 +
 .../formats/moses/MosesSentenceSampleStream.java | 12 +-
 .../moses/MosesSentenceSampleStreamFactory.java | 35 +--
 .../tools/formats/muc/DocumentSplitterStream.java | 2 +-
 .../formats/muc/Muc6NameSampleStreamFactory.java | 28 +-
 .../opennlp/tools/formats/muc/MucElementNames.java | 6 +-
 .../tools/formats/muc/MucNameSampleStream.java | 2 +-
 .../java/opennlp/tools/formats/muc/SgmlParser.java | 92 +++---
 .../formats/nkjp/NKJPSegmentationDocument.java | 58 ++--
 .../formats/nkjp/NKJPSentenceSampleStream.java | 4 +-
 .../nkjp/NKJPSentenceSampleStreamFactory.java | 23 +-
 .../tools/formats/nkjp/NKJPTextDocument.java | 6 +-
 .../ontonotes/OntoNotesNameSampleStream.java | 111 ++++---
 .../OntoNotesNameSampleStreamFactory.java | 25 +-
 .../ontonotes/OntoNotesPOSSampleStreamFactory.java | 19 +-
 .../ontonotes/OntoNotesParseSampleStream.java | 2 +-
 .../OntoNotesParseSampleStreamFactory.java | 30 +-
 .../ThreadSafeLanguageDetectorME.java} | 47 +--
 .../tools/lemmatizer/ThreadSafeLemmatizerME.java | 10 +-
 .../tools/ml/maxent/quasinewton/Function.java | 21 ++
 .../tools/ml/maxent/quasinewton/LineSearch.java | 140 ++++-----
 .../ml/maxent/quasinewton/NegLogLikelihood.java | 4 +-
 .../quasinewton/ParallelNegLogLikelihood.java | 9 +-
 .../tools/ml/maxent/quasinewton/QNMinimizer.java | 72 +++--
 .../tools/ml/maxent/quasinewton/QNModel.java | 11 +-
 .../tools/ml/maxent/quasinewton/QNTrainer.java | 38 ++-
 .../java/opennlp/tools/ml/model/AbstractModel.java | 23 +-
 .../opennlp/tools/ml/model/DataIndexerFactory.java | 24 +-
 .../tools/namefind/NameSampleDataStream.java | 10 +-
 .../tools/namefind/ThreadSafeNameFinderME.java | 20 +-
 .../opennlp/tools/parser/ParseSampleStream.java | 4 +
 .../java/opennlp/tools/postag/POSTaggerME.java | 74 +++--
 .../opennlp/tools/postag/WordTagSampleStream.java | 12 +-
 .../sentdetect/DefaultSDContextGenerator.java | 12 +-
 .../tools/sentdetect/SentenceDetectorME.java | 12 +-
 .../java/opennlp/tools/stemmer/PorterStemmer.java | 17 +-
 .../stemmer/snowball/AbstractSnowballStemmer.java | 5 +
 .../java/opennlp/tools/stemmer/snowball/Among.java | 43 ++-
 .../tools/stemmer/snowball/SnowballProgram.java | 236 +++++++++------
 .../tools/stemmer/snowball/arabicStemmer.java | 123 ++++----
 .../tools/stemmer/snowball/catalanStemmer.java | 18 +-
 .../tools/stemmer/snowball/danishStemmer.java | 18 +-
 .../tools/stemmer/snowball/dutchStemmer.java | 24 +-
 .../tools/stemmer/snowball/englishStemmer.java | 38 +--
 .../tools/stemmer/snowball/finnishStemmer.java | 49 ++--
 .../tools/stemmer/snowball/frenchStemmer.java | 28 +-
 .../tools/stemmer/snowball/germanStemmer.java | 33 ++-
 .../tools/stemmer/snowball/greekStemmer.java | 326 +++++++++++----------
 .../tools/stemmer/snowball/hungarianStemmer.java | 33 ++-
 .../tools/stemmer/snowball/indonesianStemmer.java | 45 +--
 .../tools/stemmer/snowball/irishStemmer.java | 16 +-
 .../tools/stemmer/snowball/italianStemmer.java | 28 +-
 .../tools/stemmer/snowball/norwegianStemmer.java | 16 +-
 .../tools/stemmer/snowball/porterStemmer.java | 24 +-
 .../tools/stemmer/snowball/portugueseStemmer.java | 26 +-
 .../tools/stemmer/snowball/romanianStemmer.java | 22 +-
 .../tools/stemmer/snowball/russianStemmer.java | 24 +-
 .../tools/stemmer/snowball/spanishStemmer.java | 28 +-
 .../tools/stemmer/snowball/swedishStemmer.java | 18 +-
 .../tools/stemmer/snowball/turkishStemmer.java | 109 ++++---
 .../java/opennlp/tools/tokenize/TokenizerME.java | 7 +-
 .../tools/tokenize/lang/en/TokenSampleStream.java | 21 +-
 .../main/java/opennlp/tools/util/DownloadUtil.java | 60 +++-
 .../AdditionalContextFeatureGenerator.java | 1 -
 .../featuregen/AggregatedFeatureGenerator.java | 24 +-
 .../AggregatedFeatureGeneratorFactory.java | 3 +
 .../featuregen/BigramNameFeatureGenerator.java | 5 +
 .../BigramNameFeatureGeneratorFactory.java | 6 +
 .../featuregen/BrownBigramFeatureGenerator.java | 27 +-
 .../tools/util/featuregen/BrownCluster.java | 9 +-
 .../BrownClusterBigramFeatureGeneratorFactory.java | 5 +-
 ...wnClusterTokenClassFeatureGeneratorFactory.java | 5 +-
 .../BrownClusterTokenFeatureGeneratorFactory.java | 5 +-
 .../BrownTokenClassFeatureGenerator.java | 7 +-
 .../tools/util/featuregen/BrownTokenClasses.java | 6 +-
 .../featuregen/BrownTokenFeatureGenerator.java | 13 +-
 .../util/featuregen/CachedFeatureGenerator.java | 2 +
 .../featuregen/CachedFeatureGeneratorFactory.java | 3 +
 .../featuregen/CharacterNgramFeatureGenerator.java | 8 +-
 .../CharacterNgramFeatureGeneratorFactory.java | 3 +
 .../DefinitionFeatureGeneratorFactory.java | 7 +-
 .../featuregen/DictionaryFeatureGenerator.java | 21 +-
 .../DictionaryFeatureGeneratorFactory.java | 3 +
 .../featuregen/DocumentBeginFeatureGenerator.java | 5 +
 .../DocumentBeginFeatureGeneratorFactory.java | 6 +
 .../FeatureGeneratorResourceProvider.java | 8 +-
 .../tools/util/featuregen/GeneratorFactory.java | 7 +-
 .../tools/util/featuregen/InSpanGenerator.java | 11 +-
 .../featuregen/OutcomePriorFeatureGenerator.java | 2 +
 .../featuregen/POSTaggerNameFeatureGenerator.java | 16 +-
 .../POSTaggerNameFeatureGeneratorFactory.java | 3 +
 .../util/featuregen/PosTaggerFeatureGenerator.java | 5 +
 .../PosTaggerFeatureGeneratorFactory.java | 6 +
 .../util/featuregen/PrefixFeatureGenerator.java | 21 +-
 .../featuregen/PrefixFeatureGeneratorFactory.java | 3 +
 .../featuregen/PreviousMapFeatureGenerator.java | 4 +-
 .../PreviousMapFeatureGeneratorFactory.java | 4 +-
 .../featuregen/PreviousTwoMapFeatureGenerator.java | 2 +
 .../util/featuregen/SentenceFeatureGenerator.java | 2 +
 .../SentenceFeatureGeneratorFactory.java | 3 +
 .../util/featuregen/SuffixFeatureGenerator.java | 21 +-
 .../featuregen/SuffixFeatureGeneratorFactory.java | 3 +
 .../featuregen/TokenClassFeatureGenerator.java | 14 +-
 .../TokenClassFeatureGeneratorFactory.java | 3 +
 .../util/featuregen/TokenFeatureGenerator.java | 6 +-
 .../featuregen/TokenFeatureGeneratorFactory.java | 6 +
 .../featuregen/TokenPatternFeatureGenerator.java | 3 +
 .../TokenPatternFeatureGeneratorFactory.java | 3 +
 .../featuregen/TrigramNameFeatureGenerator.java | 1 +
 .../TrigramNameFeatureGeneratorFactory.java | 6 +
 .../featuregen/WindowFeatureGeneratorFactory.java | 3 +
 .../featuregen/WordClusterFeatureGenerator.java | 1 +
 .../WordClusterFeatureGeneratorFactory.java | 4 +-
 .../opennlp/tools/EnabledWhenCDNAvailable.java | 30 +-
 .../opennlp/tools/chunker/ChunkSampleTest.java | 2 +-
 .../tools/chunker/ChunkerEvaluatorTest.java | 4 +-
 .../tools/cmdline/TokenNameFinderToolTest.java | 105 +++----
 .../tokenizer/TokenizerTrainerToolTest.java | 109 ++++---
 .../opennlp/tools/doccat/DocumentSampleTest.java | 2 +-
 .../opennlp/tools/eval/MultiThreadedToolsEval.java | 54 ++--
 .../formats/AbstractSampleStreamFactoryTest.java | 68 +++++
 .../tools/formats/AbstractSampleStreamTest.java | 3 +-
 .../BioNLP2004NameSampleStreamFactoryTest.java | 113 +++++++
 .../formats/ChunkerSampleStreamFactoryTest.java | 79 +++++
 .../Conll02NameSampleStreamFactoryTest.java | 127 ++++++++
 .../Conll03NameSampleStreamFactoryTest.java | 127 ++++++++
 .../formats/ConllXPOSSampleStreamFactoryTest.java | 78 +++++
 .../ConllXSentenceSampleStreamFactoryTest.java | 99 +++++++
 .../ConllXTokenSampleStreamFactoryTest.java | 98 +++++++
 .../EvalitaNameSampleStreamFactoryTest.java | 107 +++++++
 .../tools/formats/EvalitaNameSampleStreamTest.java | 88 ++++--
 .../LanguageDetectorSampleStreamFactoryTest.java | 79 +++++
 .../formats/LemmatizerSampleStreamFactoryTest.java | 79 +++++
 .../formats/NameSampleDataStreamFactoryTest.java | 85 ++++++
 .../formats/ParseSampleStreamFactoryTest.java | 80 +++++
 .../formats/SentenceSampleStreamFactoryTest.java | 79 +++++
 .../formats/TokenSampleStreamFactoryTest.java | 79 +++++
 .../TwentyNewsgroupSampleStreamFactoryTest.java | 151 ++++++++++
 .../formats/WordTagSampleStreamFactoryTest.java | 83 ++++++
 .../formats/ad/ADChunkSampleStreamFactoryTest.java | 102 +++++++
 .../formats/ad/ADPOSSampleStreamFactoryTest.java | 105 +++++++
 .../tools/formats/ad/ADParagraphStreamTest.java | 2 +-
 .../ad/ADSentenceSampleStreamFactoryTest.java | 105 +++++++
 .../formats/ad/ADTokenSampleStreamFactoryTest.java | 108 +++++++
 .../tools/formats/ad/ADTokenSampleStreamTest.java | 6 +-
 .../formats/ad/AbstractADSampleStreamTest.java | 4 +-
 .../formats/brat/BratAnnotationStreamTest.java | 39 +--
 .../tools/formats/brat/BratDocumentTest.java | 20 +-
 .../brat/BratNameSampleStreamFactoryTest.java | 167 +++++++++++
 .../formats/brat/BratNameSampleStreamTest.java | 17 +-
 .../conllu/ConlluLemmaSampleStreamFactoryTest.java | 113 +++++++
 .../conllu/ConlluPOSSampleStreamFactoryTest.java | 113 +++++++
 .../ConlluSentenceSampleStreamFactoryTest.java | 99 +++++++
 .../conllu/ConlluTokenSampleStreamFactoryTest.java | 82 ++++++
 .../convert/AbstractConvertTest.java} | 32 +-
 .../FileToByteArraySampleStreamTest.java} | 21 +-
 .../FileToStringSampleStreamTest.java} | 24 +-
 .../NameToSentenceSampleStreamFactoryTest.java | 101 +++++++
 .../NameToTokenSampleStreamFactoryTest.java | 101 +++++++
 .../POSToSentenceSampleStreamFactoryTest.java | 101 +++++++
 .../convert/POSToTokenSampleStreamFactoryTest.java | 101 +++++++
 .../convert/ParseToPOSSampleStreamFactoryTest.java | 81 +++++
 .../ParseToSentenceSampleStreamFactoryTest.java | 101 +++++++
 .../ParseToTokenSampleStreamFactoryTest.java | 101 +++++++
 .../ConstitParseSampleStreamFactoryTest.java | 93 ++++++
 ...IrishSentenceBankSentenceStreamFactoryTest.java | 83 ++++++
 ...shSentenceBankTokenSampleStreamFactoryTest.java | 83 ++++++
 .../LeipzigLanguageSampleStreamFactoryTest.java | 110 +++++++
 .../letsmt/LetsmtSentenceStreamFactoryTest.java | 84 ++++++
 .../MascNamedEntitySampleStreamFactoryTest.java | 102 +++++++
 .../masc/MascNamedEntitySampleStreamTest.java | 6 +-
 .../masc/MascPOSSampleStreamFactoryTest.java | 102 +++++++
 .../formats/masc/MascPOSSampleStreamTest.java | 6 +-
 .../masc/MascSentenceSampleStreamFactoryTest.java | 102 +++++++
 .../formats/masc/MascSentenceSampleStreamTest.java | 6 +-
 .../masc/MascTokenSampleStreamFactoryTest.java | 102 +++++++
 .../formats/masc/MascTokenSampleStreamTest.java | 6 +-
 .../MosesSentenceSampleStreamFactoryTest.java | 82 ++++++
 .../muc/Muc6NameSampleStreamFactoryTest.java | 114 +++++++
 .../opennlp/tools/formats/muc/SgmlParserTest.java | 17 +-
 .../nkjp/NKJPSentenceSampleStreamFactoryTest.java | 102 +++++++
 .../OntoNotesNameSampleStreamFactoryTest.java | 96 ++++++
 .../OntoNotesPOSSampleStreamFactoryTest.java | 96 ++++++
 .../OntoNotesParseSampleStreamFactoryTest.java | 102 +++++++
 .../langdetect/LanguageDetectorEvaluatorTest.java | 2 +-
 .../tools/langdetect/LanguageSampleTest.java | 2 +-
 .../opennlp/tools/langdetect/LanguageTest.java | 2 +-
 .../opennlp/tools/lemmatizer/LemmaSampleTest.java | 2 +-
 .../ml/maxent/quasinewton/QNMinimizerTest.java | 7 +-
 .../ml/model/OnePassRealValueDataIndexerTest.java | 6 +-
 .../opennlp/tools/namefind/NameSampleTest.java | 2 +-
 .../opennlp/tools/parser/ParserEvaluatorTest.java | 6 +-
 .../java/opennlp/tools/postag/POSSampleTest.java | 10 +-
 .../sentdetect/SentenceDetectorEvaluatorTest.java | 2 +-
 .../tools/sentdetect/SentenceDetectorMEIT.java | 52 ++--
 .../tools/sentdetect/SentenceDetectorMETest.java | 60 ++--
 .../tools/sentdetect/SentenceSampleTest.java | 2 +-
 .../opennlp/tools/stemmer/SnowballStemmerTest.java | 112 +++----
 .../opennlp/tools/tokenize/TokenSampleTest.java | 2 +-
 .../tools/util/AbstractDownloadUtilTest.java | 79 -----
 .../tools/util/DownloadUtilDownloadTwiceTest.java | 30 +-
 .../java/opennlp/tools/util/DownloadUtilTest.java | 30 +-
 .../opennlp/tools/util/TrainingParametersTest.java | 8 +-
 .../util/featuregen/GeneratorFactoryTest.java | 4 +-
 opennlp-tools/src/test/resources/logback-test.xml | 6 +-
 .../20newsgroup/sci.electronics/52794.sample | 59 ++++
 .../opennlp/tools/formats/{ => ad}/ad.sample | 0
 .../opennlp/tools/formats/bionlp2004-01.sample | 33 +++
 .../opennlp/tools/formats/brat/brat-ann.conf | 7 +
 .../opennlp/tools/formats/chunker-01.sample | 16 +
 ...lita-ner-it.sample => evalita-ner-it-01.sample} | 0
 .../opennlp/tools/formats/evalita-ner-it-02.sample | 29 ++
 .../opennlp/tools/formats/evalita-ner-it-03.sample | 22 ++
 .../tools/formats/evalita-ner-it-broken.sample | 2 +
 .../tools/formats/evalita-ner-it-incorrect.sample | 3 +
 .../opennlp/tools/formats/lang-detect-01.sample | 1 +
 .../opennlp/tools/formats/lemma-01.sample | 1 +
 .../opennlp/tools/formats/moses/moses-tiny.sample | 3 +
 .../opennlp/tools/formats/muc/LDC2003T13.sgm | 73 +++++
 .../opennlp/tools/formats/name-data-01.sample | 1 +
 .../formats/ontonotes/ontonotes-sample-01.name | 9 +
 .../formats/ontonotes/ontonotes-sample-02.parse | 29 ++
 .../opennlp/tools/formats/parse-01.sample | 1 +
 .../opennlp/tools/formats/sentences-01.sample | 2 +
 .../opennlp/tools/formats/tokens-01.sample | 1 +
 .../opennlp/tools/formats/word-tags-01.sample | 1 +
 opennlp-uima/pom.xml | 36 ++-
 .../main/java/opennlp/uima/chunker/Chunker.java | 7 +-
 .../uima/doccat/AbstractDocumentCategorizer.java | 5 +-
 .../opennlp/uima/doccat/DocumentCategorizer.java | 4 +-
 .../opennlp/uima/namefind/AbstractNameFinder.java | 14 +-
 .../uima/namefind/DictionaryNameFinder.java | 19 +-
 .../java/opennlp/uima/namefind/NameFinder.java | 9 +-
 .../java/opennlp/uima/normalizer/Normalizer.java | 13 +-
 .../opennlp/uima/normalizer/StringDictionary.java | 1 +
 .../src/main/java/opennlp/uima/parser/Parser.java | 1 +
 .../uima/sentdetect/AbstractSentenceDetector.java | 4 +-
 .../opennlp/uima/sentdetect/SentenceDetector.java | 6 +-
 .../uima/sentdetect/SentenceModelResourceImpl.java | 1 +
 .../opennlp/uima/tokenize/AbstractTokenizer.java | 8 +-
 .../opennlp/uima/tokenize/SimpleTokenizer.java | 5 +
 .../main/java/opennlp/uima/tokenize/Tokenizer.java | 3 +
 .../opennlp/uima/tokenize/WhitespaceTokenizer.java | 6 +
 .../opennlp/uima/util/AbstractModelResource.java | 6 +-
 .../opennlp/uima/util/AnnotationComparator.java | 4 +-
 .../main/java/opennlp/uima/util/AnnotatorUtil.java | 62 ++--
 .../opennlp/uima/util/ContainingConstraint.java | 2 +
 .../main/java/opennlp/uima/util/OpennlpUtil.java | 30 +-
 .../src/main/java/opennlp/uima/util/UimaUtil.java | 18 +-
 .../src/test/java/opennlp/uima/AbstractIT.java | 237 +++++++++++++++
 .../src/test/java/opennlp/uima/AbstractTest.java | 49 ++++
 .../test/java/opennlp/uima/AbstractUimaTest.java | 77 +++++
 .../opennlp/uima/AnnotatorsInitializationTest.java | 66 -----
 .../java/opennlp/uima/FullAnnotatorsFlowIT.java | 68 +++++
 .../test/java/opennlp/uima/SingleAnnotatorIT.java | 85 ++++++
 .../uima/dictionary/DictionaryResourceTest.java | 7 +-
 .../opennlp/uima/normalizer/NumberUtilTest.java | 5 +-
 .../uima/normalizer/StringDictionaryTest.java | 78 +++++
 .../uima/util/AnnotationComboIteratorTest.java | 4 +-
 .../uima/util/AnnotationComparatorTest.java | 118 ++++++++
 .../java/opennlp/uima/util/AnnotatorUtilTest.java | 235 +++++++++++++++
 .../src/test/java/opennlp/uima/util/CasUtil.java | 2 +-
 .../java/opennlp/uima/util/OpennlpUtilTest.java | 144 +++++++++
 .../test/java/opennlp/uima/util/UimaUtilTest.java | 117 ++++++++
 .../src/test/resources/simplelogger.properties | 19 ++
 .../test/resources/test-descriptors/Chunker.xml | 5 +-
 .../resources/test-descriptors/DateNameFinder.xml | 4 +-
 .../test-descriptors/DictionaryNameFinder.xml | 4 +-
 .../test-descriptors/LocationNameFinder.xml | 4 +-
 .../resources/test-descriptors/MoneyNameFinder.xml | 4 +-
 .../test-descriptors}/OpenNlpTextAnalyzer.xml | 15 +-
 .../test-descriptors/OrganizationNameFinder.xml | 4 +-
 .../test/resources/test-descriptors}/Parser.xml | 2 +-
 .../test-descriptors/PercentageNameFinder.xml | 4 +-
 .../test-descriptors/PersonNameFinder.xml | 18 +-
 .../test/resources/test-descriptors/PosTagger.xml | 18 +-
 .../test-descriptors/SentenceDetector.xml | 17 +-
 .../test-descriptors}/SimpleTokenizer.xml | 25 +-
 .../resources/test-descriptors/TimeNameFinder.xml | 4 +-
 .../test/resources/test-descriptors/Tokenizer.xml | 15 +-
 .../test/resources/test-descriptors/TypeSystem.xml | 45 ++-
 .../test-descriptors/WhitespaceTokenizer.xml} | 27 +-
 .../test/resources/training-params-invalid.conf | 22 ++
 .../src/test/resources/training-params-test.conf | 22 ++
 pom.xml | 178 +++++++++--
 rat-excludes | 1 +
 src/license/NOTICE.template | 42 +++
 src/license/THIRD-PARTY.properties | 16 +
 384 files changed, 10602 insertions(+), 3013 deletions(-)
 create mode 100644 .github/workflows/license.yml
 create mode 100644 .github/workflows/shell-tests.yml
 create mode 100644 dev/Snowball-Stemmer.md
 create mode 100644 opennlp-distr/src/test/ps/test_opennlp.Tests.ps1
 create mode 100644 opennlp-distr/src/test/sh/test_opennlp.bats
 copy opennlp-tools/src/main/java/opennlp/tools/{namefind/ThreadSafeNameFinderME.java => chunker/ThreadSafeChunkerME.java} (55%)
 copy opennlp-tools/src/main/java/opennlp/tools/formats/masc/{package-info.java => Masc.java} (84%)
 copy opennlp-tools/src/main/java/opennlp/tools/{lemmatizer/ThreadSafeLemmatizerME.java => langdetect/ThreadSafeLanguageDetectorME.java} (56%)
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/AbstractSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/BioNLP2004NameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ChunkerSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/Conll02NameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/Conll03NameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ConllXPOSSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ConllXSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ConllXTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/EvalitaNameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/LanguageDetectorSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/LemmatizerSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/NameSampleDataStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ParseSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/SentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/TokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/TwentyNewsgroupSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/WordTagSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ad/ADChunkSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ad/ADPOSSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ad/ADSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ad/ADTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/brat/BratNameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/conllu/ConlluLemmaSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/conllu/ConlluPOSSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/conllu/ConlluSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/conllu/ConlluTokenSampleStreamFactoryTest.java
 rename opennlp-tools/src/test/java/opennlp/tools/{convert/FileToStringSampleStreamTest.java => formats/convert/AbstractConvertTest.java} (64%)
 copy opennlp-tools/src/test/java/opennlp/tools/formats/{muc/SgmlParserTest.java => convert/FileToByteArraySampleStreamTest.java} (63%)
 copy opennlp-tools/src/test/java/opennlp/tools/formats/{muc/SgmlParserTest.java => convert/FileToStringSampleStreamTest.java} (64%)
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/NameToSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/NameToTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/POSToSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/POSToTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/ParseToPOSSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/ParseToSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/convert/ParseToTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/frenchtreebank/ConstitParseSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/irishsentencebank/IrishSentenceBankSentenceStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/irishsentencebank/IrishSentenceBankTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/leipzig/LeipzigLanguageSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/letsmt/LetsmtSentenceStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/masc/MascNamedEntitySampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/masc/MascPOSSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/masc/MascSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/masc/MascTokenSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/moses/MosesSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/muc/Muc6NameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/nkjp/NKJPSentenceSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ontonotes/OntoNotesNameSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ontonotes/OntoNotesPOSSampleStreamFactoryTest.java
 create mode 100644 opennlp-tools/src/test/java/opennlp/tools/formats/ontonotes/OntoNotesParseSampleStreamFactoryTest.java
 delete mode 100644 opennlp-tools/src/test/java/opennlp/tools/util/AbstractDownloadUtilTest.java
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/20newsgroup/sci.electronics/52794.sample
 rename opennlp-tools/src/test/resources/opennlp/tools/formats/{ => ad}/ad.sample (100%)
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/bionlp2004-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/brat/brat-ann.conf
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/chunker-01.sample
 rename opennlp-tools/src/test/resources/opennlp/tools/formats/{evalita-ner-it.sample => evalita-ner-it-01.sample} (100%)
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/evalita-ner-it-02.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/evalita-ner-it-03.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/evalita-ner-it-broken.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/evalita-ner-it-incorrect.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/lang-detect-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/lemma-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/moses/moses-tiny.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/muc/LDC2003T13.sgm
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/name-data-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/ontonotes/ontonotes-sample-01.name
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/ontonotes/ontonotes-sample-02.parse
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/parse-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/sentences-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/tokens-01.sample
 create mode 100644 opennlp-tools/src/test/resources/opennlp/tools/formats/word-tags-01.sample
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/AbstractIT.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/AbstractTest.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/AbstractUimaTest.java
 delete mode 100644 opennlp-uima/src/test/java/opennlp/uima/AnnotatorsInitializationTest.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/FullAnnotatorsFlowIT.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/SingleAnnotatorIT.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/normalizer/StringDictionaryTest.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/util/AnnotationComparatorTest.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/util/AnnotatorUtilTest.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/util/OpennlpUtilTest.java
 create mode 100644 opennlp-uima/src/test/java/opennlp/uima/util/UimaUtilTest.java
 create mode 100644 opennlp-uima/src/test/resources/simplelogger.properties
 copy opennlp-uima/{descriptors => src/test/resources/test-descriptors}/OpenNlpTextAnalyzer.xml (97%)
 copy opennlp-uima/{descriptors => src/test/resources/test-descriptors}/Parser.xml (99%)
 copy opennlp-uima/{descriptors => src/test/resources/test-descriptors}/SimpleTokenizer.xml (89%)
 copy opennlp-uima/{descriptors/SimpleTokenizer.xml => src/test/resources/test-descriptors/WhitespaceTokenizer.xml} (85%)
 create mode 100644 opennlp-uima/src/test/resources/training-params-invalid.conf
 create mode 100644 opennlp-uima/src/test/resources/training-params-test.conf
 create mode 100644 src/license/NOTICE.template
 create mode 100644 src/license/THIRD-PARTY.properties