[ https://issues.apache.org/jira/browse/OPENNLP-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905050#comment-17905050 ]
ASF GitHub Bot commented on OPENNLP-1666:
-----------------------------------------

mawiesne opened a new pull request, #195:
URL: https://github.com/apache/opennlp-sandbox/pull/195

- switches to UD models in the opennlp-similarity component and uses thread-safe Tokenizer, POSTagger and SentenceDetector impl classes to avoid the race conditions that the JUnit tests intermittently exposed
- switches the 'tika-app' dep to the more lightweight 'tika-core'
- switches the 'docx4j' dep to the more lightweight / modern 'docx4j-core' (11.5.1, jakarta)
- adapts README.md

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [x] Has your PR been rebased against the latest commit within the target branch (typically main)?
- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [x] Have you ensured that the full suite of tests is executed via `mvn clean install` at the root opennlp-sandbox folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](https://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp-sandbox folder?
- [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp-sandbox folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

> Switch to pre-trained UD models in Similarity component
> -------------------------------------------------------
>
>                 Key: OPENNLP-1666
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1666
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Similarity
>    Affects Versions: 2.5.1
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.5.2
>
>
> At the moment, the opennlp-similarity sandbox component uses old (v1.5) models for
> testing, contained as binary artifacts in the test resources directory.
> Aims:
> * Get rid of this dependency on old model files
> * Switch to new pre-trained UD models (via OPENNLP_DOWNLOAD_HOME); Maven
> artifacts can be added in a separate issue
> * Make the existing tests pass with the UD-based models
> * Modernize and tidy up some existing code structures in terms of API and
> efficiency

--
This message was sent by Atlassian Jira
(v8.20.10#820010)