This is an automated email from the ASF dual-hosted git repository.

mawiesne pushed a commit to branch experimental/cleanup-dependency-mess-of-opennlp-similarity
in repository https://gitbox.apache.org/repos/asf/opennlp-sandbox.git
commit 93bc8650078fb20111f95563c2e6b5269b26bd9b
Author: Martin Wiesner <[email protected]>
AuthorDate: Tue Dec 10 11:26:20 2024 +0100

    reorganizes dependencies of 'opennlp-similarity' component

    switches 'tika-app' to more lightweight 'tika-core' dep
    switches 'docx4j' to more lightweight / modern 'docx4j-core' dep (11.5.1, jakarta)
    switches to ud-models in opennlp-similarity component
    uses thread-safe Tokenizer, POSTagger and SentenceDetector impl classes to avoid race conditions, as shown by JUnit tests sometimes
    adapts README.md
---
 opennlp-similarity/README.md                       | 134 +++++-------
 opennlp-similarity/pom.xml                         |  84 +++++----
 .../review_builder/FBOpenGraphSearchManager.java   | 148 ---------------
 .../review_builder/WebPageReviewExtractor.java     |   2 -
 .../tools/apps/utils/email/EmailSender.java        |  26 +--
 .../tools/apps/utils/email/SMTPAuthenticator.java  |   4 +-
 ...cClassifierTrainingSetMultilingualExtender.java |   6 +-
 .../DocClassifierTrainingSetVerifier.java          |   4 +-
 .../enron_email_recognizer/EmailNormalizer.java    |  13 +-
 .../EmailTrainingSetFormer.java                    |   9 +-
 .../main/java/opennlp/tools/nl2code/NL2Obj.java    |  13 +-
 .../similarity/apps/ContentGeneratorRunner.java    |  21 +--
 .../tools/similarity/apps/solr/CommentsRel.java    |   2 +-
 .../apps/solr/ContentGeneratorRequestHandler.java  |  51 +-----
 .../solr/SearchResultsReRankerRequestHandler.java  |  26 +--
 .../apps/solr/WordDocBuilderEndNotes.java          |  45 ++---
 .../ParserChunker2MatcherProcessor.java            | 201 ++++++++-------------
 .../ParserPure2MatcherProcessor.java               |  60 +++---
 .../src/test/resources/models/en-sent.bin          | Bin 98533 -> 0 bytes
 pom.xml                                            |  18 +-
 20 files changed, 308 insertions(+), 559 deletions(-)
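The central runtime change is visible in ParserChunker2MatcherProcessor further below: instead of reading model files from a bundled `src/test/resources/models` directory into single-threaded `SentenceDetectorME`/`TokenizerME`/`POSTaggerME` instances, the processor now obtains UD models via `DownloadUtil` and wraps them in thread-safe implementations. A minimal sketch of that pattern, using only the classes and calls the diff itself introduces (the class name and the two-sentence sample text are illustrative):

```java
import java.io.IOException;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.postag.ThreadSafePOSTaggerME;
import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.sentdetect.ThreadSafeSentenceDetectorME;
import opennlp.tools.tokenize.ThreadSafeTokenizerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.DownloadUtil;

public class ThreadSafePipelineSketch {

  public static void main(String[] args) throws IOException {
    // Models are downloaded once (and cached locally) instead of being
    // read from a bundled 'models' directory.
    SentenceModel sentModel = DownloadUtil.downloadModel(
        "en", DownloadUtil.ModelType.SENTENCE_DETECTOR, SentenceModel.class);
    TokenizerModel tokModel = DownloadUtil.downloadModel(
        "en", DownloadUtil.ModelType.TOKENIZER, TokenizerModel.class);
    POSModel posModel = DownloadUtil.downloadModel(
        "en", DownloadUtil.ModelType.POS, POSModel.class);

    // The ThreadSafe* wrappers may be shared across threads, unlike the
    // plain SentenceDetectorME / TokenizerME / POSTaggerME classes.
    SentenceDetector sentenceDetector = new ThreadSafeSentenceDetectorME(sentModel);
    Tokenizer tokenizer = new ThreadSafeTokenizerME(tokModel);
    POSTagger posTagger = new ThreadSafePOSTaggerME(posModel);

    for (String sentence : sentenceDetector.sentDetect("This is a test. It has two sentences.")) {
      String[] tokens = tokenizer.tokenize(sentence);
      System.out.println(sentence + " -> " + String.join(" ", posTagger.tag(tokens)));
    }
  }
}
```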
diff --git a/opennlp-similarity/README.md b/opennlp-similarity/README.md
index 7153beb..d296e46 100644
--- a/opennlp-similarity/README.md
+++ b/opennlp-similarity/README.md
@@ -6,51 +6,49 @@ It is leveraged in search, content generation & enrichment, chatbots and other t
 
 ## What is OpenNLP.Similarity?
 OpenNLP.Similarity is an NLP engine which solves a number of text processing and search tasks based on OpenNLP and Stanford NLP parsers. It is designed to be used by a non-linguist software engineer to build linguistically-enabled:
-<ul>
-<li>search engines</li>
-<li>recommendation systems</li>
-<li>dialogue systems</li>
-<li>text analysis and semantic processing engines</li>
-<li>data-loss prevention system</li>
-<li>content & document generation tools</li>
-<li>text writing style, authenticity, sentiment, sensitivity to sharing recognizers</li>
-<li>general-purpose deterministic inductive learner equipped with abductive, deductive and analogical reasoning which also embraces concept learning and tree kernel learning. </li>
-</ul>
+
+- search engines
+- recommendation systems
+- dialogue systems
+- text analysis and semantic processing engines
+- data-loss prevention systems
+- content & document generation tools
+- recognizers of text writing style, authenticity, sentiment, and sensitivity to sharing
+- a general-purpose deterministic inductive learner equipped with abductive, deductive and analogical reasoning, which also embraces concept learning and tree kernel learning.
 
 OpenNLP similarity provides a series of techniques to support the overall content pipeline, from text collection to cleaning, classification, personalization and distribution. Technology and implementation of content pipeline developed at eBay is described [here](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/examples/ContentPipeline.pdf).
 
 ## Installation
- 0) Do [`git clone`](https://github.com/bgalitsky/relevance-based-on-parse-trees.git) to set up the environment including resources. Besides what you get from git, `/resources` directory requires some additional work:
- 
- 1) Download the main [jar](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/opennlp-similarity.11.jar).
- 
- 2) Set all necessary jars in /lib folder. Larger size jars are not on git so please download them from [Stanford NLP site](http://nlp.stanford.edu/)
- <li>edu.mit.jverbnet-1.2.0.jar</li>
- <li>ejml-0.23.jar</li>
- <li>joda-time.jar</li>
- <li>jollyday.jar</li>
- <li>stanford-corenlp-3.5.2-models.jar</li>
- <li>xom.jar</li>
+0. Do [`git clone`](https://github.com/bgalitsky/relevance-based-on-parse-trees.git) to set up the environment including resources. Besides what you get from git, the `/resources` directory requires some additional work:
+
+1. Download the main [jar](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/opennlp-similarity.11.jar).
+
+2. Put all necessary jars into the /lib folder. Larger jars are not on git, so please download them from the [Stanford NLP site](http://nlp.stanford.edu/):
+ - edu.mit.jverbnet-1.2.0.jar
+ - ejml-0.23.jar
+ - joda-time.jar
+ - jollyday.jar
+ - stanford-corenlp-3.5.2-models.jar
+ - xom.jar
 The rest of the jars are available via Maven.
- 
- 3) Set up src/test/resources directory
- - new_vn.zip needs to be unzipped
- - OpenNLP models need to be downloaded into the directory 'models' from [here](http://opennlp.sourceforge.net/models-1.5/)
+
+3. Set up the src/test/resources directory:
+ - new_vn.zip needs to be unzipped
 As a result the following folders should be in /resources:
 As obtained [from git](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/test/resources):
- <li>/new_vn (VerbNet)</li>
- <li>/maps (some lookup files such as products, brands, first names etc.)</li>
- <li>/external_rst (examples of import of rhetoric parses from other systems)</li>
- <li>/fca (Formal Concept Analysis learning)</li>
- <li>/taxonomies (for search support, taxonomies are auto-mined from the web)</li>
- <li>/tree_kernel (for tree kernel learning, representation of parse trees, thickets and trained models)</li>
+ - /new_vn (VerbNet)
+ - /maps (some lookup files such as products, brands, first names etc.)
+ - /external_rst (examples of import of rhetoric parses from other systems)
+ - /fca (Formal Concept Analysis learning)
+ - /taxonomies (for search support, taxonomies are auto-mined from the web)
+ - /tree_kernel (for tree kernel learning, representation of parse trees, thickets and trained models)
 Manual downloading is also required for:
- <li>/new_vn</li>
- <li>/w2v (where word2vector model needs to be downloaded, if desired)</li>
- 
- 4) Try running tests which will give you a hint on how to integrate OpenNLP.Similarity functionality into your application. You can start with [Matcher test](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/949bac8c2a41c21a1e54fec075f2966d693114a4/src/test/java/opennlp/tools/parse_thicket/matching/PTMatcherTest.java) and observe how long paragraphs can be linguistically matched (you can compare this with just an intersection of keywords)
+ - /new_vn
+ - /w2v (where the word2vector model needs to be downloaded, if desired)
- 5) Look at [example POMs](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/examples) for how to better integrate into your existing project
+4. Try running tests which will give you a hint on how to integrate OpenNLP.Similarity functionality into your application. You can start with the [Matcher test](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/949bac8c2a41c21a1e54fec075f2966d693114a4/src/test/java/opennlp/tools/parse_thicket/matching/PTMatcherTest.java) and observe how long paragraphs can be linguistically matched (you can compare this with just an intersection of keywords)
+
+5. Look at [example POMs](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/examples) for how to better integrate into your existing project
 
 ## Creating a simple project
 
@@ -72,55 +70,54 @@ To avoid reparsing the same strings and improve the speed, use
 It operates on the level of sentences (giving [maximal common subtree](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/examples/Inferring_sem_prop_of_sentences.pdf)) and paragraphs (giving maximal common [sub-parse thicket](https://en.wikipedia.org/wiki/Parse_Thicket)). Maximal common sub-parse thicket is also represented as a [list of common phrases](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/examples/MachineLearningSyntParseTreesGali [...]
-<li>Search results re-ranker based on linguistic similarity</li>
-<li>Request Handler for SOLR which used parse tree similarity</li>
+- Search results re-ranker based on linguistic similarity
+- Request Handler for SOLR which uses parse tree similarity
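The Matcher test mentioned in step 4 above exercises this paragraph-level matching directly. A minimal usage sketch, assuming the no-arg `Matcher` constructor from `opennlp.tools.parse_thicket.matching` and the `assessRelevanceCache` call that also appears in the re-ranker snippet below (both sample paragraphs are illustrative):

```java
import java.util.List;

import opennlp.tools.parse_thicket.matching.Matcher;
import opennlp.tools.textsimilarity.ParseTreeChunk;

public class ParagraphMatchSketch {

  public static void main(String[] args) {
    Matcher m = new Matcher();

    String para1 = "I am curious how to use the iPhone of my dad while traveling abroad.";
    String para2 = "How can I unlock my dad's iPhone so it works overseas?";

    // One inner list of common phrases per phrase type (NP, VP, PP, ...);
    // an empty result means no syntactic overlap beyond trivial words.
    List<List<ParseTreeChunk>> commonPhrases = m.assessRelevanceCache(para1, para2);
    System.out.println(commonPhrases);
  }
}
```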
 ### Search engine
 The following set of functionalities is available to enable search with linguistic features. It is desirable when query is long (more than 4 keywords), logically complex, ambiguous or
-<li>Search results re-ranker based on linguistic similarity</li>
-<li>Request Handler for SOLR which used parse tree similarity</li>
-<li>Taxonomy builder via learning from the web</li>
-<li>Appropriate rhetoric map of an answer verifier. If parts of the answer are located in distinct discourse units, this answer might be irrelevant even if all keywords are mapped</li>
-<li>Tree kernel learning re-ranker to improve search relevance within a given domain with pre-trained model</li>
+- Search results re-ranker based on linguistic similarity
+- Request Handler for SOLR which uses parse tree similarity
+- Taxonomy builder via learning from the web
+- Appropriate rhetoric map of an answer verifier. If parts of the answer are located in distinct discourse units, this answer might be irrelevant even if all keywords are mapped
+- Tree kernel learning re-ranker to improve search relevance within a given domain with a pre-trained model
 
 SOLR request handlers are available [here](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/similarity/apps/solr)
 Taxonomy builder is [here](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/similarity/apps/taxo_builder).
- Examples of pre-built taxonomy are available in [this directory](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/test/resources/taxonomies). Please pay attention at taxonomies built for languages other than English. A [music taxonomy](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/src/test/resources/taxonomies/musicTaxonomyRoot.csv) is an example of the seed data for taxonomy building, and [this taxonomy hashmap dump](https://github.c [...]
+Examples of pre-built taxonomies are available in [this directory](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/test/resources/taxonomies). Please pay attention to taxonomies built for languages other than English. A [music taxonomy](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/src/test/resources/taxonomies/musicTaxonomyRoot.csv) is an example of the seed data for taxonomy building, and [this taxonomy hashmap dump](https://github.co [...]
 
 #### Search results re-ranker
-Re-ranking scores similarity between a given `orderedListOfAnswers` and `question`
- 
- `List<Pair<String,Double>> pairList = new ArrayList<Pair<String,Double>>();`
- 
- `for (String ans: orderedListOfAnswers) {`
+Re-ranking scores similarity between a given `orderedListOfAnswers` and `question`:
+
+```
+ List<Pair<String,Double>> pairList = new ArrayList<Pair<String,Double>>();
- `List<List<ParseTreeChunk>> similarityResult = m.assessRelevanceCache(question, ans);`
- 
- `double score = parseTreeChunkListScorer.getParseTreeChunkListScoreAggregPhraseType(similarityResult);`
- 
- `Pair<String,Double> p = new Pair<String, Double>(ans, score);`
- 
- `pairList.add(p);`
- 
- `}`
+ for (String ans: orderedListOfAnswers) {
+
+ List<List<ParseTreeChunk>> similarityResult = m.assessRelevanceCache(question, ans);
+ double score = parseTreeChunkListScorer.getParseTreeChunkListScoreAggregPhraseType(similarityResult);
+ Pair<String,Double> p = new Pair<String, Double>(ans, score);
+ pairList.add(p);
+ }
- `Collections.sort(pairList, Comparator.comparing(p -> p.getSecond()));`
+ Collections.sort(pairList, Comparator.comparing(p -> p.getSecond()));
+```
 
 `pairList` is then ranked according to the linguistic relevance score. This score can be combined with other sources such as popularity, geo-proximity and others.
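The snippet above presupposes an existing `Matcher` instance `m` and a `ParseTreeChunkListScorer`, and `Comparator.comparing` sorts ascending. A self-contained variant, under the assumptions that the `Pair` helper (with `getFirst`/`getSecond` accessors) lives in `opennlp.tools.similarity.apps.utils` and the scorer in `opennlp.tools.textsimilarity`, with the sort reversed so that the most relevant answer comes first:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import opennlp.tools.parse_thicket.matching.Matcher;
import opennlp.tools.similarity.apps.utils.Pair;
import opennlp.tools.textsimilarity.ParseTreeChunk;
import opennlp.tools.textsimilarity.ParseTreeChunkListScorer;

public class ReRankerSketch {

  public static void main(String[] args) {
    Matcher m = new Matcher();
    ParseTreeChunkListScorer parseTreeChunkListScorer = new ParseTreeChunkListScorer();

    String question = "How can I pay with my iPhone when traveling abroad?";
    List<String> orderedListOfAnswers = List.of(
        "You can enable contactless payments and international roaming before the trip.",
        "The iPhone was first introduced in 2007.");

    List<Pair<String, Double>> pairList = new ArrayList<>();
    for (String ans : orderedListOfAnswers) {
      // Syntactic similarity between question and candidate answer ...
      List<List<ParseTreeChunk>> similarityResult = m.assessRelevanceCache(question, ans);
      // ... aggregated into a single relevance score.
      double score = parseTreeChunkListScorer.getParseTreeChunkListScoreAggregPhraseType(similarityResult);
      pairList.add(new Pair<>(ans, score));
    }

    // Highest linguistic relevance score first (the README snippet sorts ascending).
    pairList.sort(Comparator.comparing((Pair<String, Double> p) -> p.getSecond()).reversed());
    pairList.forEach(p -> System.out.println(p.getSecond() + "\t" + p.getFirst()));
  }
}
```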
 ### Content generator
- It takes a topic, builds a taxonomy for it and forms a table of content. It then mines the web for documents for each table of content item, finds relevant sentences and paragraphs and merges them into a document [package](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/similarity/apps). The resultant document has a TOC, sections, figures & captions and also a reference section. We attempt to reproduce how humans cut-and-paste content [...]
- Content generation has a [demo](http://37.46.135.20/) and to run it from IDE start [here](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/src/main/java/opennlp/tools/similarity/apps/ContentGeneratorRunner.java). Examples of written documents are [here](http://37.46.135.20/wrt_latest/).
- Another content generation option is about opinion data. Reviews are mined for, cross-bred and made "original" for search engines. This and general content generation is done for SEO purposes. [Review builder](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/apps/review_builder/ReviewBuilderRunner.java) composes fake reviews which are in turn should be recognized by a Fake Review detector
+It takes a topic, builds a taxonomy for it and forms a table of contents. It then mines the web for documents for each table-of-contents item, finds relevant sentences and paragraphs and merges them into a document [package](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/similarity/apps). The resultant document has a TOC, sections, figures & captions and also a reference section. We attempt to reproduce how humans cut-and-paste content [...]
+Content generation has a [demo](http://37.46.135.20/) and to run it from an IDE start [here](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/src/main/java/opennlp/tools/similarity/apps/ContentGeneratorRunner.java). Examples of written documents are [here](http://37.46.135.20/wrt_latest/).
+
+Another content generation option concerns opinion data. Reviews are mined, cross-bred and made "original" for search engines. This and general content generation are done for SEO purposes. The [Review builder](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/apps/review_builder/ReviewBuilderRunner.java) composes fake reviews which in turn should be recognized by a Fake Review detector
 
 ### Text classifier / feature detector in text
 The [classifier code](https://github.com/bgalitsky/relevance-based-on-parse-trees/blob/master/src/main/java/opennlp/tools/parse_thicket/kernel_interface/TreeKernelBasedClassifierMultiplePara.java) is the same but the [model files](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/test/resources/tree_kernel/TRAINING) vary for the applications below:
-<li>detect security leaks
-<li>detect argumentation
-<li>detect low cohesiveness in text
-<li>detect authors’ doubt and low confidence
-<li>detect fake review
+- detect security leaks
+- detect argumentation
+- detect low cohesiveness in text
+- detect authors’ doubt and low confidence
+- detect fake reviews
 
 Document classification to six major classes {finance, business, legal, computing, engineering, health} is available via [nearest neighbor model](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/doc_classifier/DocClassifier.java). A Lucene training model (1G file) is obtained from Wikipedia corpus. This classifier can be trained for arbitrary classes once respective Wiki pages are selected and respective [Lucene index is built](https: [...]
@@ -135,8 +132,7 @@ Document classification to six major classes {finance, business, legal, computin
 To do model building and predictions, C modules are run in [this directory](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/test/resources/tree_kernel), so the proper choice needs to be made: {svm_classify.linux, svm_classify.max, svm_classify.exe, svm_learn.*}. Also, proper run permissions need to be set for these files.
 
 #### Concept learning
- - is a branch of deterministic learning which is applied to attribute-value pairs and possesses useful explainability feature, unlike statistical and deep learning. It is fairly useful for data exploration and visualization since all interesting relations can be visualized.
+... is a branch of deterministic learning which is applied to attribute-value pairs and possesses a useful explainability feature, unlike statistical and deep learning. It is fairly useful for data exploration and visualization since all interesting relations can be visualized.
 Concept learning covers inductive and abductive learning and also some cases of deduction. Explore [this package](https://github.com/bgalitsky/relevance-based-on-parse-trees/tree/master/src/main/java/opennlp/tools/fca) for the concept learning-related features.
### Filtering results for Speech Recognition based on semantic meaningfulness diff --git a/opennlp-similarity/pom.xml b/opennlp-similarity/pom.xml index 58dd8a2..b10aa48 100644 --- a/opennlp-similarity/pom.xml +++ b/opennlp-similarity/pom.xml @@ -27,6 +27,12 @@ <name>Apache OpenNLP Similarity distribution</name> <properties> + <jakarta.bind-api.version>4.0.2</jakarta.bind-api.version> + <jakarta.mail.version>2.1.3</jakarta.mail.version> + + <tika.version>3.0.0</tika.version> + <solr.version>8.11.3</solr.version> + <docx4j.version>11.5.1</docx4j.version> <dl4j.version>1.0.0-M2.1</dl4j.version> <hdf5.version>1.14.3-1.5.10</hdf5.version> <javacpp.version>1.5.11</javacpp.version> @@ -83,27 +89,24 @@ <groupId>org.apache.opennlp</groupId> <artifactId>opennlp-tools</artifactId> </dependency> - <dependency> - <groupId>org.slf4j</groupId> - <artifactId>slf4j-api</artifactId> + <groupId>org.apache.commons</groupId> + <artifactId>commons-math3</artifactId> </dependency> - <dependency> - <groupId>commons-lang</groupId> - <artifactId>commons-lang</artifactId> + <groupId>commons-io</groupId> + <artifactId>commons-io</artifactId> + <scope>runtime</scope> </dependency> <dependency> - <groupId>commons-codec</groupId> - <artifactId>commons-codec</artifactId> + <groupId>jakarta.xml.bind</groupId> + <artifactId>jakarta.xml.bind-api</artifactId> + <version>${jakarta.bind-api.version}</version> </dependency> <dependency> - <groupId>commons-collections</groupId> - <artifactId>commons-collections</artifactId> - </dependency> - <dependency> - <groupId>org.apache.commons</groupId> - <artifactId>commons-math3</artifactId> + <groupId>jakarta.mail</groupId> + <artifactId>jakarta.mail-api</artifactId> + <version>${jakarta.mail.version}</version> </dependency> <dependency> <groupId>org.json</groupId> @@ -112,19 +115,20 @@ </dependency> <dependency> <groupId>org.apache.tika</groupId> - <artifactId>tika-app</artifactId> - <version>3.0.0</version> + <artifactId>tika-core</artifactId> + <version>${tika.version}</version> </dependency> <dependency> - <groupId>net.sf.opencsv</groupId> - <artifactId>opencsv</artifactId> - <version>2.3</version> + <groupId>org.apache.tika</groupId> + <artifactId>tika-parser-html-module</artifactId> + <version>${tika.version}</version> + <scope>runtime</scope> </dependency> <dependency> <groupId>org.apache.solr</groupId> <artifactId>solr-core</artifactId> - <version>8.11.3</version> + <version>${solr.version}</version> <exclusions> <exclusion> <groupId>org.apache.hadoop</groupId> @@ -138,20 +142,13 @@ <groupId>org.eclipse.jetty.http2</groupId> <artifactId>*</artifactId> </exclusion> + <exclusion> + <groupId>org.apache.logging.log4j</groupId> + <artifactId>*</artifactId> + </exclusion> </exclusions> </dependency> - <dependency> - <groupId>javax.mail</groupId> - <artifactId>mail</artifactId> - <version>1.4.7</version> - </dependency> - <dependency> - <groupId>com.restfb</groupId> - <artifactId>restfb</artifactId> - <version>1.49.0</version> - </dependency> - <dependency> <groupId>net.billylieurance.azuresearch</groupId> <artifactId>azure-bing-search-java</artifactId> @@ -181,8 +178,8 @@ <dependency> <groupId>org.docx4j</groupId> - <artifactId>docx4j</artifactId> - <version>6.1.2</version> + <artifactId>docx4j-core</artifactId> + <version>${docx4j.version}</version> <exclusions> <!-- Exclusion here as log4j version 2 bindings are used during tests/runtime--> <exclusion> @@ -217,11 +214,7 @@ </exclusion> </exclusions> </dependency> - <dependency> - <groupId>org.deeplearning4j</groupId> - 
<artifactId>deeplearning4j-ui</artifactId> - <version>${dl4j.version}</version> - </dependency> + <dependency> <groupId>org.deeplearning4j</groupId> <artifactId>deeplearning4j-nlp</artifactId> @@ -252,10 +245,15 @@ <groupId>org.junit.jupiter</groupId> <artifactId>junit-jupiter-params</artifactId> </dependency> + + <!-- Logging --> <dependency> - <groupId>org.apache.logging.log4j</groupId> - <artifactId>log4j-api</artifactId> - <scope>test</scope> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </dependency> + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>log4j-over-slf4j</artifactId> </dependency> <dependency> <groupId>org.apache.logging.log4j</groupId> @@ -265,7 +263,7 @@ <dependency> <groupId>org.apache.logging.log4j</groupId> <artifactId>log4j-slf4j2-impl</artifactId> - <scope>test</scope> + <scope>runtime</scope> </dependency> </dependencies> @@ -444,7 +442,7 @@ <configuration> <source>${maven.compiler.source}</source> <target>${maven.compiler.target}</target> - <compilerArgument>-Xlint</compilerArgument> + <compilerArgument>-Xlint:-options</compilerArgument> </configuration> </plugin> diff --git a/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/FBOpenGraphSearchManager.java b/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/FBOpenGraphSearchManager.java deleted file mode 100644 index f2a130a..0000000 --- a/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/FBOpenGraphSearchManager.java +++ /dev/null @@ -1,148 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package opennlp.tools.apps.review_builder; - -import java.util.ArrayList; -import java.util.List; - -import com.restfb.Connection; -import com.restfb.DefaultFacebookClient; -import com.restfb.FacebookClient; -import com.restfb.Parameter; -import com.restfb.exception.FacebookException; -import com.restfb.types.Event; -import com.restfb.types.Page; -import org.apache.commons.lang.StringUtils; - -import opennlp.tools.jsmlearning.ProfileReaderWriter; -import opennlp.tools.similarity.apps.utils.PageFetcher; - -public class FBOpenGraphSearchManager { - - public final List<String[]> profiles; - protected FacebookClient mFBClient; - protected final PageFetcher pageFetcher = new PageFetcher(); - protected static final int NUM_TRIES = 5; - protected static final long WAIT_BTW_TRIES=1000; //milliseconds between re-tries - - public FBOpenGraphSearchManager(){ - profiles = ProfileReaderWriter.readProfiles("C:\\nc\\features\\analytics\\dealanalyzer\\sweetjack-localcoupon-may12012tooct302012.csv"); - } - - public void setFacebookClient(FacebookClient c){ - this.mFBClient=c; - } - - public List<Event> getFBEventsByName(String event) - { - List<Event> events = new ArrayList<>(); - - for(int i=0; i < NUM_TRIES; i++) - { - try - { - Connection<Event> publicSearch = - mFBClient.fetchConnection("search", Event.class, - Parameter.with("q", event), Parameter.with("type", "event"),Parameter.with("limit", 100)); - System.out.println("Searching FB events for " + event); - events= publicSearch.getData(); - break; - } - catch(FacebookException e) - { - System.out.println("FBError "+e); - try - { - Thread.sleep(WAIT_BTW_TRIES); - } - catch (InterruptedException e1) - { - System.out.println("Error "+e1); - } - } - } - return events; - } - - public Long getFBPageLikes(String merchant) - { - List<Page> groups = new ArrayList<>(); - - for(int i=0; i < NUM_TRIES; i++) - { - try - { - Connection<Page> publicSearch = - mFBClient.fetchConnection("search", Page.class, - Parameter.with("q", merchant), Parameter.with("type", "page"),Parameter.with("limit", 100)); - System.out.println("Searching FB Pages for " + merchant); - groups= publicSearch.getData(); - break; - } - catch(FacebookException e) - { - System.out.println("FBError "+e); - try - { - Thread.sleep(WAIT_BTW_TRIES); - } - catch (InterruptedException e1) - { - System.out.println("Error "+e1); - } - } - } - - for (Page p: groups){ - if (p!=null && p.getLikes()!=null && p.getLikes()>0) - return p.getLikes(); - } - - //stats fwb">235</span> - - for (Page p: groups){ - if (p.getId()==null) - continue; - String content = pageFetcher.fetchOrigHTML("http://www.facebook.com/"+p.getId()); - - String likes = StringUtils.substringBetween(content, "stats fwb\">", "<" ); - if (likes==null) - continue; - int nLikes =0; - try { - nLikes = Integer.parseInt(likes); - } catch (Exception e){ - - } - if (nLikes>0){ - return (long)nLikes; - } - - } - return null; - } - - public static void main(String[] args){ - FBOpenGraphSearchManager man = new FBOpenGraphSearchManager (); - man.setFacebookClient(new DefaultFacebookClient()); - - long res = man.getFBPageLikes("chain saw"); - System.out.println(res); - - } -} diff --git a/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/WebPageReviewExtractor.java b/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/WebPageReviewExtractor.java index 4448f58..14574f3 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/WebPageReviewExtractor.java +++ 
b/opennlp-similarity/src/main/java/opennlp/tools/apps/review_builder/WebPageReviewExtractor.java @@ -28,7 +28,6 @@ import opennlp.tools.similarity.apps.HitBase; import opennlp.tools.similarity.apps.utils.StringDistanceMeasurer; import opennlp.tools.similarity.apps.utils.Utils; import opennlp.tools.textsimilarity.TextProcessor; -import opennlp.tools.textsimilarity.chunker2matcher.ParserChunker2MatcherProcessor; import org.apache.commons.lang.StringUtils; import org.slf4j.Logger; @@ -392,7 +391,6 @@ public class WebPageReviewExtractor extends WebPageExtractor { public static void main(String[] args){ String resourceDir = "C:/stanford-corenlp/src/test/resources/"; - ParserChunker2MatcherProcessor proc = ParserChunker2MatcherProcessor.getInstance(resourceDir); //ProductFinderInAWebPage init = new ProductFinderInAWebPage("C:/workspace/relevanceEngine/src/test/resources"); diff --git a/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/EmailSender.java b/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/EmailSender.java index c5388fa..94ba811 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/EmailSender.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/EmailSender.java @@ -17,19 +17,19 @@ package opennlp.tools.apps.utils.email; -import javax.activation.DataHandler; -import javax.activation.DataSource; -import javax.activation.FileDataSource; -import javax.mail.Authenticator; -import javax.mail.BodyPart; -import javax.mail.Message; -import javax.mail.Multipart; -import javax.mail.Session; -import javax.mail.Transport; -import javax.mail.internet.InternetAddress; -import javax.mail.internet.MimeBodyPart; -import javax.mail.internet.MimeMessage; -import javax.mail.internet.MimeMultipart; +import jakarta.activation.DataHandler; +import jakarta.activation.DataSource; +import jakarta.activation.FileDataSource; +import jakarta.mail.Authenticator; +import jakarta.mail.BodyPart; +import jakarta.mail.Message; +import jakarta.mail.Multipart; +import jakarta.mail.Session; +import jakarta.mail.Transport; +import jakarta.mail.internet.InternetAddress; +import jakarta.mail.internet.MimeBodyPart; +import jakarta.mail.internet.MimeMessage; +import jakarta.mail.internet.MimeMultipart; import java.util.Properties; import java.util.regex.Matcher; import java.util.regex.Pattern; diff --git a/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/SMTPAuthenticator.java b/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/SMTPAuthenticator.java index c48ab34..55f56dd 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/SMTPAuthenticator.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/apps/utils/email/SMTPAuthenticator.java @@ -17,12 +17,12 @@ package opennlp.tools.apps.utils.email; -import javax.mail.PasswordAuthentication; +import jakarta.mail.PasswordAuthentication; /** * This contains the required information for the smtp authorization! 
*/ -public class SMTPAuthenticator extends javax.mail.Authenticator { +public class SMTPAuthenticator extends jakarta.mail.Authenticator { private final String username; private final String password; diff --git a/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetMultilingualExtender.java b/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetMultilingualExtender.java index 29a5107..18d778c 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetMultilingualExtender.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetMultilingualExtender.java @@ -27,11 +27,11 @@ import java.net.URL; import java.nio.channels.Channels; import java.nio.channels.ReadableByteChannel; import java.nio.charset.StandardCharsets; +import java.nio.file.Files; import java.util.ArrayList; import java.util.HashSet; import java.util.List; -import org.apache.commons.io.FileUtils; import org.apache.commons.lang.StringUtils; /* @@ -86,7 +86,7 @@ public class DocClassifierTrainingSetMultilingualExtender { List<String> filteredEntries = new ArrayList<>(); String content=null; try { - content = FileUtils.readFileToString(new File(filename), StandardCharsets.UTF_8); + content = Files.readString(new File(filename).toPath(), StandardCharsets.UTF_8); } catch (IOException e) { e.printStackTrace(); } @@ -127,7 +127,7 @@ public class DocClassifierTrainingSetMultilingualExtender { continue; System.out.println("processing "+f.getName()); - content = FileUtils.readFileToString(f, "utf-8"); + content = Files.readString(f.toPath(), StandardCharsets.UTF_8); int langIndex =0; for(String[] begEnd: MULTILINGUAL_TOKENS){ String urlDirty = StringUtils.substringBetween(content, begEnd[0], begEnd[1]); diff --git a/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetVerifier.java b/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetVerifier.java index 95c2b27..d774c4d 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetVerifier.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/doc_classifier/DocClassifierTrainingSetVerifier.java @@ -18,12 +18,12 @@ package opennlp.tools.doc_classifier; import java.io.File; import java.io.IOException; +import java.nio.file.Files; import java.util.ArrayList; import java.util.List; import opennlp.tools.jsmlearning.ProfileReaderWriter; -import org.apache.commons.io.FileUtils; import org.apache.tika.Tika; import org.apache.tika.exception.TikaException; @@ -96,7 +96,7 @@ public class DocClassifierTrainingSetVerifier { && resultsClassif.get(0).equals( ClassifierTrainingSetIndexer.getCategoryFromFilePath(f.getAbsolutePath()))){ String destFileName = f.getAbsolutePath().replace(sourceDir, destinationDir); - FileUtils.copyFile(f, new File(destFileName)); + Files.copy(f.toPath(), new File(destFileName).toPath()); bRejected = false; } else { System.out.println("File "+ f.getAbsolutePath() + "\n classified as "+ diff --git a/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailNormalizer.java b/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailNormalizer.java index 6e1ebe9..3fde124 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailNormalizer.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailNormalizer.java @@ -20,10 +20,9 @@ package 
opennlp.tools.enron_email_recognizer; import java.io.File; import java.io.IOException; import java.nio.charset.StandardCharsets; +import java.nio.file.Files; import java.util.ArrayList; -import org.apache.commons.io.FileUtils; - public class EmailNormalizer { protected final ArrayList<File> queue = new ArrayList<>(); @@ -67,7 +66,7 @@ public class EmailNormalizer { public void normalizeAndWriteIntoANewFile(File f){ String content = ""; try { - content = FileUtils.readFileToString(f, StandardCharsets.UTF_8); + content = Files.readString(f.toPath(), StandardCharsets.UTF_8); } catch (IOException e) { e.printStackTrace(); } @@ -95,10 +94,10 @@ public class EmailNormalizer { String directoryNew = f.getAbsolutePath().replace(origFolder, newFolder); try { String fullFileNameNew = directoryNew +"txt"; - FileUtils.writeStringToFile(new File(fullFileNameNew), buf.toString(), StandardCharsets.UTF_8); - } catch (IOException e) { - e.printStackTrace(); - } + Files.writeString(new File(fullFileNameNew).toPath(), buf.toString(), StandardCharsets.UTF_8); + } catch (IOException e) { + e.printStackTrace(); + } } public void normalizeDirectory(File f){ diff --git a/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailTrainingSetFormer.java b/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailTrainingSetFormer.java index 1a8ce6d..2551052 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailTrainingSetFormer.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/enron_email_recognizer/EmailTrainingSetFormer.java @@ -20,10 +20,9 @@ package opennlp.tools.enron_email_recognizer; import java.io.File; import java.io.IOException; import java.nio.charset.StandardCharsets; +import java.nio.file.Files; import java.util.List; -import org.apache.commons.io.FileUtils; - public class EmailTrainingSetFormer { static final String DATA_DIR = "/Users/bgalitsky/Downloads/"; static final String FILE_LIST_FILE = "cats4_11-17.txt"; @@ -32,14 +31,14 @@ public class EmailTrainingSetFormer { //enron_with_categories/5/70665.cats:4,10,1 public static void createPosTrainingSet(){ try { - List<String> lines = FileUtils.readLines(new File(DATA_DIR + FILE_LIST_FILE), StandardCharsets.UTF_8); + List<String> lines = Files.readAllLines(new File(DATA_DIR + FILE_LIST_FILE).toPath(), StandardCharsets.UTF_8); for(String l: lines){ int endOfFname = l.indexOf('.'), startOfFname = l.lastIndexOf('/'); String filenameOld = DATA_DIR + l.substring(0, endOfFname)+".txt"; String content = normalize(new File(filenameOld)); String filenameNew = DESTINATION_DIR + l.substring(startOfFname+1, endOfFname)+".txt"; //FileUtils.copyFile(new File(filenameOld), new File(filenameNew)); - FileUtils.writeStringToFile(new File(filenameNew), content, StandardCharsets.UTF_8); + Files.writeString(new File(filenameNew).toPath(), content, StandardCharsets.UTF_8); } } catch (Exception e) { e.printStackTrace(); @@ -52,7 +51,7 @@ public class EmailTrainingSetFormer { public static String normalize(File f){ String content=""; try { - content = FileUtils.readFileToString(f, StandardCharsets.UTF_8); + content = Files.readString(f.toPath(), StandardCharsets.UTF_8); } catch (IOException e) { e.printStackTrace(); } diff --git a/opennlp-similarity/src/main/java/opennlp/tools/nl2code/NL2Obj.java b/opennlp-similarity/src/main/java/opennlp/tools/nl2code/NL2Obj.java index e4beac6..3d8929f 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/nl2code/NL2Obj.java +++ 
b/opennlp-similarity/src/main/java/opennlp/tools/nl2code/NL2Obj.java @@ -30,18 +30,15 @@ public class NL2Obj { ObjectControlOp prevOp; public NL2Obj(String path) { + this(); + } + + public NL2Obj() { prevOp = new ObjectControlOp(); prevOp.setOperatorIf(""); prevOp.setOperatorFor(""); - parser = ParserChunker2MatcherProcessor.getInstance(path); + parser = ParserChunker2MatcherProcessor.getInstance(); } - - public NL2Obj() { - prevOp = new ObjectControlOp(); - prevOp.setOperatorIf(""); - prevOp.setOperatorFor(""); - parser = ParserChunker2MatcherProcessor.getInstance(); - } static final String[] EPISTEMIC_STATES_LIST = new String[] { "select", "verify", "find", "start", "stop", "go", "check" diff --git a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/ContentGeneratorRunner.java b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/ContentGeneratorRunner.java index b6bc2b1..0bf2e59 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/ContentGeneratorRunner.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/ContentGeneratorRunner.java @@ -18,26 +18,13 @@ package opennlp.tools.similarity.apps; import java.util.List; -import javax.mail.internet.AddressException; -import javax.mail.internet.InternetAddress; - -import opennlp.tools.textsimilarity.chunker2matcher.ParserChunker2MatcherProcessor; +import jakarta.mail.internet.AddressException; +import jakarta.mail.internet.InternetAddress; public class ContentGeneratorRunner { + public static void main(String[] args) { - ParserChunker2MatcherProcessor sm = null; - - try { - String resourceDir = args[2]; - if (resourceDir!=null) - sm = ParserChunker2MatcherProcessor.getInstance(resourceDir); - else - sm = ParserChunker2MatcherProcessor.getInstance(); - - } catch (Exception e) { - e.printStackTrace(); - } - + String bingKey = args[7]; if (bingKey == null){ bingKey = "e8ADxIjn9YyHx36EihdjH/tMqJJItUrrbPTUpKahiU0="; diff --git a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/CommentsRel.java b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/CommentsRel.java index e80e94e..85c4714 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/CommentsRel.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/CommentsRel.java @@ -23,7 +23,7 @@ import java.io.File; import java.io.IOException; import java.math.BigInteger; -import javax.xml.bind.JAXBException; +import jakarta.xml.bind.JAXBException; import org.docx4j.XmlUtils; import org.docx4j.jaxb.Context; diff --git a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/ContentGeneratorRequestHandler.java b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/ContentGeneratorRequestHandler.java index a40c0bb..5403ab5 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/ContentGeneratorRequestHandler.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/ContentGeneratorRequestHandler.java @@ -16,31 +16,29 @@ */ package opennlp.tools.similarity.apps.solr; -import java.io.BufferedReader; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; -import java.io.InputStream; -import java.io.InputStreamReader; +import java.lang.invoke.MethodHandles; import java.util.List; -import java.util.logging.Logger; -import javax.mail.internet.AddressException; -import javax.mail.internet.InternetAddress; +import jakarta.mail.internet.AddressException; +import 
jakarta.mail.internet.InternetAddress; import org.apache.solr.common.util.NamedList; import org.apache.solr.handler.component.SearchHandler; import org.apache.solr.request.SolrQueryRequest; import org.apache.solr.response.SolrQueryResponse; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import opennlp.tools.similarity.apps.HitBase; import opennlp.tools.similarity.apps.RelatedSentenceFinder; import opennlp.tools.similarity.apps.RelatedSentenceFinderML; -import opennlp.tools.textsimilarity.chunker2matcher.ParserChunker2MatcherProcessor; public class ContentGeneratorRequestHandler extends SearchHandler { - private static final Logger LOG = - Logger.getLogger("com.become.search.requestHandlers.SearchResultsReRankerRequestHandler"); + private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + private final WordDocBuilderEndNotes docBuilder = new WordDocBuilderEndNotes (); public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp){ @@ -97,44 +95,13 @@ public class ContentGeneratorRequestHandler extends SearchHandler { } - static class StreamLogger extends Thread{ - - private final InputStream mInputStream; - - public StreamLogger(InputStream is) { - this.mInputStream = is; - } - - public void run() { - try { - InputStreamReader isr = new InputStreamReader(mInputStream); - BufferedReader br = new BufferedReader(isr); - String line; - while ((line = br.readLine()) != null) { - System.out.println(line); - } - } catch (IOException ioe) { - ioe.printStackTrace(); - } - } - } - public String cgRunner(String[] args) { - int count=0; + + int count=0; for(String a: args){ System.out.print(count+">>" + a + " | "); count++; } - try { - String resourceDir = args[2]; - ParserChunker2MatcherProcessor sm = null; - if (resourceDir!=null) - sm = ParserChunker2MatcherProcessor.getInstance(resourceDir); - else - sm = ParserChunker2MatcherProcessor.getInstance(); - } catch (Exception e) { - e.printStackTrace(); - } String bingKey = args[7]; if (bingKey == null){ diff --git a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/SearchResultsReRankerRequestHandler.java b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/SearchResultsReRankerRequestHandler.java index 3e77f43..c7345fc 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/SearchResultsReRankerRequestHandler.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/SearchResultsReRankerRequestHandler.java @@ -16,11 +16,11 @@ */ package opennlp.tools.similarity.apps.solr; +import java.lang.invoke.MethodHandles; import java.util.ArrayList; import java.util.Comparator; import java.util.Iterator; import java.util.List; -import java.util.logging.Logger; import opennlp.tools.similarity.apps.HitBase; import opennlp.tools.textsimilarity.ParseTreeChunk; @@ -34,16 +34,16 @@ import org.apache.solr.common.util.NamedList; import org.apache.solr.handler.component.SearchHandler; import org.apache.solr.request.SolrQueryRequest; import org.apache.solr.response.SolrQueryResponse; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; public class SearchResultsReRankerRequestHandler extends SearchHandler { - private static final Logger LOG = - Logger.getLogger("com.become.search.requestHandlers.SearchResultsReRankerRequestHandler"); + + private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + private final static int MAX_SEARCH_RESULTS = 100; private final ParseTreeChunkListScorer 
parseTreeChunkListScorer = new ParseTreeChunkListScorer(); private ParserChunker2MatcherProcessor sm = null; - private static final String RESOURCE_DIR = "/home/solr/solr-4.4.0/example/src/test/resources"; - //"C:/workspace/TestSolr/src/test/resources"; - //"/data1/solr/example/src/test/resources"; public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp){ // get query string @@ -66,10 +66,6 @@ public class SearchResultsReRankerRequestHandler extends SearchHandler { List<HitBase> searchResults = new ArrayList<>(); - - - - for (int i = 0; i< MAX_SEARCH_RESULTS; i++){ String title = req.getParams().get("t"+i); String descr = req.getParams().get("d"+i); @@ -106,7 +102,6 @@ public class SearchResultsReRankerRequestHandler extends SearchHandler { } } - List<HitBase> reRankedResults; query = query.replace('+', ' '); if (tooFewKeywords(query)|| orQuery(query)){ @@ -165,12 +160,11 @@ public class SearchResultsReRankerRequestHandler extends SearchHandler { return false; } - private List<HitBase> calculateMatchScoreResortHits(List<HitBase> hits, - String searchQuery) { + private List<HitBase> calculateMatchScoreResortHits(List<HitBase> hits, String searchQuery) { try { - sm = ParserChunker2MatcherProcessor.getInstance(RESOURCE_DIR); - } catch (Exception e){ - LOG.severe(e.getMessage()); + sm = ParserChunker2MatcherProcessor.getInstance(); + } catch (RuntimeException e){ + LOG.error(e.getMessage(), e); } List<HitBase> newHitList = new ArrayList<>(); diff --git a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/WordDocBuilderEndNotes.java b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/WordDocBuilderEndNotes.java index afe37fc..dcda0ce 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/WordDocBuilderEndNotes.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/solr/WordDocBuilderEndNotes.java @@ -16,15 +16,11 @@ */ package opennlp.tools.similarity.apps.solr; - import java.io.File; import java.math.BigInteger; import java.util.ArrayList; import java.util.List; -import javax.xml.bind.JAXBException; - -import org.apache.commons.lang.StringUtils; import org.docx4j.XmlUtils; import org.docx4j.jaxb.Context; import org.docx4j.openpackaging.exceptions.InvalidFormatException; @@ -69,7 +65,7 @@ public class WordDocBuilderEndNotes extends WordDocBuilderSingleImageSearchCall{ String processedParaTitle = processParagraphTitle(para.getTitle()); if (processedParaTitle!=null && - !processedParaTitle.endsWith("..") || StringUtils.isAlphanumeric(processedParaTitle)){ + !processedParaTitle.endsWith("..") || processedParaTitle.chars().allMatch(this::isAlphanumeric)){ wordMLPackage.getMainDocumentPart().addStyledParagraphOfText("Subtitle",processedParaTitle); } String paraText = processParagraphText(para.getFragments().toString()); @@ -85,7 +81,7 @@ public class WordDocBuilderEndNotes extends WordDocBuilderSingleImageSearchCall{ "<w:rStyle w:val=\"EndnoteReference\"/></w:rPr><w:endnoteRef/></w:r><w:r><w:t xml:space=\"preserve\"> "+ url + "</w:t></w:r></w:p>"; try { endnote.getEGBlockLevelElts().add( XmlUtils.unmarshalString(endnoteBody)); - } catch (JAXBException e) { + } catch (Exception e) { e.printStackTrace(); } @@ -95,7 +91,7 @@ public class WordDocBuilderEndNotes extends WordDocBuilderSingleImageSearchCall{ try { wordMLPackage.getMainDocumentPart().addParagraph(docBody); - } catch (JAXBException e) { + } catch (Exception e) { e.printStackTrace(); } @@ -172,20 +168,25 @@ public class WordDocBuilderEndNotes 
extends WordDocBuilderSingleImageSearchCall{ return bestPart; } + private boolean isAlphanumeric(final int codePoint) { + return (codePoint >= 65 && codePoint <= 90) || + (codePoint >= 97 && codePoint <= 122) || + (codePoint >= 48 && codePoint <= 57); + } - public static void main(String[] args){ - WordDocBuilderEndNotes b = new WordDocBuilderEndNotes(); - List<HitBase> content = new ArrayList<>(); - for(int i = 0; i<10; i++){ - HitBase h = new HitBase(); - h.setTitle("albert einstein "+i); - List<Fragment> frs = new ArrayList<>(); - frs.add(new Fragment(" content "+i, 0)); - h.setFragments(frs); - h.setUrl("http://www."+i+".com"); - content.add(h); - } - - b.buildWordDoc(content, "albert einstein"); - } + public static void main(String[] args){ + WordDocBuilderEndNotes b = new WordDocBuilderEndNotes(); + List<HitBase> content = new ArrayList<>(); + for(int i = 0; i<10; i++){ + HitBase h = new HitBase(); + h.setTitle("albert einstein "+i); + List<Fragment> frs = new ArrayList<>(); + frs.add(new Fragment(" content "+i, 0)); + h.setFragments(frs); + h.setUrl("http://www."+i+".com"); + content.add(h); + } + + b.buildWordDoc(content, "albert einstein"); + } } diff --git a/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java b/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java index 22dc78b..97eda63 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java @@ -18,11 +18,7 @@ package opennlp.tools.textsimilarity.chunker2matcher; -import java.io.BufferedInputStream; -import java.io.File; -import java.io.FileInputStream; import java.io.IOException; -import java.io.InputStream; import java.lang.invoke.MethodHandles; import java.util.ArrayList; import java.util.HashMap; @@ -39,18 +35,19 @@ import opennlp.tools.parser.ParserFactory; import opennlp.tools.parser.ParserModel; import opennlp.tools.postag.POSModel; import opennlp.tools.postag.POSTagger; -import opennlp.tools.postag.POSTaggerME; +import opennlp.tools.postag.ThreadSafePOSTaggerME; import opennlp.tools.sentdetect.SentenceDetector; -import opennlp.tools.sentdetect.SentenceDetectorME; import opennlp.tools.sentdetect.SentenceModel; +import opennlp.tools.sentdetect.ThreadSafeSentenceDetectorME; import opennlp.tools.textsimilarity.LemmaPair; import opennlp.tools.textsimilarity.ParseTreeChunk; import opennlp.tools.textsimilarity.ParseTreeMatcherDeterministic; import opennlp.tools.textsimilarity.SentencePairMatchResult; import opennlp.tools.textsimilarity.TextProcessor; +import opennlp.tools.tokenize.ThreadSafeTokenizerME; import opennlp.tools.tokenize.Tokenizer; -import opennlp.tools.tokenize.TokenizerME; import opennlp.tools.tokenize.TokenizerModel; +import opennlp.tools.util.DownloadUtil; import opennlp.tools.util.Span; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -60,11 +57,6 @@ public class ParserChunker2MatcherProcessor { private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); static final int MIN_SENTENCE_LENGTH = 10; - private static final String MODEL_DIR_KEY = "nlp.models.dir"; - // TODO config - // this is where resources should live - private static String MODEL_DIR=null; - private static final String MODEL_DIR_REL = "src/test/resources/models"; protected static 
ParserChunker2MatcherProcessor instance; private SentenceDetector sentenceDetector; @@ -75,30 +67,6 @@ public class ParserChunker2MatcherProcessor { private static final int NUMBER_OF_SECTIONS_IN_SENTENCE_CHUNKS = 5; private Map<String, String[][]> sentence_parseObject; - public SentenceDetector getSentenceDetector() { - return sentenceDetector; - } - - public void setSentenceDetector(SentenceDetector sentenceDetector) { - this.sentenceDetector = sentenceDetector; - } - - public Tokenizer getTokenizer() { - return tokenizer; - } - - public void setTokenizer(Tokenizer tokenizer) { - this.tokenizer = tokenizer; - } - - public ChunkerME getChunker() { - return chunker; - } - - public void setChunker(ChunkerME chunker) { - this.chunker = chunker; - } - @SuppressWarnings("unchecked") protected ParserChunker2MatcherProcessor() { try { @@ -108,29 +76,65 @@ public class ParserChunker2MatcherProcessor { LOG.warn("parsing cache file does not exist (but should be created)"); sentence_parseObject = new HashMap<>(); } - if (sentence_parseObject == null) - sentence_parseObject = new HashMap<>(); try { - if (MODEL_DIR==null || MODEL_DIR.equals("/models")) { - String absPath = new File(".").getAbsolutePath(); - absPath = absPath.substring(0, absPath.length()-1); - MODEL_DIR = absPath + MODEL_DIR_REL; - } - //get full path from constructor - initializeSentenceDetector(); initializeTokenizer(); initializePosTagger(); initializeParser(); initializeChunker(); - } catch (Exception e) { // a typical error when 'model' is not installed - LOG.warn("The model can't be read and we rely on cache"); - LOG.warn("Please put OpenNLP model files in 'src/test/resources' (folder 'model')"); + } catch (IOException e) { + LOG.warn("A model can't be loaded: {}", e.getMessage()); } } - // closing the processor, clearing loaded ling models and serializing parsing cache + protected void initializeSentenceDetector() throws IOException { + SentenceModel model = DownloadUtil.downloadModel( + "en", DownloadUtil.ModelType.SENTENCE_DETECTOR, SentenceModel.class); + sentenceDetector = new ThreadSafeSentenceDetectorME(model); + } + + protected void initializeTokenizer() throws IOException { + TokenizerModel model = DownloadUtil.downloadModel( + "en", DownloadUtil.ModelType.TOKENIZER, TokenizerModel.class); + tokenizer = new ThreadSafeTokenizerME(model); + } + + protected void initializePosTagger() throws IOException { + POSModel model = DownloadUtil.downloadModel( + "en", DownloadUtil.ModelType.POS, POSModel.class); + posTagger = new ThreadSafePOSTaggerME(model); + } + + protected void initializeParser() throws IOException { + ParserModel model = DownloadUtil.downloadModel( + "en", DownloadUtil.ModelType.PARSER, ParserModel.class); + parser = ParserFactory.create(model); + } + + private void initializeChunker() throws IOException { + ChunkerModel model = DownloadUtil.downloadModel( + "en", DownloadUtil.ModelType.CHUNKER, ChunkerModel.class); + chunker = new ChunkerME(model); + } + + public SentenceDetector getSentenceDetector() { + return sentenceDetector; + } + + public Tokenizer getTokenizer() { + return tokenizer; + } + + public POSTagger getPOSTagger() { + return posTagger; + } + + public ChunkerME getChunker() { + return chunker; + } + + // closing the processor and serializing parsing cache public void close() { instance = null; ParserCacheSerializer.writeObject(sentence_parseObject); @@ -147,14 +151,6 @@ public class ParserChunker2MatcherProcessor { return instance; } - - public synchronized static ParserChunker2MatcherProcessor 
getInstance(String fullPathToResources) { - MODEL_DIR = fullPathToResources+"/models"; - if (instance == null) - instance = new ParserChunker2MatcherProcessor(); - - return instance; - } /** * General parsing function, which returns lists of parses for a portion of @@ -165,7 +161,7 @@ public class ParserChunker2MatcherProcessor { * @return lists of parses */ public List<List<Parse>> parseTextNlp(String text) { - if (text == null || text.trim().length() == 0) + if (text == null || text.trim().isEmpty()) return null; List<List<Parse>> textParses = new ArrayList<>(1); @@ -173,7 +169,7 @@ public class ParserChunker2MatcherProcessor { // parse paragraph by paragraph String[] paragraphList = splitParagraph(text); for (String paragraph : paragraphList) { - if (paragraph.length() == 0) + if (paragraph.isEmpty()) continue; List<Parse> paragraphParses = parseParagraphNlp(paragraph); @@ -185,7 +181,7 @@ public class ParserChunker2MatcherProcessor { } public List<Parse> parseParagraphNlp(String paragraph) { - if (paragraph == null || paragraph.trim().length() == 0) + if (paragraph == null || paragraph.trim().isEmpty()) return null; // normalize the text before parsing, otherwise, the sentences may not @@ -197,7 +193,7 @@ public class ParserChunker2MatcherProcessor { List<Parse> parseList = new ArrayList<>(sentences.length); for (String sentence : sentences) { sentence = sentence.trim(); - if (sentence.length() == 0) + if (sentence.isEmpty()) continue; Parse sentenceParse = parseSentenceNlp(sentence, false); @@ -250,9 +246,8 @@ public class ParserChunker2MatcherProcessor { List<List<ParseTreeChunk>> singleSentChunks = formGroupedPhrasesFromChunksForSentence(sent); if (singleSentChunks == null) continue; - if (listOfChunksAccum.size() < 1) { - listOfChunksAccum = new ArrayList<>( - singleSentChunks); + if (listOfChunksAccum.isEmpty()) { + listOfChunksAccum = new ArrayList<>(singleSentChunks); } else for (int i = 0; i < NUMBER_OF_SECTIONS_IN_SENTENCE_CHUNKS; i++) { // make sure not null @@ -468,7 +463,7 @@ public class ParserChunker2MatcherProcessor { public static List<List<SentenceNode>> textToSentenceNodes( List<List<Parse>> textParses) { - if (textParses == null || textParses.size() == 0) + if (textParses == null || textParses.isEmpty()) return null; List<List<SentenceNode>> textNodes = new ArrayList<>( @@ -477,18 +472,18 @@ public class ParserChunker2MatcherProcessor { List<SentenceNode> paragraphNodes = paragraphToSentenceNodes(paragraphParses); // append paragraph node if any - if (paragraphNodes != null && paragraphNodes.size() > 0) + if (paragraphNodes != null && !paragraphNodes.isEmpty()) textNodes.add(paragraphNodes); } - if (textNodes.size() > 0) + if (!textNodes.isEmpty()) return textNodes; else return null; } public static List<SentenceNode> paragraphToSentenceNodes(List<Parse> paragraphParses) { - if (paragraphParses == null || paragraphParses.size() == 0) + if (paragraphParses == null || paragraphParses.isEmpty()) return null; List<SentenceNode> paragraphNodes = new ArrayList<>(paragraphParses.size()); @@ -506,7 +501,7 @@ public class ParserChunker2MatcherProcessor { paragraphNodes.add(sentenceNode); } - if (paragraphNodes.size() > 0) + if (!paragraphNodes.isEmpty()) return paragraphNodes; else return null; @@ -518,10 +513,10 @@ public class ParserChunker2MatcherProcessor { // convert the OpenNLP Parse to our own tree nodes SyntacticTreeNode node = toSyntacticTreeNode(sentenceParse); - if ((node == null)) + if (node == null) return null; - if (node instanceof SentenceNode) - return 
(SentenceNode) node; + if (node instanceof SentenceNode sn) + return sn; else if (node instanceof PhraseNode) { return new SentenceNode("sentence", node.getChildren()); } else @@ -575,56 +570,6 @@ public class ParserChunker2MatcherProcessor { return tokenizer.tokenize(sentence); } - protected void initializeSentenceDetector() { - try (InputStream is = new BufferedInputStream(new FileInputStream(MODEL_DIR + "/en-sent.bin"))) { - SentenceModel model = new SentenceModel(is); - sentenceDetector = new SentenceDetectorME(model); - } catch (IOException e) { - // we swallow exception to support the cached run - LOG.debug(e.getLocalizedMessage(), e); - } - } - - protected void initializeTokenizer() { - try (InputStream is = new BufferedInputStream(new FileInputStream(MODEL_DIR + "/en-token.bin"))) { - TokenizerModel model = new TokenizerModel(is); - tokenizer = new TokenizerME(model); - } catch (IOException e) { - // we swallow exception to support the cached run - LOG.debug(e.getLocalizedMessage(), e); - } - } - - protected void initializePosTagger() { - try (InputStream is = new BufferedInputStream(new FileInputStream(MODEL_DIR + "/en-pos-maxent.bin"))) { - POSModel model = new POSModel(is); - posTagger = new POSTaggerME(model); - } catch (IOException e) { - // we swallow exception to support the cached run - LOG.debug(e.getLocalizedMessage(), e); - } - } - - protected void initializeParser() { - try (InputStream is = new BufferedInputStream(new FileInputStream(MODEL_DIR + "/en-parser-chunking.bin"))) { - ParserModel model = new ParserModel(is); - parser = ParserFactory.create(model); - } catch (IOException e) { - // we swallow exception to support the cached run - LOG.debug(e.getLocalizedMessage(), e); - } - } - - private void initializeChunker() { - try (InputStream is = new BufferedInputStream(new FileInputStream(MODEL_DIR + "/en-chunker.bin"))) { - ChunkerModel model = new ChunkerModel(is); - chunker = new ChunkerME(model); - } catch (IOException e) { - // we swallow exception to support the cached run - LOG.debug(e.getLocalizedMessage(), e); - } - } - /** * convert an instance of Parse to SyntacticTreeNode, by filtering out the * unnecessary data and assigning the word for each node @@ -641,11 +586,11 @@ public class ParserChunker2MatcherProcessor { return null; String text = parse.getText(); - ArrayList<SyntacticTreeNode> childrenNodeList = convertChildrenNodes(parse); + List<SyntacticTreeNode> childrenNodeList = convertChildrenNodes(parse); // check sentence node, the node contained in the top node if (type.equals(AbstractBottomUpParser.TOP_NODE) - && childrenNodeList != null && childrenNodeList.size() > 0) { + && childrenNodeList != null && !childrenNodeList.isEmpty()) { PhraseNode rootNode; try { rootNode = (PhraseNode) childrenNodeList.get(0); @@ -656,7 +601,7 @@ public class ParserChunker2MatcherProcessor { } // if this node contains children nodes, then it is a phrase node - if (childrenNodeList != null && childrenNodeList.size() > 0) { + if (childrenNodeList != null && !childrenNodeList.isEmpty()) { // System.out.println("Found "+ type + " phrase = "+ childrenNodeList); return new PhraseNode(type, childrenNodeList); @@ -669,7 +614,7 @@ public class ParserChunker2MatcherProcessor { return new WordNode(type, word); } - private static ArrayList<SyntacticTreeNode> convertChildrenNodes(Parse parse) { + private static List<SyntacticTreeNode> convertChildrenNodes(Parse parse) { if (parse == null) return null; @@ -677,7 +622,7 @@ public class ParserChunker2MatcherProcessor { if (children == 
null || children.length == 0) return null; - ArrayList<SyntacticTreeNode> childrenNodeList = new ArrayList<>(); + List<SyntacticTreeNode> childrenNodeList = new ArrayList<>(); for (Parse child : children) { SyntacticTreeNode childNode = toSyntacticTreeNode(child); if (childNode != null) @@ -711,7 +656,7 @@ public class ParserChunker2MatcherProcessor { protected List<LemmaPair> listListParseTreeChunk2ListLemmaPairs( List<List<ParseTreeChunk>> sent1GrpLst) { List<LemmaPair> results = new ArrayList<>(); - if (sent1GrpLst == null || sent1GrpLst.size() < 1) + if (sent1GrpLst == null || sent1GrpLst.isEmpty()) return results; List<ParseTreeChunk> wholeSentence = sent1GrpLst .get(sent1GrpLst.size() - 1); // whole sentence is last list in the list diff --git a/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserPure2MatcherProcessor.java b/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserPure2MatcherProcessor.java index 2e21705..c5e5dca 100644 --- a/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserPure2MatcherProcessor.java +++ b/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserPure2MatcherProcessor.java @@ -33,9 +33,13 @@ package opennlp.tools.textsimilarity.chunker2matcher; +import java.io.IOException; +import java.lang.invoke.MethodHandles; import java.util.ArrayList; import java.util.List; -import java.util.logging.Logger; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import opennlp.tools.textsimilarity.LemmaPair; import opennlp.tools.textsimilarity.ParseTreeChunk; @@ -44,9 +48,10 @@ import opennlp.tools.textsimilarity.SentencePairMatchResult; import opennlp.tools.textsimilarity.TextProcessor; public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor { + + private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + protected static ParserPure2MatcherProcessor pinstance; - private static final Logger LOG = Logger - .getLogger("opennlp.tools.textsimilarity.chunker2matcher.ParserPure2MatcherProcessor"); public synchronized static ParserPure2MatcherProcessor getInstance() { if (pinstance == null) @@ -56,10 +61,14 @@ public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor } private ParserPure2MatcherProcessor() { - initializeSentenceDetector(); - initializeTokenizer(); - initializePosTagger(); - initializeParser(); + try { + initializeSentenceDetector(); + initializeTokenizer(); + initializePosTagger(); + initializeParser(); + } catch (IOException e) { + LOG.warn("A model can't be loaded: {}", e.getMessage()); + } } public synchronized List<List<ParseTreeChunk>> formGroupedPhrasesFromChunksForSentence( @@ -70,7 +79,7 @@ public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor sentence = TextProcessor.removePunctuation(sentence); SentenceNode node = parseSentenceNode(sentence); if (node == null) { - LOG.info("Problem parsing sentence '" + sentence); + LOG.info("Problem parsing sentence '{}'", sentence); return null; } List<ParseTreeChunk> ptcList = node.getParseTreeChunkList(); @@ -78,7 +87,8 @@ public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor List<String> TokList = node.getOrderedLemmaList(); List<List<ParseTreeChunk>> listOfChunks = new ArrayList<>(); - List<ParseTreeChunk> nounPhr = new ArrayList<>(), prepPhr = new ArrayList<>(), verbPhr = new ArrayList<>(), adjPhr = new ArrayList<>(), + List<ParseTreeChunk> 
nounPhr = new ArrayList<>(), prepPhr = new ArrayList<>(), + verbPhr = new ArrayList<>(), adjPhr = new ArrayList<>(), // to store the whole sentence wholeSentence = new ArrayList<>(); @@ -112,11 +122,7 @@ public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor List<List<ParseTreeChunk>> sent1GrpLst = formGroupedPhrasesFromChunksForPara(para1), sent2GrpLst = formGroupedPhrasesFromChunksForPara(para2); - List<LemmaPair> origChunks1 = listListParseTreeChunk2ListLemmaPairs(sent1GrpLst); // TODO - // need - // to - // populate - // it! + List<LemmaPair> origChunks1 = listListParseTreeChunk2ListLemmaPairs(sent1GrpLst); ParseTreeMatcherDeterministic md = new ParseTreeMatcherDeterministic(); List<List<ParseTreeChunk>> res = md @@ -126,16 +132,13 @@ public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor } public static void main(String[] args) throws Exception { - ParserPure2MatcherProcessor parser = ParserPure2MatcherProcessor - .getInstance(); + ParserPure2MatcherProcessor parser = ParserPure2MatcherProcessor.getInstance(); String text = "Its classy design and the Mercedes name make it a very cool vehicle to drive. "; List<List<ParseTreeChunk>> res = parser .formGroupedPhrasesFromChunksForPara(text); System.out.println(res); - // System.exit(0); - String phrase1 = "Its classy design and the Mercedes name make it a very cool vehicle to drive. " + "The engine makes it a powerful car. " + "The strong engine gives it enough power. " @@ -145,18 +148,15 @@ public class ParserPure2MatcherProcessor extends ParserChunker2MatcherProcessor + "This car provides you a very good mileage."; String sentence = "Not to worry with the 2cv."; - System.out.println(parser.assessRelevance(phrase1, phrase2) - .getMatchResult()); - - System.out - .println(parser - .formGroupedPhrasesFromChunksForSentence("Its classy design and the Mercedes name make it a very cool vehicle to drive. ")); - System.out - .println(parser - .formGroupedPhrasesFromChunksForSentence("Sounds too good to be true but it actually is, the world's first flying car is finally here. ")); - System.out - .println(parser - .formGroupedPhrasesFromChunksForSentence("UN Ambassador Ron Prosor repeated the Israeli position that the only way the Palestinians will get UN membership and statehood is through direct negotiations with the Israelis on a comprehensive peace agreement")); + System.out.println(parser.assessRelevance(phrase1, phrase2).getMatchResult()); + + System.out.println(parser.formGroupedPhrasesFromChunksForSentence( + "Its classy design and the Mercedes name make it a very cool vehicle to drive. ")); + System.out.println(parser.formGroupedPhrasesFromChunksForSentence( + "Sounds too good to be true but it actually is, the world's first flying car is finally here. 
")); + System.out.println(parser.formGroupedPhrasesFromChunksForSentence( + "UN Ambassador Ron Prosor repeated the Israeli position that the only way the Palestinians will get " + + "UN membership and statehood is through direct negotiations with the Israelis on a comprehensive peace agreement")); } } diff --git a/opennlp-similarity/src/test/resources/models/en-sent.bin b/opennlp-similarity/src/test/resources/models/en-sent.bin deleted file mode 100644 index e89076b..0000000 Binary files a/opennlp-similarity/src/test/resources/models/en-sent.bin and /dev/null differ diff --git a/pom.xml b/pom.xml index c2f4a52..e98b18d 100644 --- a/pom.xml +++ b/pom.xml @@ -158,22 +158,38 @@ <artifactId>slf4j-api</artifactId> <version>${slf4j.version}</version> </dependency> + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>log4j-over-slf4j</artifactId> + <version>${slf4j.version}</version> + <scope>runtime</scope> + </dependency> <dependency> <groupId>commons-lang</groupId> <artifactId>commons-lang</artifactId> <version>2.6</version> </dependency> + <dependency> + <groupId>commons-io</groupId> + <artifactId>commons-io</artifactId> + <version>2.18.0</version> + </dependency> <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId> - <version>3.12.0</version> + <version>3.17.0</version> </dependency> <dependency> <groupId>commons-codec</groupId> <artifactId>commons-codec</artifactId> <version>1.15</version> </dependency> + <dependency> + <groupId>org.apache.commons</groupId> + <artifactId>commons-mat3</artifactId> + <version>3.6.1</version> + </dependency> <dependency> <groupId>commons-logging</groupId> <artifactId>commons-logging</artifactId>
