mawiesne commented on code in PR #197:
URL: https://github.com/apache/opennlp-sandbox/pull/197#discussion_r1887057418
########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: Review Comment: server -> lower case "s" options -> lower case "o" ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. 
+ - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). +-h or --hostname: Hostname to report (default: localhost). Review Comment: I don't understand "report" here. Should that be "to bind to"? 2nd Q: If ipv4 (or ipv6) is supported, a hint would be fine here for callers. ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. 
+ - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). +-h or --hostname: Hostname to report (default: localhost). +-c or --config: Path to a configuration file. +``` + +Example with custom options: + +```bash +java -jar target/opennlp-grpc-server-1.0-SNAPSHOT.jar -p 8080 -h 127.0.0.1 -c ./server-config.ini +``` + +Sample Configuration File: + +If using a configuration file, it should be in the format: + +```bash +# Set to true to enable gRPC server reflection. +server.enable_reflection = false + +# This is the folder to be used to search for models +model.location=extlib +# Set this to true to recursively search for models inside the model.location folder. +model.recursive=true +# A wildcard to search for models in the model.location folder. 
+model.pos.wildcard=opennlp-models-pos-*.jar +``` + +#### Models + +To ensure the server automatically loads models, they must be placed in the `extlib` (or in the location configured via `model.location`) directory. + +## Building a Custom Client in another Programming Language + +Details can be found in the README of the opennlp-grpc-api module. Review Comment: Is it possible to use a link here? (for the README of the other module) ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. 
+ +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). +-h or --hostname: Hostname to report (default: localhost). +-c or --config: Path to a configuration file. +``` + +Example with custom options: + +```bash +java -jar target/opennlp-grpc-server-1.0-SNAPSHOT.jar -p 8080 -h 127.0.0.1 -c ./server-config.ini +``` + +Sample Configuration File: + +If using a configuration file, it should be in the format: + +```bash +# Set to true to enable gRPC server reflection. +server.enable_reflection = false + +# This is the folder to be used to search for models +model.location=extlib +# Set this to true to recursively search for models inside the model.location folder. +model.recursive=true +# A wildcard to search for models in the model.location folder. Review Comment: add "pattern" after "wildcard" here? ########## opennlp-grpc/opennlp-grpc-api/opennlp.proto: ########## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +syntax = "proto3"; + +option java_package = "opennlp"; +option java_outer_classname = "OpenNLPService"; +package opennlp; + +service PosTaggerService { + // Assigns the sentence of tokens pos tags. Review Comment: "pos" should be capitalized here: "POS" tags, as written below (in another comment) ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/PosTaggerService.java: ########## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package opennlp.service; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.nio.file.Path; +import java.util.Arrays; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; + +import com.google.rpc.Code; +import com.google.rpc.Status; +import io.grpc.protobuf.StatusProto; +import io.grpc.stub.StreamObserver; +import org.slf4j.LoggerFactory; + +import opennlp.OpenNLPService; +import opennlp.PosTaggerServiceGrpc; +import opennlp.service.classpath.DirectoryModelFinder; +import opennlp.service.exception.ServiceException; +import opennlp.tools.commons.ThreadSafe; +import opennlp.tools.models.ClassPathModel; +import opennlp.tools.models.ClassPathModelEntry; +import opennlp.tools.models.ClassPathModelLoader; +import opennlp.tools.postag.POSModel; +import opennlp.tools.postag.POSTagFormat; +import opennlp.tools.postag.POSTagger; +import opennlp.tools.postag.ThreadSafePOSTaggerME; + +/** + * The {@code PosTaggerService} class implements a gRPC service for Part-of-Speech (POS) tagging + * using Apache OpenNLP models. It extends the auto-generated gRPC base class + * {@link PosTaggerServiceGrpc.PosTaggerServiceImplBase}. 
+ * + * <p>This service provides functionality for: + * <ul> + * <li>Retrieving available POS models loaded from the classpath.</li> + * <li>Performing POS tagging on input sentences.</li> + * <li>Performing POS tagging with additional context.</li> + * </ul> + * </p> + * + * <p><b>Configuration:</b> + * <ul> + * <li>{@code model.location}: Directory to search for models (default: "extlib").</li> + * <li>{@code model.recursive}: Whether to scan subdirectories (default: {@code true}).</li> + * <li>{@code model.pos.wildcard}: Wildcard pattern to identify POS models (default: "opennlp-models-pos-*.jar").</li> + * </ul> + * </p> + */ +@ThreadSafe +public class PosTaggerService extends PosTaggerServiceGrpc.PosTaggerServiceImplBase { + + private static final org.slf4j.Logger logger = + LoggerFactory.getLogger(PosTaggerService.class); + + private static final Map<String, ClassPathModel> MODEL_CACHE = new ConcurrentHashMap<>(); + private static final Map<String, POSTagger> TAGGER_CACHE = new ConcurrentHashMap<>(); + + public PosTaggerService(Map<String, String> conf) { + + try { + initializeModelCache(conf); + } catch (IOException e) { + logger.error(e.getLocalizedMessage(), e); + throw new RuntimeException(e); + } + + } + + public static void clearCaches() { + synchronized (TAGGER_CACHE) { + for (POSTagger t : TAGGER_CACHE.values()) { + if (t instanceof AutoCloseable a) { + try { + a.close(); + } catch (Exception ignored) { + + } + } + } + TAGGER_CACHE.clear(); + MODEL_CACHE.clear(); + } + } + + private void initializeModelCache(Map<String, String> conf) throws IOException { + final String modelDir = conf.getOrDefault("model.location", "extlib"); + final boolean recursive = Boolean.parseBoolean(conf.getOrDefault("model.recursive", "true")); + final String wildcardPattern = conf.getOrDefault("model.pos.wildcard", "opennlp-models-pos-*.jar"); + + final DirectoryModelFinder finder = new DirectoryModelFinder(wildcardPattern, Path.of(modelDir), recursive); + final 
ClassPathModelLoader loader = new ClassPathModelLoader(); + + final Set<ClassPathModelEntry> models = finder.findModels(false); + for (ClassPathModelEntry entry : models) { + final ClassPathModel model = loader.load(entry); + if (model != null) { + MODEL_CACHE.putIfAbsent(model.getModelSHA256(), model); + } + } + + } + + @Override + public void getAvailableModels(opennlp.OpenNLPService.Empty request, + io.grpc.stub.StreamObserver<opennlp.OpenNLPService.AvailableModels> responseObserver) { + + try { + final OpenNLPService.AvailableModels.Builder response = OpenNLPService.AvailableModels.newBuilder(); + for (ClassPathModel model : MODEL_CACHE.values()) { + final OpenNLPService.Model m = OpenNLPService.Model.newBuilder() + .setHash(model.getModelSHA256()) + .setName(model.getModelName()) + .setLocale(model.getModelLanguage()) + .build(); + + response.addModels(m); + + } + + responseObserver.onNext(response.build()); + responseObserver.onCompleted(); + } catch (Exception e) { + handleException(e, responseObserver); + } + } + + @Override + public void tag(opennlp.OpenNLPService.TagRequest request, + io.grpc.stub.StreamObserver<opennlp.OpenNLPService.PosTags> responseObserver) { + try { + final POSTagger tagger = getTagger(request.getModelHash(), request.getFormat()); + final String[] tags = tagger.tag(request.getSentenceList().toArray(new String[0])); + responseObserver.onNext(OpenNLPService.PosTags.newBuilder().addAllTags(Arrays.asList(tags)).build()); + responseObserver.onCompleted(); + } catch (Exception e) { + handleException(e, responseObserver); + } + + } + + @Override + public void tagWithContext(opennlp.OpenNLPService.TagWithContextRequest request, + io.grpc.stub.StreamObserver<opennlp.OpenNLPService.PosTags> responseObserver) { + + try { + final POSTagger tagger = getTagger(request.getModelHash(), request.getFormat()); + final String[] tags = tagger.tag(request.getSentenceList().toArray(new String[0]), request.getAdditionalContextList().toArray(new String[0])); 
Review Comment: Line is too long, pls reformat to increase readability. ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/PosTaggerService.java: ########## @@ -0,0 +1,201 @@
+ responseObserver.onNext(OpenNLPService.PosTags.newBuilder().addAllTags(Arrays.asList(tags)).build()); + responseObserver.onCompleted(); + } catch (Exception e) { + handleException(e, responseObserver); + } + } + + private void handleException(Exception e, StreamObserver<?> responseObserver) { + final Status status = Status.newBuilder() + .setCode(Code.INTERNAL.getNumber()) + .setMessage(e.getLocalizedMessage()) + .build(); + responseObserver.onError(StatusProto.toStatusRuntimeException(status)); + } + + private POSTagger getTagger(String hash, OpenNLPService.POSTagFormat posTagFormat) { + final POSTagFormat format = (posTagFormat == null) ? POSTagFormat.UD : POSTagFormat.valueOf(posTagFormat.name()); + + return TAGGER_CACHE.computeIfAbsent((hash + "-" + format), modelHash -> { + final ClassPathModel model = MODEL_CACHE.get(modelHash); + + if (model == null) { + throw new ServiceException("Could not find the given model."); + } + + try { + return new ThreadSafePOSTaggerME(new POSModel(new ByteArrayInputStream(model.model())), format); Review Comment: Should additionally be wrapped in a `BufferedInputStream` to speed up reading of larger models. ########## opennlp-grpc/pom.xml: ########## @@ -0,0 +1,50 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. 
See the License for the + specific language governing permissions and limitations + under the License. +--> +<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xmlns="http://maven.apache.org/POM/4.0.0" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> + + <parent> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-sandbox</artifactId> + <version>2.5.2-SNAPSHOT</version> + </parent> + + <artifactId>opennlp-grpc</artifactId> + <name>Apache OpenNLP gRPC Server</name> + <packaging>pom</packaging> + + <modules> + <module>opennlp-grpc-api</module> + <module>opennlp-grpc-service</module> + </modules> + + <properties> + <grpc.version>1.69.0</grpc.version> + <opennlp.version>2.5.1</opennlp.version> Review Comment: That property should be defined already via the parent sandbox pom.xml file? same with junit and slf4j, log4j2. Why don't we re-use this, when this pom.xml file points to a parent that has it already defined? ########## opennlp-grpc/opennlp-grpc-service/src/test/resources/models/opennlp-models-pos-en-1.2.0.jar: ########## Review Comment: Can we avoid adding (larger) binary model resources to the /src/test/resources/models folder? This is a liability for the future and means extra cost in transfer (scm checkouts) and disk space requirements. ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. 
**opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). Review Comment: should read: "will listen on" <-- + "on" ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. 
+ +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). +-h or --hostname: Hostname to report (default: localhost). +-c or --config: Path to a configuration file. +``` + +Example with custom options: + +```bash +java -jar target/opennlp-grpc-server-1.0-SNAPSHOT.jar -p 8080 -h 127.0.0.1 -c ./server-config.ini +``` + +Sample Configuration File: + +If using a configuration file, it should be in the format: + +```bash +# Set to true to enable gRPC server reflection. +server.enable_reflection = false + +# This is the folder to be used to search for models Review Comment: "search" -> better: "scan for..." or "check for" ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. 
**opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). +-h or --hostname: Hostname to report (default: localhost). +-c or --config: Path to a configuration file. +``` + +Example with custom options: + +```bash +java -jar target/opennlp-grpc-server-1.0-SNAPSHOT.jar -p 8080 -h 127.0.0.1 -c ./server-config.ini +``` + +Sample Configuration File: + +If using a configuration file, it should be in the format: + +```bash +# Set to true to enable gRPC server reflection. Review Comment: Is "server reflection" clear to readers? What is the effect of enabling this feature? ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. 
+ - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +Sample Configuration File: + +If using a configuration file, it should be in the format: + +```bash +# Set to true to enable gRPC server reflection. +server.enable_reflection = false + +# This is the folder to be used to search for models +model.location=extlib +# Set this to true to recursively search for models inside the model.location folder. 
Review Comment: remove "this" (2nd token) to be consistent with the other parameter description above ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: + +```bash +java -jar target/opennlp-grpc-server-2.5.2-SNAPSHOT.jar +``` + +Configure Server Options: + +The server supports several command-line options for customization: + +```bash +-p or --port: Port on which the server will listen (default: 7071). +-h or --hostname: Hostname to report (default: localhost). +-c or --config: Path to a configuration file. 
+``` + +Example with custom options: + +```bash +java -jar target/opennlp-grpc-server-1.0-SNAPSHOT.jar -p 8080 -h 127.0.0.1 -c ./server-config.ini +``` + +Sample Configuration File: Review Comment: configuration -> lower case "c" file -> lower case "f" ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. + +## Getting Started + +Follow these steps to set up and run the OpenNLP gRPC proof of concept project: + +### Prerequisites +Before you begin, ensure you have the following installed on your system: + +- Java Development Kit (JDK) 17 or later +- Apache Maven (for building Java components) +- Docker for running the gRPC tools if modifications to the .proto files are needed + +You can build the project by running + +``` +mvn clean install +``` + +### Running the gRPC Backend + +Start the Server: Use the following command to run the server with default settings: Review Comment: server -> lower case "s" ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. 
It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +Sample Configuration File: + +If using a configuration file, it should be in the format: + +```bash +# Set to true to enable gRPC server reflection. 
+server.enable_reflection = false + +# This is the folder to be used to search for models +model.location=extlib +# Set this to true to recursively search for models inside the model.location folder. +model.recursive=true +# A wildcard to search for models in the model.location folder. +model.pos.wildcard=opennlp-models-pos-*.jar +``` + +#### Models + +To ensure the server automatically loads models, they must be placed in the `extlib` (or in the location configured via `model.location`) directory. Review Comment: "they" should read "these" ########## opennlp-grpc/examples/python-client/main.py: ########## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +import grpc + +import opennlp_pb2 +import opennlp_pb2_grpc + +# Define the server address and port +server_address = 'localhost:7071' + +# Create a channel and a stub (client) +with grpc.insecure_channel(server_address) as channel: + stub = opennlp_pb2_grpc.PosTaggerServiceStub(channel) + + try: + empty_request = opennlp_pb2.Empty() + available_models_response = stub.GetAvailableModels(empty_request) + print(f"Available POS Models: {available_models_response.models}") + except grpc.RpcError as e: + print(f"GetAvailableModels call failed: {e}") + + # Call the 'Tag' method + try: + # Construct the TagRequest object + tag_request = opennlp_pb2.TagRequest( + sentence=['The', 'driver', 'got', 'badly', 'injured', 'by', 'the', 'accident', '.'], # Sentence tokens + format=opennlp_pb2.POSTagFormat.UD, # Use the enum for UD format + model_hash='cb219de4e5fc8c3c6d61531ac2dff0b186f6c3f457207359ec28e9336311ef8e' # Model name Review Comment: "# Model name" should read "# Model hash" ? ########## opennlp-grpc/README.md: ########## @@ -0,0 +1,96 @@ +# OpenNLP gRPC - Proof of Concept + +This project demonstrates a proof of concept for creating a backend powered by Apache OpenNLP using gRPC. It comprises +three main modules: + +- **opennlp-grpc-api** +- **opennlp-grpc-service** +- **examples** + +## Modules Overview + +1. **opennlp-grpc-api**: + - Contains the gRPC schema for OpenNLP services. + - Includes generated Java stubs. + - Provides a README with instructions on generating code stubs for various languages and auto-generated + documentation. + +2. **opennlp-grpc-service**: + - Features a server implementation. + - Offers an initial service implementation for POS tagging. + +3. **examples**: + - Provides a sample implementation for interacting with the OpenNLP server backend via gRPC in Python. 
+ +## Building a Custom Client in another Programming Language + +Details can be found in the README of the opennlp-grpc-api module. 
+ +## Supported Features + +Currently, the server supports the following features: + +- POS Tagging Review Comment: Should we add in round brackets: "(using the Universal Dependencies tag format)" ? Could help users to understand that Penn tag set is kind of deprecated with the newer models. ########## opennlp-grpc/opennlp-grpc-api/README.md: ########## @@ -0,0 +1,53 @@ +# Apache OpenNLP gRPC API + +This module contains the [gRPC](https://grpc.io) schema used in Apache OpenNLP to provide a service side gRPC backend. + +An automatically generated overview of the endpoints and messages can be found [here](opennlp) + +# Main concepts + +The endpoints and messages described by the API are meant to be a minimum. +It does not support every feature of Apache OpenNLP at the moment, but is open for enhancement or further improvement. Review Comment: plural for "improvement"_s, as we want many of those :) ########## opennlp-grpc/opennlp-grpc-api/README.md: ########## @@ -0,0 +1,53 @@ +# Apache OpenNLP gRPC API + +This module contains the [gRPC](https://grpc.io) schema used in Apache OpenNLP to provide a service side gRPC backend. + +An automatically generated overview of the endpoints and messages can be found [here](opennlp) + +# Main concepts + +The endpoints and messages described by the API are meant to be a minimum. +It does not support every feature of Apache OpenNLP at the moment, but is open for enhancement or further improvement. + +# Maven dependencies + +The Java code generated from the schema is available as a Maven dependency. + +``` + <dependencies> Review Comment: No need to use `<dependencies>` and `</dependencies>` for this snippet. It's kind of bloatish. ########## opennlp-grpc/opennlp-grpc-api/opennlp.proto: ########## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +syntax = "proto3"; + +option java_package = "opennlp"; +option java_outer_classname = "OpenNLPService"; +package opennlp; + +service PosTaggerService { + // Assigns the sentence of tokens pos tags. + rpc Tag(TagRequest) returns (PosTags); + // Assigns the sentence of tokens pos tags with additional (string-based) context. Review Comment: See previous comment. ########## opennlp-grpc/opennlp-grpc-api/opennlp.md: ########## @@ -0,0 +1,106 @@ +# Protocol Documentation + +<a name="top"></a> + +## Table of Contents + +- [postagger.proto](#postagger-proto) + - [AvailableModels](#opennlp-AvailableModels) + - [Empty](#opennlp-Empty) + - [PosTags](#opennlp-PosTags) + - [TagRequest](#opennlp-TagRequest) + - [TagWithContextRequest](#opennlp-TagWithContextRequest) + + - [POSTagFormat](#opennlp-POSTagFormat) + + - [PosTaggerService](#opennlp-PosTaggerService) + +- [Scalar Value Types](#scalar-value-types) + +<a name="postagger-proto"></a> +<p align="right"><a href="#top">Top</a></p> + +## postagger.proto + +<a name="opennlp-AvailableModels"></a> + +### AvailableModels + +| Field | Type | Label | Description | +|--------|-------------------|----------|-------------| +| models | [string](#string) | repeated | | + +<a name="opennlp-Empty"></a> + +### Empty + +<a name="opennlp-PosTags"></a> + +### PosTags + +| Field | Type | Label | Description | 
+|-------|-------------------|----------|-------------| +| tags | [string](#string) | repeated | | + +<a name="opennlp-TagRequest"></a> + +### TagRequest + +| Field | Type | Label | Description | +|------------|---------------------------------------|----------|-------------| +| sentence | [string](#string) | repeated | | +| format | [POSTagFormat](#opennlp-POSTagFormat) | | | +| model_name | [string](#string) | | | + +<a name="opennlp-TagWithContextRequest"></a> + +### TagWithContextRequest + +| Field | Type | Label | Description | +|--------------------|---------------------------------------|----------|-------------| +| sentence | [string](#string) | repeated | | +| additional_context | [string](#string) | repeated | | +| format | [POSTagFormat](#opennlp-POSTagFormat) | | | +| model_hash | [string](#string) | | | + +<a name="opennlp-POSTagFormat"></a> + +### POSTagFormat + +| Name | Number | Description | +|---------|--------|-------------| +| UD | 0 | | +| PENN | 1 | | Review Comment: The PENN row could have a description / note to indicate that this is a deprecated format and "back translation" from UD -> PENN isn't a loss-free operation. ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/OpenNLPServer.java: ########## @@ -0,0 +1,210 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package opennlp; + +import java.io.File; +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.Callable; + +import io.grpc.Server; +import io.grpc.ServerBuilder; +import io.grpc.protobuf.services.ProtoReflectionServiceV1; +import org.slf4j.LoggerFactory; +import picocli.CommandLine; +import picocli.CommandLine.Command; +import picocli.CommandLine.Option; + +import opennlp.service.PosTaggerService; + +/** + * The {@code OpenNLPServer} class implements a gRPC server for providing OpenNLP-based services. + * It is a command-line application that allows configuration through command-line options and a configuration file. + * The server hosts services such as POS tagging using OpenNLP models, and can optionally enable reflection for + * gRPC clients. + * + * <p>This server listens on a configurable port (default is 7071) and hostname (default is "localhost"). + * It loads configuration settings from a file and uses them to initialize various components such as + * the POS tagger service.</p> + * + * <p><b>Command-line Options:</b> + * <ul> + * <li>{@code -p, --port}: Specifies the port on which the server should listen (default is 7071).</li> + * <li>{@code -h, --hostname}: Specifies the hostname to report (default is "localhost").</li> Review Comment: "to report" -> see (earlier) comment above? ########## opennlp-grpc/opennlp-grpc-service/pom.xml: ########## @@ -0,0 +1,150 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. 
The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> +<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xmlns="http://maven.apache.org/POM/4.0.0" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> + <parent> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-grpc</artifactId> + <version>2.5.2-SNAPSHOT</version> + </parent> + + <artifactId>opennlp-grpc-service</artifactId> + <name>Apache OpenNLP gRPC Server</name> + + <dependencies> + <dependency> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-grpc-api</artifactId> + <version>${project.version}</version> + </dependency> + <dependency> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-tools</artifactId> + <version>${opennlp.version}</version> + </dependency> + <dependency> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-tools-models</artifactId> + <version>${opennlp.version}</version> + </dependency> + <dependency> + <groupId>info.picocli</groupId> + <artifactId>picocli</artifactId> + <version>${picocli.version}</version> + </dependency> + + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + <version>${slf4j.version}</version> + </dependency> + + <dependency> + <groupId>org.apache.logging.log4j</groupId> + <artifactId>log4j-slf4j2-impl</artifactId> + <version>${log4j2.version}</version> + <scope>runtime</scope> + 
</dependency> + + <dependency> + <groupId>org.junit.jupiter</groupId> + <artifactId>junit-jupiter-api</artifactId> + <version>${junit.version}</version> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.junit.jupiter</groupId> + <artifactId>junit-jupiter-engine</artifactId> + <version>${junit.version}</version> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.junit.jupiter</groupId> + <artifactId>junit-jupiter-params</artifactId> + <version>${junit.version}</version> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>com.ginsberg</groupId> + <artifactId>junit5-system-exit</artifactId> + <version>${junit5-system-exit.version}</version> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.awaitility</groupId> + <artifactId>awaitility</artifactId> + <version>4.2.2</version> Review Comment: Externalize this version number towards a property? ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/PosTaggerService.java: ########## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package opennlp.service; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.nio.file.Path; +import java.util.Arrays; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; + +import com.google.rpc.Code; +import com.google.rpc.Status; +import io.grpc.protobuf.StatusProto; +import io.grpc.stub.StreamObserver; +import org.slf4j.LoggerFactory; + +import opennlp.OpenNLPService; +import opennlp.PosTaggerServiceGrpc; +import opennlp.service.classpath.DirectoryModelFinder; +import opennlp.service.exception.ServiceException; +import opennlp.tools.commons.ThreadSafe; +import opennlp.tools.models.ClassPathModel; +import opennlp.tools.models.ClassPathModelEntry; +import opennlp.tools.models.ClassPathModelLoader; +import opennlp.tools.postag.POSModel; +import opennlp.tools.postag.POSTagFormat; +import opennlp.tools.postag.POSTagger; +import opennlp.tools.postag.ThreadSafePOSTaggerME; + +/** + * The {@code PosTaggerService} class implements a gRPC service for Part-of-Speech (POS) tagging + * using Apache OpenNLP models. It extends the auto-generated gRPC base class + * {@link PosTaggerServiceGrpc.PosTaggerServiceImplBase}. 
+ * + * <p>This service provides functionality for: + * <ul> + * <li>Retrieving available POS models loaded from the classpath.</li> + * <li>Performing POS tagging on input sentences.</li> + * <li>Performing POS tagging with additional context.</li> + * </ul> + * </p> + * + * <p><b>Configuration:</b> + * <ul> + * <li>{@code model.location}: Directory to search for models (default: "extlib").</li> + * <li>{@code model.recursive}: Whether to scan subdirectories (default: {@code true}).</li> + * <li>{@code model.pos.wildcard}: Wildcard pattern to identify POS models (default: "opennlp-models-pos-*.jar").</li> + * </ul> + * </p> + */ +@ThreadSafe +public class PosTaggerService extends PosTaggerServiceGrpc.PosTaggerServiceImplBase { + + private static final org.slf4j.Logger logger = + LoggerFactory.getLogger(PosTaggerService.class); + + private static final Map<String, ClassPathModel> MODEL_CACHE = new ConcurrentHashMap<>(); + private static final Map<String, POSTagger> TAGGER_CACHE = new ConcurrentHashMap<>(); + + public PosTaggerService(Map<String, String> conf) { Review Comment: Please leave minimal JavaDoc for the constructor to explain how `conf` map is structured or should at least contain to be valid. ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/OpenNLPServer.java: ########## @@ -0,0 +1,210 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package opennlp; + +import java.io.File; +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.Callable; + +import io.grpc.Server; +import io.grpc.ServerBuilder; +import io.grpc.protobuf.services.ProtoReflectionServiceV1; +import org.slf4j.LoggerFactory; +import picocli.CommandLine; +import picocli.CommandLine.Command; +import picocli.CommandLine.Option; + +import opennlp.service.PosTaggerService; + +/** + * The {@code OpenNLPServer} class implements a gRPC server for providing OpenNLP-based services. + * It is a command-line application that allows configuration through command-line options and a configuration file. + * The server hosts services such as POS tagging using OpenNLP models, and can optionally enable reflection for + * gRPC clients. + * + * <p>This server listens on a configurable port (default is 7071) and hostname (default is "localhost"). + * It loads configuration settings from a file and uses them to initialize various components such as Review Comment: "from a file" -> Indicate which one it is, aka what the name is? ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/PosTaggerService.java: ########## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package opennlp.service; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.nio.file.Path; +import java.util.Arrays; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; + +import com.google.rpc.Code; +import com.google.rpc.Status; +import io.grpc.protobuf.StatusProto; +import io.grpc.stub.StreamObserver; +import org.slf4j.LoggerFactory; + +import opennlp.OpenNLPService; +import opennlp.PosTaggerServiceGrpc; +import opennlp.service.classpath.DirectoryModelFinder; +import opennlp.service.exception.ServiceException; +import opennlp.tools.commons.ThreadSafe; +import opennlp.tools.models.ClassPathModel; +import opennlp.tools.models.ClassPathModelEntry; +import opennlp.tools.models.ClassPathModelLoader; +import opennlp.tools.postag.POSModel; +import opennlp.tools.postag.POSTagFormat; +import opennlp.tools.postag.POSTagger; +import opennlp.tools.postag.ThreadSafePOSTaggerME; + +/** + * The {@code PosTaggerService} class implements a gRPC service for Part-of-Speech (POS) tagging + * using Apache OpenNLP models. It extends the auto-generated gRPC base class + * {@link PosTaggerServiceGrpc.PosTaggerServiceImplBase}. 
+ * + * <p>This service provides functionality for: + * <ul> + * <li>Retrieving available POS models loaded from the classpath.</li> + * <li>Performing POS tagging on input sentences.</li> + * <li>Performing POS tagging with additional context.</li> + * </ul> + * </p> + * + * <p><b>Configuration:</b> + * <ul> + * <li>{@code model.location}: Directory to search for models (default: "extlib").</li> + * <li>{@code model.recursive}: Whether to scan subdirectories (default: {@code true}).</li> + * <li>{@code model.pos.wildcard}: Wildcard pattern to identify POS models (default: "opennlp-models-pos-*.jar").</li> + * </ul> + * </p> + */ +@ThreadSafe +public class PosTaggerService extends PosTaggerServiceGrpc.PosTaggerServiceImplBase { + + private static final org.slf4j.Logger logger = + LoggerFactory.getLogger(PosTaggerService.class); + + private static final Map<String, ClassPathModel> MODEL_CACHE = new ConcurrentHashMap<>(); + private static final Map<String, POSTagger> TAGGER_CACHE = new ConcurrentHashMap<>(); + + public PosTaggerService(Map<String, String> conf) { + + try { + initializeModelCache(conf); + } catch (IOException e) { + logger.error(e.getLocalizedMessage(), e); + throw new RuntimeException(e); + } + + } + + public static void clearCaches() { Review Comment: Pls, leave a short JavaDoc notice, in which situation caches should be evicted and/or what the (assumed) state is after the method has been invoked. ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/classpath/DirectoryModelFinder.java: ########## @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package opennlp.service.classpath; + +import java.io.IOException; +import java.net.JarURLConnection; +import java.net.MalformedURLException; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Enumeration; +import java.util.List; +import java.util.Locale; +import java.util.Objects; +import java.util.jar.JarEntry; +import java.util.jar.JarFile; +import java.util.regex.Pattern; +import java.util.stream.Stream; + +import org.slf4j.LoggerFactory; + +import opennlp.tools.models.AbstractClassPathModelFinder; +import opennlp.tools.models.ClassPathModelFinder; + +/** + * The {@code DirectoryModelFinder} class is responsible for finding model files in a given directory + * on the classpath. + * + * <p>This class allows searching for models based on wildcard patterns, either in plain directory structures + * or within JAR files. The search can be performed recursively depending on the specified configuration. 
+ * + * <p><b>Usage:</b> + * <ul> + * <li>Provide the prefix for models to be found in JAR files using the {@code jarModelPrefix} parameter.</li> + * <li>Specify the directory to search and whether to enable recursive scanning.</li> + * <li>The class supports resolving both direct file matches and entries within JAR archives.</li> + * </ul> + * + * @see AbstractClassPathModelFinder + * @see ClassPathModelFinder + */ +public class DirectoryModelFinder extends AbstractClassPathModelFinder implements ClassPathModelFinder { + + private static final org.slf4j.Logger logger = LoggerFactory.getLogger(DirectoryModelFinder.class); + + private final Path directory; + private final boolean recursive; + + /** + * Constructs a new {@code DirectoryModelFinder} instance. + * + * @param jarModelPrefix the prefix for identifying model files in JAR archives; may be {@code null}. + * If it is {@code null}, {@link ClassPathModelFinder#OPENNLP_MODEL_JAR_PREFIX} is used. + * @param directory the root directory to search for model files; must not be {@code null}. + * @param recursive {@code true} if the search should include subdirectories, {@code false} otherwise. + * @throws NullPointerException if {@code directory} is {@code null}. + */ + public DirectoryModelFinder(String jarModelPrefix, Path directory, boolean recursive) { + super(jarModelPrefix == null ? 
OPENNLP_MODEL_JAR_PREFIX : jarModelPrefix); + Objects.requireNonNull(directory, "Given directory must not be NULL"); + this.directory = directory; + this.recursive = recursive; + } + + /** + * {@inheritDoc} + */ + @Override + protected Object getContext() { + return null; + } + + /** + * {@inheritDoc} + */ + @Override + protected List<URI> getMatchingURIs(String wildcardPattern, Object context) { + if (wildcardPattern == null) { + return Collections.emptyList(); + } + + final boolean isWindows = isWindows(); + final List<URL> cp = getDirectoryContent(); + final List<URI> cpu = new ArrayList<>(); + final Pattern jarPattern = Pattern.compile(asRegex("*" + getJarModelPrefix())); Review Comment: Make `jarPattern` a constant? ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/classpath/DirectoryModelFinder.java: ########## @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package opennlp.service.classpath; + +import java.io.IOException; +import java.net.JarURLConnection; +import java.net.MalformedURLException; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Enumeration; +import java.util.List; +import java.util.Locale; +import java.util.Objects; +import java.util.jar.JarEntry; +import java.util.jar.JarFile; +import java.util.regex.Pattern; +import java.util.stream.Stream; + +import org.slf4j.LoggerFactory; + +import opennlp.tools.models.AbstractClassPathModelFinder; +import opennlp.tools.models.ClassPathModelFinder; + +/** + * The {@code DirectoryModelFinder} class is responsible for finding model files in a given directory + * on the classpath. + * + * <p>This class allows searching for models based on wildcard patterns, either in plain directory structures + * or within JAR files. The search can be performed recursively depending on the specified configuration. + * + * <p><b>Usage:</b> + * <ul> + * <li>Provide the prefix for models to be found in JAR files using the {@code jarModelPrefix} parameter.</li> + * <li>Specify the directory to search and whether to enable recursive scanning.</li> + * <li>The class supports resolving both direct file matches and entries within JAR archives.</li> + * </ul> + * + * @see AbstractClassPathModelFinder + * @see ClassPathModelFinder + */ +public class DirectoryModelFinder extends AbstractClassPathModelFinder implements ClassPathModelFinder { Review Comment: Is this a candidate for inclusion in: `opennlp-tools-models` component? Open a Jira to transfer / include it there? ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/classpath/DirectoryModelFinder.java: ########## @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package opennlp.service.classpath; + +import java.io.IOException; +import java.net.JarURLConnection; +import java.net.MalformedURLException; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Enumeration; +import java.util.List; +import java.util.Locale; +import java.util.Objects; +import java.util.jar.JarEntry; +import java.util.jar.JarFile; +import java.util.regex.Pattern; +import java.util.stream.Stream; + +import org.slf4j.LoggerFactory; + +import opennlp.tools.models.AbstractClassPathModelFinder; +import opennlp.tools.models.ClassPathModelFinder; + +/** + * The {@code DirectoryModelFinder} class is responsible for finding model files in a given directory + * on the classpath. + * + * <p>This class allows searching for models based on wildcard patterns, either in plain directory structures + * or within JAR files. The search can be performed recursively depending on the specified configuration. 
+ * + * <p><b>Usage:</b> + * <ul> + * <li>Provide the prefix for models to be found in JAR files using the {@code jarModelPrefix} parameter.</li> + * <li>Specify the directory to search and whether to enable recursive scanning.</li> + * <li>The class supports resolving both direct file matches and entries within JAR archives.</li> + * </ul> + * + * @see AbstractClassPathModelFinder + * @see ClassPathModelFinder + */ +public class DirectoryModelFinder extends AbstractClassPathModelFinder implements ClassPathModelFinder { + + private static final org.slf4j.Logger logger = LoggerFactory.getLogger(DirectoryModelFinder.class); + + private final Path directory; + private final boolean recursive; + + /** + * Constructs a new {@code DirectoryModelFinder} instance. + * + * @param jarModelPrefix the prefix for identifying model files in JAR archives; may be {@code null}. + * If it is {@code null}, {@link ClassPathModelFinder#OPENNLP_MODEL_JAR_PREFIX} is used. + * @param directory the root directory to search for model files; must not be {@code null}. Review Comment: "search" -> "scan" ########## opennlp-grpc/opennlp-grpc-service/src/test/java/opennlp/service/PosTaggerServiceTest.java: ########## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package opennlp.service; + +import java.net.URISyntaxException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.List; +import java.util.Map; + +import org.junit.jupiter.api.Test; + +import opennlp.OpenNLPService; +import opennlp.service.stubs.TestStreamObserver; + +import static org.junit.jupiter.api.Assertions.assertArrayEquals; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNotNull; + +public class PosTaggerServiceTest { + + private static Path getModelDirectory() throws URISyntaxException { + return Paths.get( + Thread.currentThread().getContextClassLoader() + .getResource("models/marker.txt") + .toURI() + ).getParent().toAbsolutePath(); + } + + @Test + public void testGetAvailableModels() throws URISyntaxException { + final Path modelPath = getModelDirectory(); + + final PosTaggerService taggerService = new PosTaggerService(Map.of("model.location", modelPath.toString())); + + taggerService.getAvailableModels(OpenNLPService.Empty.newBuilder().build(), new TestStreamObserver<>() { + + @Override + public void onNext(OpenNLPService.AvailableModels t) { + assertNotNull(t); + assertEquals(2, t.getModelsCount()); + } + }); + + PosTaggerService.clearCaches(); + } + + @Test + public void testGetAvailableModelsCustomPattern() throws URISyntaxException { + final Path modelPath = getModelDirectory(); + + final PosTaggerService taggerService = new PosTaggerService( + Map.of( + "model.location", modelPath.toString(), + "model.pos.wildcard", "opennlp-pos-*.jar", + "model.model.recursive", "true" + )); + + taggerService.getAvailableModels(OpenNLPService.Empty.newBuilder().build(), new TestStreamObserver<>() { + + @Override + public void onNext(OpenNLPService.AvailableModels t) { + assertNotNull(t); + assertEquals(1, t.getModelsCount()); + 
OpenNLPService.Model m = t.getModels(0); + assertNotNull(m); + assertEquals("opennlp-de-test2.bin", m.getName()); + } + }); + + PosTaggerService.clearCaches(); + } + + @Test + public void testGetAvailableModelsCustomPatternNotRecursive() throws URISyntaxException { + final Path modelPath = getModelDirectory(); + + final PosTaggerService taggerService = new PosTaggerService( + Map.of( + "model.location", modelPath.toString(), + "model.pos.wildcard", "opennlp-pos-*.jar", + "model.recursive", "false" + )); + + taggerService.getAvailableModels(OpenNLPService.Empty.newBuilder().build(), new TestStreamObserver<>() { + + @Override + public void onNext(OpenNLPService.AvailableModels t) { + assertNotNull(t); + assertEquals(0, t.getModelsCount()); + } + }); + + PosTaggerService.clearCaches(); + } + + @Test + public void testDoTagging() throws URISyntaxException { + final Path modelPath = getModelDirectory(); + + final PosTaggerService taggerService = new PosTaggerService( + Map.of( + "model.location", modelPath.toString(), + "model.pos.wildcard", "opennlp-models-pos-en-*.jar", + "model.recursive", "true" + )); + + final String hash = "5af913a52fa0b014e22c4c4411e146720f1222bdebde9ce1f1a3174df974d26d"; + + //check if we have the EN tagger available + taggerService.getAvailableModels(OpenNLPService.Empty.newBuilder().build(), new TestStreamObserver<>() { + + @Override + public void onNext(OpenNLPService.AvailableModels t) { + assertNotNull(t); + assertEquals(1, t.getModelsCount()); + OpenNLPService.Model m = t.getModels(0); + assertNotNull(m); + assertEquals("opennlp-en-ud-ewt-pos-1.2-2.5.0.bin", m.getName()); + assertEquals(hash, m.getHash()); + } + }); + + //simulate a tagging session + final String[] sentence = {"The", "driver", "got", "badly", "injured", "by", "the", "accident", "."}; Review Comment: The reference sentence and POS tags array can be externalized to a constant to avoid duplication. 
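A minimal sketch of that suggestion, not taken from the PR: the sentence literal from `testDoTagging()` moves into one constant that every test reads through an accessor. The class name `PosTaggingFixtures` and the method `sentence()` are hypothetical; the expected tag array would be handled the same way, but its values are not visible in this hunk, so it is left out here.

```java
// Hypothetical holder for shared test data; the name is an assumption, not part of the PR.
final class PosTaggingFixtures {

  // Reference sentence, copied from testDoTagging() in the quoted diff.
  private static final String[] SENTENCE =
      {"The", "driver", "got", "badly", "injured", "by", "the", "accident", "."};

  private PosTaggingFixtures() {
    // constants only, no instances
  }

  // Returns a defensive copy so that one test cannot mutate another test's input.
  static String[] sentence() {
    return SENTENCE.clone();
  }
}
```

Inside `PosTaggerServiceTest` itself a plain `private static final String[]` field would probably be enough; the defensive copy only matters if a test modifies the array.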
########## opennlp-grpc/opennlp-grpc-service/src/test/java/opennlp/service/PosTaggerServiceTest.java: ########## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package opennlp.service; + +import java.net.URISyntaxException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.List; +import java.util.Map; + +import org.junit.jupiter.api.Test; + +import opennlp.OpenNLPService; +import opennlp.service.stubs.TestStreamObserver; + +import static org.junit.jupiter.api.Assertions.assertArrayEquals; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNotNull; + +public class PosTaggerServiceTest { + + private static Path getModelDirectory() throws URISyntaxException { + return Paths.get( Review Comment: The formatting of lines 40 to 44 looks a bit odd and could be improved for better readability. ########## opennlp-grpc/opennlp-grpc-service/src/test/java/opennlp/service/PosTaggerServiceTest.java: ########## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package opennlp.service; + +import java.net.URISyntaxException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Arrays; +import java.util.List; +import java.util.Map; + +import org.junit.jupiter.api.Test; + +import opennlp.OpenNLPService; +import opennlp.service.stubs.TestStreamObserver; + +import static org.junit.jupiter.api.Assertions.assertArrayEquals; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNotNull; + +public class PosTaggerServiceTest { + + private static Path getModelDirectory() throws URISyntaxException { + return Paths.get( + Thread.currentThread().getContextClassLoader() + .getResource("models/marker.txt") + .toURI() + ).getParent().toAbsolutePath(); + } + + @Test + public void testGetAvailableModels() throws URISyntaxException { + final Path modelPath = getModelDirectory(); + + final PosTaggerService taggerService = new PosTaggerService(Map.of("model.location", modelPath.toString())); + + taggerService.getAvailableModels(OpenNLPService.Empty.newBuilder().build(), new TestStreamObserver<>() { + + @Override + public void onNext(OpenNLPService.AvailableModels t) { + assertNotNull(t); + assertEquals(2, t.getModelsCount()); + } + }); + + 
PosTaggerService.clearCaches(); Review Comment: This line / call should be externalized in form of an `@AfterEach` annotated cleanup method in that test. See below for other, multiple occurrences. ########## opennlp-grpc/opennlp-grpc-service/src/main/java/opennlp/service/classpath/DirectoryModelFinder.java: ########## @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package opennlp.service.classpath; + +import java.io.IOException; +import java.net.JarURLConnection; +import java.net.MalformedURLException; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Enumeration; +import java.util.List; +import java.util.Locale; +import java.util.Objects; +import java.util.jar.JarEntry; +import java.util.jar.JarFile; +import java.util.regex.Pattern; +import java.util.stream.Stream; + +import org.slf4j.LoggerFactory; + +import opennlp.tools.models.AbstractClassPathModelFinder; +import opennlp.tools.models.ClassPathModelFinder; + +/** + * The {@code DirectoryModelFinder} class is responsible for finding model files in a given directory + * on the classpath. + * + * <p>This class allows searching for models based on wildcard patterns, either in plain directory structures + * or within JAR files. The search can be performed recursively depending on the specified configuration. + * + * <p><b>Usage:</b> + * <ul> + * <li>Provide the prefix for models to be found in JAR files using the {@code jarModelPrefix} parameter.</li> + * <li>Specify the directory to search and whether to enable recursive scanning.</li> + * <li>The class supports resolving both direct file matches and entries within JAR archives.</li> + * </ul> + * + * @see AbstractClassPathModelFinder + * @see ClassPathModelFinder + */ +public class DirectoryModelFinder extends AbstractClassPathModelFinder implements ClassPathModelFinder { + + private static final org.slf4j.Logger logger = LoggerFactory.getLogger(DirectoryModelFinder.class); + + private final Path directory; + private final boolean recursive; + + /** + * Constructs a new {@code DirectoryModelFinder} instance. + * + * @param jarModelPrefix the prefix for identifying model files in JAR archives; may be {@code null}. 
+ * If it is {@code null}, {@link ClassPathModelFinder#OPENNLP_MODEL_JAR_PREFIX} is used. + * @param directory the root directory to search for model files; must not be {@code null}. + * @param recursive {@code true} if the search should include subdirectories, {@code false} otherwise. + * @throws NullPointerException if {@code directory} is {@code null}. + */ + public DirectoryModelFinder(String jarModelPrefix, Path directory, boolean recursive) { + super(jarModelPrefix == null ? OPENNLP_MODEL_JAR_PREFIX : jarModelPrefix); + Objects.requireNonNull(directory, "Given directory must not be NULL"); + this.directory = directory; + this.recursive = recursive; + } + + /** + * {@inheritDoc} + */ + @Override + protected Object getContext() { + return null; + } + + /** + * {@inheritDoc} + */ + @Override + protected List<URI> getMatchingURIs(String wildcardPattern, Object context) { + if (wildcardPattern == null) { + return Collections.emptyList(); + } + + final boolean isWindows = isWindows(); + final List<URL> cp = getDirectoryContent(); + final List<URI> cpu = new ArrayList<>(); + final Pattern jarPattern = Pattern.compile(asRegex("*" + getJarModelPrefix())); + final Pattern filePattern = Pattern.compile(asRegex("*" + wildcardPattern)); Review Comment: Make `filePattern` a constant? ########## opennlp-grpc/opennlp-grpc-api/README.md: ########## @@ -0,0 +1,53 @@ +# Apache OpenNLP gRPC API + +This module contains the [gRPC](https://grpc.io) schema used in Apache OpenNLP to provide a service side gRPC backend. + +An automatically generated overview of the endpoints and messages can be found [here](opennlp) + +# Main concepts + +The endpoints and messages described by the API are meant to be a minimum. +It does not support every feature of Apache OpenNLP at the moment, but is open for enhancement or further improvement. + +# Maven dependencies + +The Java code generated from the schema is available as a Maven dependency. 
+ +``` + <dependencies> + <dependency> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-grpc-api</artifactId> + <version>VERSION</version> + </dependency> + </dependencies> +``` + +# Code generation + +The Java code can be (re)generated as follows; [docker-protoc](https://github.com/namely/docker-protoc) is used to +generate the code for Java: + +```powershell +docker run -v ${PWD}:/defs namely/protoc-all -f opennlp.proto -l java -o src/main/java +``` + +Since the Java code is provided here and the corresponding JARs will be available from Maven, regenerating from the Review Comment: "will be available from Maven" -> "will be available via Maven"; use "via" here. ########## pom.xml: ########## @@ -105,6 +105,7 @@ <module>opennlp-brat-annotator</module> <module>opennlp-coref</module> <module>opennlp-dl</module> + <module>opennlp-grpc</module> Review Comment: When adding this module here, the top-level README.md in the root of the sandbox needs an entry for this set of components, since everything should be introduced there with at least its name and a short description.