(opennlp-sandbox) branch OPENNLP-1833-grpc-expansion updated: OPENNLP-1833: Add embeddings, segmentation and semantic chunking to the gRPC server

kristian Wed, 10 Jun 2026 05:39:48 -0700

This is an automated email from the ASF dual-hosted git repository.

krickert pushed a commit to branch OPENNLP-1833-grpc-expansion
in repository https://gitbox.apache.org/repos/asf/opennlp-sandbox.git



The following commit(s) were added to refs/heads/OPENNLP-1833-grpc-expansion by this push:
     new cd64ff61 OPENNLP-1833: Add embeddings, segmentation and semantic chunking to the gRPC server
cd64ff61 is described below

commit cd64ff6171381607864dcb91ac3c9429d9d9e943
Author: Kristian Rickert <[email protected]>
AuthorDate: Wed Jun 10 08:38:57 2026 -0400

    OPENNLP-1833: Add embeddings, segmentation and semantic chunking to the gRPC server
    
    Adds an EmbeddingProvider abstraction with ONNX Runtime CPU and CUDA
    implementations behind a strict factory (model.embedder.backend=onnx|cuda).
    Embedding models are declared per id in the server config, loaded eagerly,
    and the dimension is read from the ONNX session metadata. The embedder uses
    the standard single-segment BERT encoding, maps OOV tokens to the unknown id,
    truncates at 512 wordpieces, only sends inputs the model declares, and closes
    all native resources deterministically.
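The paragraph above summarizes the embedder's input encoding. The following is a minimal sketch of what a single-segment BERT encoding usually looks like. It assumes the text has already been split into WordPiece tokens, that the vocabulary contains `[CLS]`, `[SEP]` and `[UNK]`, that the 512 limit counts the two special tokens, and that the model inputs are named `input_ids`, `attention_mask` and `token_type_ids` (common for Hugging Face exports, but not confirmed by this commit). `BertSingleSegmentSketch` and its methods are hypothetical and are not the `OnnxSentenceEmbedder` API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of single-segment BERT input encoding (not the commit's API). */
final class BertSingleSegmentSketch {

  /** Assumed limit on the full sequence, including [CLS] and [SEP]. */
  static final int MAX_SEQ_LEN = 512;

  /** The three int64 tensors a BERT-style ONNX model commonly declares. */
  record Encoded(long[] inputIds, long[] attentionMask, long[] tokenTypeIds) {
  }

  private BertSingleSegmentSketch() {
  }

  /** Encodes wordpieces as [CLS] w1 ... wn [SEP], mapping OOV pieces to [UNK]. */
  static Encoded encode(List<String> wordpieces, Map<String, Integer> vocab) {
    final int unk = vocab.get("[UNK]");
    // Truncate the content so that both special tokens still fit in MAX_SEQ_LEN.
    final int contentLen = Math.min(wordpieces.size(), MAX_SEQ_LEN - 2);
    final long[] ids = new long[contentLen + 2];
    ids[0] = vocab.get("[CLS]");
    for (int i = 0; i < contentLen; i++) {
      ids[i + 1] = vocab.getOrDefault(wordpieces.get(i), unk);
    }
    ids[contentLen + 1] = vocab.get("[SEP]");
    final long[] mask = new long[ids.length];
    Arrays.fill(mask, 1L); // no padding: every position is a real token
    final long[] typeIds = new long[ids.length]; // single segment: all zeros
    return new Encoded(ids, mask, typeIds);
  }

  /** Keeps only the inputs the model declares (e.g. drops token_type_ids if absent). */
  static Map<String, long[]> declaredInputs(Encoded encoded, Set<String> declaredNames) {
    final Map<String, long[]> all = new LinkedHashMap<>();
    all.put("input_ids", encoded.inputIds());
    all.put("attention_mask", encoded.attentionMask());
    all.put("token_type_ids", encoded.tokenTypeIds());
    all.keySet().retainAll(declaredNames);
    return all;
  }
}
```

With a toy vocabulary `{[UNK]=100, [CLS]=101, [SEP]=102, hello=7592}`, encoding `["hello", "zzzz"]` gives input ids `[101, 7592, 100, 102]`, with the unknown wordpiece mapped to `[UNK]`. `declaredInputs` is one way to honor "only sends inputs the model declares": a model exported without `token_type_ids` simply does not receive it.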
    
    Chunking adds sentence, token-window and semantic algorithms, wired through
    chunk_embed_configs and PIPELINE_STEP_CHUNK with per-chunk embeddings and
    group statistics. Semantic chunking places boundaries on consecutive-sentence
    cosine similarity with percentile or fixed thresholds and min/max size
    constraints.
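The semantic boundary placement described above can be sketched as follows. This is an illustration, not the commit's `SemanticChunker`: the class and method names are hypothetical, `percentile_threshold` is assumed to be on a 0-100 scale, and the percentile uses linear interpolation between sorted values (NumPy's default method), because the commit does not say which definition it uses. As in the `SemanticChunker` Javadoc, a percentile threshold, when present, takes precedence over the fixed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical sketch of semantic boundary detection (not the commit's SemanticChunker). */
final class SemanticBoundarySketch {

  private SemanticBoundarySketch() {
  }

  /** Cosine similarity of two equal-length vectors; 0 if either is a zero vector. */
  static float cosine(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0f;
    }
    return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
  }

  /** Percentile (0-100) with linear interpolation between sorted values (assumed definition). */
  static float percentile(float[] values, double p) {
    final float[] sorted = values.clone();
    Arrays.sort(sorted);
    final double pos = p / 100.0 * (sorted.length - 1);
    final int lo = (int) Math.floor(pos);
    final int hi = (int) Math.ceil(pos);
    return (float) (sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo));
  }

  /**
   * Returns the indices of sentences that start a chunk (0 first). A boundary goes
   * before sentence i + 1 when sim(i, i + 1) is strictly below the threshold; a
   * percentile threshold, if present, wins over the fixed one.
   */
  static List<Integer> chunkStarts(
      float[][] sentenceEmbeddings, OptionalDouble percentileThreshold, float fixedThreshold) {
    final List<Integer> starts = new ArrayList<>();
    if (sentenceEmbeddings.length == 0) {
      return starts;
    }
    starts.add(0);
    if (sentenceEmbeddings.length == 1) {
      return starts; // no consecutive pairs to compare
    }
    final float[] similarities = new float[sentenceEmbeddings.length - 1];
    for (int i = 0; i < similarities.length; i++) {
      similarities[i] = cosine(sentenceEmbeddings[i], sentenceEmbeddings[i + 1]);
    }
    final float threshold = percentileThreshold.isPresent()
        ? percentile(similarities, percentileThreshold.getAsDouble())
        : fixedThreshold;
    for (int i = 0; i < similarities.length; i++) {
      if (similarities[i] < threshold) {
        starts.add(i + 1);
      }
    }
    return starts;
  }
}
```

For four sentence vectors `{1,0}, {1,0}, {0,1}, {0,1}` the consecutive similarities are `1, 0, 1`, so both a fixed threshold of `0.5` and a 50th-percentile threshold start a new chunk at sentence 2.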
    
    The GPU build (-Dgpu) swaps onnxruntime for onnxruntime_gpu so the CPU and
    CUDA runtimes never coexist on the classpath. Covered by unit tests for the
    providers, factory, chunkers and analyzer paths; README updated.
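The min/max size constraints mentioned in the chunking paragraph are described in more detail in the `SemanticChunker` Javadoc later in this commit: short chunks are merged first, then long chunks are split, so the maximum always holds and a split remainder may fall below the minimum. The sketch below works on chunk sizes measured in sentences. The merge direction is an assumption (a short chunk is folded into the next one, and a short trailing chunk into its predecessor), since the commit does not specify it; `ChunkSizeConstraintSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of min/max sentence-count constraints (merge direction assumed). */
final class ChunkSizeConstraintSketch {

  private ChunkSizeConstraintSketch() {
  }

  /**
   * Applies the constraints to chunk sizes measured in sentences.
   * Step 1 merges chunks smaller than minSentences into the following chunk
   * (a short trailing chunk is merged into its predecessor). Step 2 splits chunks
   * larger than maxSentences into maxSentences-sized pieces, so the last piece of
   * a split may end up smaller than minSentences.
   */
  static List<Integer> apply(List<Integer> sizes, int minSentences, int maxSentences) {
    final List<Integer> merged = new ArrayList<>();
    int pending = 0;
    for (int size : sizes) {
      pending += size;
      if (pending >= minSentences) {
        merged.add(pending);
        pending = 0;
      }
    }
    if (pending > 0) {
      if (merged.isEmpty()) {
        merged.add(pending); // the whole document is shorter than the minimum
      } else {
        final int last = merged.size() - 1;
        merged.set(last, merged.get(last) + pending);
      }
    }

    final List<Integer> result = new ArrayList<>();
    for (int size : merged) {
      int remaining = size;
      while (remaining > maxSentences) {
        result.add(maxSentences);
        remaining -= maxSentences;
      }
      result.add(remaining);
    }
    return result;
  }
}
```

For example, with a minimum of 2 and a maximum of 3 sentences, boundary sizes `[1, 5, 2]` become `[6, 2]` after merging and `[3, 3, 2]` after splitting, while a single 7-sentence chunk becomes `[3, 3, 1]`, the remainder case the Javadoc warns about. An unset maximum behaves like `Integer.MAX_VALUE`, as in `SemanticChunker`, so nothing is split.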
---
 opennlp-grpc/README.md                             |  96 +++++++-
 opennlp-grpc/opennlp-grpc-service/pom.xml          |  60 +++++
 .../opennlp/grpc/chunk/ChunkEmbedProcessor.java    | 256 ++++++++++++++++++++
 .../opennlp/grpc/chunk/SegmentationChunker.java    | 169 +++++++++++++
 .../apache/opennlp/grpc/chunk/SemanticChunker.java | 219 +++++++++++++++++
 .../embedding/AbstractOnnxEmbeddingProvider.java   | 269 +++++++++++++++++++++
 .../grpc/embedding/CudaEmbeddingProvider.java      |  51 ++++
 .../opennlp/grpc/embedding/EmbeddingProvider.java  |  98 ++++++++
 .../grpc/embedding/EmbeddingProviderFactory.java   |  63 +++++
 .../embedding/OnnxRuntimeEmbeddingProvider.java    |  46 ++++
 .../grpc/embedding/OnnxSentenceEmbedder.java       | 211 ++++++++++++++++
 .../opennlp/grpc/model/ModelBundleCache.java       |  46 +++-
 .../grpc/processor/BasicDocumentAnalyzer.java      | 255 ++++++++++++++++---
 .../opennlp/grpc/processor/PipelineStepPolicy.java |   4 +-
 .../opennlp/grpc/server/OpenNlpGrpcServer.java     |   5 +-
 .../chunk/ChunkEmbedProcessorSemanticTest.java     |  80 ++++++
 .../grpc/chunk/SegmentationChunkerTest.java        | 104 ++++++++
 .../opennlp/grpc/chunk/SemanticChunkerTest.java    | 151 ++++++++++++
 .../embedding/EmbeddingProviderFactoryTest.java    |  64 +++++
 .../grpc/embedding/StubEmbeddingProvider.java      |  89 +++++++
 .../BasicDocumentAnalyzerChunkEmbedTest.java       | 106 ++++++++
 .../BasicDocumentAnalyzerEmbeddingTest.java        |  91 +++++++
 .../processor/BasicDocumentAnalyzerPolicyTest.java |  36 ++-
 .../BasicDocumentAnalyzerSemanticChunkTest.java    |  88 +++++++
 24 files changed, 2614 insertions(+), 43 deletions(-)

diff --git a/opennlp-grpc/README.md b/opennlp-grpc/README.md
index cae50bef..34b32d67 100644
--- a/opennlp-grpc/README.md
+++ b/opennlp-grpc/README.md
@@ -60,10 +60,98 @@ By default no configuration is required: the server loads the bundled English
 sentence-detector and tokenizer from the classpath.
 
 > v1 note: this minimal slice implements sentence detection, tokenization,
-> probability reporting, `max_text_length`, offset encoding selection, and the
-> default `en-basic` model bundle. Unsupported backends, ONNX embedding model
-> selection, non-default bundles, and chunk/embed configs are rejected explicitly
-> instead of being silently ignored.
+> sentence-level embeddings (when ONNX models are configured), segmentation chunking
+> (`sentence` and `token` algorithms via `chunk_embed_configs` or `PIPELINE_STEP_CHUNK`),
+> probability reporting, `max_text_length`, offset encoding selection, and the default
+> `en-basic` model bundle. Semantic chunking (`algorithm: semantic`), CPU/GPU ONNX
+> embeddings, and segmentation chunking are supported when models are configured.
+> OpenVINO, classic syntactic `ChunkerME`, non-default bundles, and per-entry chunk
+> profiles are rejected explicitly instead of being silently ignored.
+
+### Embedding models (optional)
+
+Register ONNX sentence-transformer models in the server config:
+
+```ini
+model.embedder.default_id=sentence-transformers
+model.embedder.sentence-transformers.onnx.path=/path/to/model.onnx
+model.embedder.sentence-transformers.vocab.path=/path/to/vocab.txt
+```
+
+Request embeddings by adding `PIPELINE_STEP_EMBED` to the analysis profile and
+setting `options.onnx_embedding_model_id` (or rely on `default_id` when only
+one model is registered). Uses ONNX Runtime via `opennlp-dl` on CPU by default.
+
+#### GPU embeddings (optional)
+
+Build with the GPU flavor, which replaces the `onnxruntime` jar with
+`onnxruntime_gpu` (exactly one of the two is ever on the classpath), and point
+the server at CUDA:
+
+```bash
+mvn -pl opennlp-grpc/opennlp-grpc-service -Dgpu package
+```
+
+```ini
+model.embedder.backend=cuda
+model.embedder.gpu_device_id=0
+model.embedder.default_id=sentence-transformers
+model.embedder.sentence-transformers.onnx.path=/path/to/model.onnx
+model.embedder.sentence-transformers.vocab.path=/path/to/vocab.txt
+```
+
+`model.embedder.backend` accepts `onnx` (default, CPU) or `cuda`; any other value
+is rejected at startup. `model.embedder.gpu_device_id` is only valid with the
+`cuda` backend. Clients should set `inference_backend` to `INFERENCE_BACKEND_CUDA`
+(or legacy `INFERENCE_BACKEND_ONNX_RUNTIME_GPU`) when requesting embeddings or
+chunk embeddings. Requires an NVIDIA CUDA runtime on the host.
+
+### Chunk + embed configs
+
+Request one or more chunking strategies with per-chunk embeddings:
+
+```json
+{
+  "chunk_embed_configs": [
+    {
+      "config_id": "sentence-chunks",
+      "chunking": { "algorithm": "sentence" },
+      "embedding_model_ids": ["sentence-transformers"]
+    },
+    {
+      "config_id": "token-chunks",
+      "chunking": { "algorithm": "token", "chunk_size": 128, "chunk_overlap": 16 },
+      "embedding_model_ids": ["sentence-transformers"]
+    }
+  ]
+}
+```
+
+The server auto-runs sentence detection (and tokenization for `token` windows) once,
+then returns each strategy as a `chunk_embedding_groups` entry with embeddings
+attached inside each chunk.
+
+#### Semantic chunking
+
+Topic-boundary chunking compares consecutive sentence embeddings and splits when
+cosine similarity drops below `semantic_config.similarity_threshold` (default `0.5`)
+or below the configured `percentile_threshold`. Example:
+
+```json
+{
+  "config_id": "semantic-topics",
+  "chunking": {
+    "algorithm": "semantic",
+    "semantic_config": {
+      "similarity_threshold": 0.75,
+      "min_chunk_sentences": 1,
+      "max_chunk_sentences": 8,
+      "semantic_embedding_model_id": "sentence-transformers"
+    }
+  },
+  "embedding_model_ids": ["sentence-transformers"]
+}
+```
 
 ## v1 API
 
diff --git a/opennlp-grpc/opennlp-grpc-service/pom.xml b/opennlp-grpc/opennlp-grpc-service/pom.xml
index 4e2400fb..7b559fb8 100644
--- a/opennlp-grpc/opennlp-grpc-service/pom.xml
+++ b/opennlp-grpc/opennlp-grpc-service/pom.xml
@@ -30,6 +30,11 @@
     <artifactId>opennlp-grpc-service</artifactId>
     <name>Apache OpenNLP gRPC Server</name>
 
+    <properties>
+        <!-- Must match the onnxruntime version managed by opennlp-dl ${opennlp.version}. -->
+        <onnxruntime.version>1.25.0</onnxruntime.version>
+    </properties>
+
     <dependencies>
         <dependency>
             <groupId>org.apache.opennlp</groupId>
@@ -55,6 +60,23 @@
             <groupId>org.apache.opennlp</groupId>
             <artifactId>opennlp-model-resolver</artifactId>
         </dependency>
+        <!--
+          ONNX embeddings support (Maven Central artifact, not the sandbox DL4J module).
+          The onnxruntime jar is excluded here and supplied by exactly one of the
+          cpu/gpu profiles below, so the CPU and CUDA runtimes (which ship the same
+          ai.onnxruntime classes) can never coexist on the classpath.
+        -->
+        <dependency>
+            <groupId>org.apache.opennlp</groupId>
+            <artifactId>opennlp-dl</artifactId>
+            <version>${opennlp.version}</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>com.microsoft.onnxruntime</groupId>
+                    <artifactId>onnxruntime</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
 
         <dependency>
             <groupId>org.apache.opennlp</groupId>
@@ -131,6 +153,44 @@
 
     </dependencies>
 
+    <!--
+      Exactly one ONNX Runtime flavor is active at a time. The default cpu profile
+      is replaced by the gpu profile when building with -Dgpu (mirrors the
+      opennlp-dl / opennlp-dl-gpu split in the main OpenNLP repository).
+    -->
+    <profiles>
+        <profile>
+            <id>cpu</id>
+            <activation>
+                <property>
+                    <name>!gpu</name>
+                </property>
+            </activation>
+            <dependencies>
+                <dependency>
+                    <groupId>com.microsoft.onnxruntime</groupId>
+                    <artifactId>onnxruntime</artifactId>
+                    <version>${onnxruntime.version}</version>
+                </dependency>
+            </dependencies>
+        </profile>
+        <profile>
+            <id>gpu</id>
+            <activation>
+                <property>
+                    <name>gpu</name>
+                </property>
+            </activation>
+            <dependencies>
+                <dependency>
+                    <groupId>com.microsoft.onnxruntime</groupId>
+                    <artifactId>onnxruntime_gpu</artifactId>
+                    <version>${onnxruntime.version}</version>
+                </dependency>
+            </dependencies>
+        </profile>
+    </profiles>
+
     <build>
         <plugins>
             <plugin>
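The `cpu`/`gpu` profiles above only decide which ONNX Runtime jar ends up on the classpath; the runtime the server actually uses is chosen at startup through `model.embedder.backend`, as the README part of this diff describes. Below is a minimal sketch of that strict parsing with `java.util.Properties`. The names are hypothetical (this is not `EmbeddingProviderFactory`), and exact case-sensitive matching and a default device id of `0` are assumptions. In a real provider, the CUDA case would presumably end up in ONNX Runtime's `OrtSession.SessionOptions.addCUDA(int)`.

```java
import java.util.Properties;

/** Hypothetical sketch of strict embedder backend parsing (not EmbeddingProviderFactory). */
final class EmbedderBackendConfigSketch {

  enum Backend { ONNX, CUDA }

  /** The selected backend and the CUDA device id (always 0 for the CPU backend). */
  record BackendConfig(Backend backend, int gpuDeviceId) {
  }

  private EmbedderBackendConfigSketch() {
  }

  static BackendConfig parse(Properties props) {
    // A missing key means the CPU default; anything other than onnx/cuda is rejected.
    final String raw = props.getProperty("model.embedder.backend", "onnx").trim();
    final Backend backend = switch (raw) {
      case "onnx" -> Backend.ONNX;
      case "cuda" -> Backend.CUDA;
      default -> throw new IllegalArgumentException(
          "model.embedder.backend must be 'onnx' or 'cuda', got '" + raw + "'");
    };

    final String device = props.getProperty("model.embedder.gpu_device_id");
    if (device == null) {
      return new BackendConfig(backend, 0); // assumed default device
    }
    if (backend != Backend.CUDA) {
      throw new IllegalArgumentException(
          "model.embedder.gpu_device_id is only valid with the cuda backend");
    }
    // NumberFormatException (an IllegalArgumentException) if the value is not an int.
    final int deviceId = Integer.parseInt(device.trim());
    if (deviceId < 0) {
      throw new IllegalArgumentException("model.embedder.gpu_device_id must be >= 0");
    }
    return new BackendConfig(backend, deviceId);
  }
}
```

Failing in `parse` rather than at the first embedding request matches the README's "rejected at startup" wording: a misconfigured server never starts accepting requests.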
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/ChunkEmbedProcessor.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/ChunkEmbedProcessor.java
new file mode 100644
index 00000000..74140bac
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/ChunkEmbedProcessor.java
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.chunk;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.opennlp.grpc.embedding.EmbeddingProvider;
+import org.apache.opennlp.grpc.processor.AnalysisException;
+import org.apache.opennlp.grpc.v1.AnnotatedSentence;
+import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.Chunk;
+import org.apache.opennlp.grpc.v1.ChunkEmbedConfigEntry;
+import org.apache.opennlp.grpc.v1.ChunkEmbeddingGroup;
+import org.apache.opennlp.grpc.v1.ChunkGroupStats;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.CoordinateSpace;
+import org.apache.opennlp.grpc.v1.DiagnosticSeverity;
+import org.apache.opennlp.grpc.v1.EmbeddingGranularity;
+import org.apache.opennlp.grpc.v1.EmbeddingResult;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.PipelineStep;
+import org.apache.opennlp.grpc.v1.ProcessingDiagnostic;
+
+/**
+ * Builds {@link ChunkEmbeddingGroup} results from {@link ChunkEmbedConfigEntry} requests.
+ */
+public final class ChunkEmbedProcessor {
+
+  private ChunkEmbedProcessor() {
+  }
+
+  /**
+   * Validates a chunk+embed config entry against the server's capabilities before any
+   * processing starts, so invalid requests fail without partial results.
+   *
+   * @param entry             The config entry to validate.
+   * @param embeddingProvider The provider whose registered models are checked.
+   *
+   * @throws AnalysisException If the entry is incomplete, references unknown embedding
+   *                           models, or requires features this server does not provide.
+   */
+  public static void validateEntry(ChunkEmbedConfigEntry entry, EmbeddingProvider embeddingProvider) {
+    if (entry.getConfigId().isBlank()) {
+      throw AnalysisException.invalidArgument("chunk_embed_configs.config_id is required");
+    }
+    if (entry.hasProfile()) {
+      throw AnalysisException.unimplemented(
+          "per-entry analysis profiles in chunk_embed_configs are not implemented");
+    }
+    if (!entry.hasChunking()) {
+      throw AnalysisException.invalidArgument(
+          "chunk_embed_configs.chunking is required for config '" + entry.getConfigId() + "'");
+    }
+    final ChunkingSpec chunking = entry.getChunking();
+    if (isSemantic(chunking)) {
+      validateSemanticChunking(entry);
+      if (!embeddingProvider.isAvailable()) {
+        throw AnalysisException.notFound(
+            "semantic chunking for config '" + entry.getConfigId()
+                + "' requires configured embedding models on this server");
+      }
+    }
+    if (entry.getEmbeddingModelIdsCount() > 0 && !embeddingProvider.isAvailable()) {
+      throw AnalysisException.notFound(
+          "embedding models requested for config '" + entry.getConfigId()
+              + "' but no embedding models are configured on this server");
+    }
+    for (String modelId : entry.getEmbeddingModelIdsList()) {
+      if (!embeddingProvider.supportsModel(modelId)) {
+        throw AnalysisException.notFound("Unknown embedding model '" + modelId + "'");
+      }
+    }
+  }
+
+  /**
+   * Chunks the document according to the entry's chunking spec and embeds every chunk
+   * with each requested embedding model.
+   *
+   * @param rawText           The document text the annotation offsets refer to.
+   * @param document          The analyzed document backbone.
+   * @param entry             A previously validated config entry.
+   * @param embeddingProvider The provider used for chunk embeddings and semantic chunking.
+   *
+   * @return The resulting chunk group including per-group statistics.
+   */
+  public static ChunkEmbeddingGroup buildGroup(
+      String rawText,
+      OpenNlpDocument document,
+      ChunkEmbedConfigEntry entry,
+      EmbeddingProvider embeddingProvider) {
+    final long started = System.currentTimeMillis();
+    final List<SegmentationChunker.ChunkSegment> segments =
+        SegmentationChunker.segment(rawText, document, entry.getChunking(), embeddingProvider);
+
+    final ChunkEmbeddingGroup.Builder group = ChunkEmbeddingGroup.newBuilder()
+        .setGroupId(entry.getConfigId())
+        .setChunkConfigId(entry.getConfigId())
+        .addAllEmbeddingModelIds(entry.getEmbeddingModelIdsList())
+        .setGranularity(EmbeddingGranularity.EMBEDDING_GRANULARITY_CHUNK_LEVEL);
+    if (entry.hasResultSetName()) {
+      group.setResultSetName(entry.getResultSetName());
+    }
+
+    int totalTokens = 0;
+    for (SegmentationChunker.ChunkSegment segment : segments) {
+      final String chunkText = rawText.substring(segment.start(), segment.end());
+      final Chunk.Builder chunk = Chunk.newBuilder()
+          .setAnnotationSpan(toSpan(segment.start(), segment.end()))
+          .setTextContent(chunkText)
+          .addAllContainedSentenceIndices(segment.sentenceIndices());
+      totalTokens += countTokens(document, segment);
+      for (String modelId : entry.getEmbeddingModelIdsList()) {
+        final float[] vector = embeddingProvider.embed(modelId, chunkText);
+        chunk.addEmbeddings(EmbeddingResult.newBuilder()
+            .setModelId(modelId)
+            .addAllVector(toFloatList(vector))
+            .setSourceSpan(toSpan(segment.start(), segment.end()))
+            .setGranularity(EmbeddingGranularity.EMBEDDING_GRANULARITY_CHUNK_LEVEL)
+            .build());
+      }
+      group.addChunks(chunk.build());
+    }
+
+    group.setStats(ChunkGroupStats.newBuilder()
+        .setChunkCount(segments.size())
+        .setTotalTokens(totalTokens)
+        .setProcessingTimeMs(System.currentTimeMillis() - started)
+        .build());
+    return group.build();
+  }
+
+  /**
+   * Builds a sentence-per-chunk group without embeddings, used when the {@code CHUNK}
+   * pipeline step runs without chunk+embed configs.
+   *
+   * @param rawText  The document text the annotation offsets refer to.
+   * @param document The analyzed document backbone.
+   * @param groupId  The id assigned to the resulting group.
+   *
+   * @return The resulting chunk group.
+   */
+  public static ChunkEmbeddingGroup buildSentenceGroup(
+      String rawText, OpenNlpDocument document, String groupId) {
+    final ChunkingSpec spec = ChunkingSpec.newBuilder().setAlgorithm("sentence").build();
+    final ChunkEmbedConfigEntry entry = ChunkEmbedConfigEntry.newBuilder()
+        .setConfigId(groupId)
+        .setChunking(spec)
+        .build();
+    return buildGroup(rawText, document, entry, new NoOpEmbeddingProvider());
+  }
+
+  /**
+   * @param configId   The config id the diagnostic refers to.
+   * @param chunkCount The number of chunks produced for the config.
+   *
+   * @return An INFO diagnostic for a successfully processed chunk config.
+   */
+  public static ProcessingDiagnostic successDiagnostic(String configId, int chunkCount) {
+    return ProcessingDiagnostic.newBuilder()
+        .setStep(PipelineStep.PIPELINE_STEP_CHUNK)
+        .setSeverity(DiagnosticSeverity.DIAGNOSTIC_SEVERITY_INFO)
+        .setMessage("Produced " + chunkCount + " chunk(s) for config '" + configId + "'")
+        .build();
+  }
+
+  private static void validateSemanticChunking(ChunkEmbedConfigEntry entry) {
+    final var semantic = entry.getChunking().getSemanticConfig();
+    if (semantic.hasSemanticEmbeddingModelId() && !semantic.getSemanticEmbeddingModelId().isBlank()) {
+      return;
+    }
+    if (entry.getEmbeddingModelIdsCount() == 1) {
+      return;
+    }
+    throw AnalysisException.invalidArgument(
+        "semantic chunking requires semantic_embedding_model_id or exactly one embedding_model_id");
+  }
+
+  private static boolean isSemantic(ChunkingSpec chunking) {
+    return "semantic".equals(chunking.getAlgorithm()) || chunking.hasSemanticConfig();
+  }
+
+  private static int countTokens(OpenNlpDocument document, SegmentationChunker.ChunkSegment segment) {
+    int count = 0;
+    for (int sentenceIndex : segment.sentenceIndices()) {
+      final AnnotatedSentence sentence = document.getSentences(sentenceIndex);
+      for (var token : sentence.getTokensList()) {
+        final AnnotationSpan span = token.getAnnotationSpan();
+        if (span.getStart() < segment.end() && span.getEnd() > segment.start()) {
+          count++;
+        }
+      }
+    }
+    return count;
+  }
+
+  private static AnnotationSpan toSpan(int start, int end) {
+    return AnnotationSpan.newBuilder()
+        .setStart(start)
+        .setEnd(end)
+        .setSpace(CoordinateSpace.COORDINATE_SPACE_CHAR_DOCUMENT)
+        .build();
+  }
+
+  private static List<Float> toFloatList(float[] vector) {
+    final List<Float> values = new ArrayList<>(vector.length);
+    for (float value : vector) {
+      values.add(value);
+    }
+    return values;
+  }
+
+  /** Embedding provider that rejects embed calls; used for chunk-only groups. */
+  private static final class NoOpEmbeddingProvider implements EmbeddingProvider {
+    @Override
+    public boolean isAvailable() {
+      return false;
+    }
+
+    @Override
+    public Set<String> registeredModelIds() {
+      return Set.of();
+    }
+
+    @Override
+    public boolean supportsModel(String modelId) {
+      return false;
+    }
+
+    @Override
+    public int embeddingDimension(String modelId) {
+      return 0;
+    }
+
+    @Override
+    public float[] embed(String modelId, String text) {
+      throw AnalysisException.failedPrecondition("embeddings were not requested for this group");
+    }
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/SegmentationChunker.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/SegmentationChunker.java
new file mode 100644
index 00000000..55d9e42b
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/SegmentationChunker.java
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.chunk;
+
+import java.util.ArrayList;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.opennlp.grpc.embedding.EmbeddingProvider;
+import org.apache.opennlp.grpc.processor.AnalysisException;
+import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.Token;
+
+/**
+ * RAG-style segmentation chunking over an analyzed document backbone.
+ *
+ * <p>Supported algorithms are {@code sentence} (one chunk per sentence), {@code token}
+ * (overlapping token windows) and {@code semantic} (topic boundaries from sentence
+ * embedding similarity, delegated to {@link SemanticChunker}).</p>
+ */
+public final class SegmentationChunker {
+
+  /** Exclusive-end document character offsets plus the sentences touched by the chunk. */
+  public record ChunkSegment(int start, int end, List<Integer> sentenceIndices) {
+  }
+
+  private SegmentationChunker() {
+  }
+
+  /**
+   * Segments an analyzed document according to the given chunking spec.
+   *
+   * @param rawText           The document text the annotation offsets refer to.
+   * @param document          The analyzed document. Sentence spans are required; token
+   *                          spans are additionally required for the {@code token} algorithm.
+   * @param spec              The chunking spec. The algorithm must be set.
+   * @param embeddingProvider The provider used for semantic chunking. Must not be
+   *                          {@code null}; it is not consulted for other algorithms.
+   *
+   * @return The chunk segments in document order. Never {@code null}.
+   *
+   * @throws AnalysisException If the spec is invalid or names an unknown algorithm.
+   */
+  public static List<ChunkSegment> segment(
+      String rawText,
+      OpenNlpDocument document,
+      ChunkingSpec spec,
+      EmbeddingProvider embeddingProvider) {
+    if (spec.getAlgorithm().isBlank()) {
+      throw AnalysisException.invalidArgument("chunking.algorithm is required");
+    }
+    if (isSemantic(spec)) {
+      final String modelId = requireSemanticModelId(spec, embeddingProvider);
+      return SemanticChunker.chunk(
+          rawText, document, spec.getSemanticConfig(), embeddingProvider, modelId);
+    }
+    return switch (spec.getAlgorithm()) {
+      case "sentence" -> sentenceChunks(document);
+      case "token" -> tokenWindowChunks(document, spec);
+      default -> throw AnalysisException.unimplemented(
+          "chunking algorithm '" + spec.getAlgorithm() + "' is not implemented");
+    };
+  }
+
+  private static boolean isSemantic(ChunkingSpec spec) {
+    return "semantic".equals(spec.getAlgorithm()) || spec.hasSemanticConfig();
+  }
+
+  private static String requireSemanticModelId(ChunkingSpec spec, EmbeddingProvider provider) {
+    if (!spec.hasSemanticConfig()) {
+      throw AnalysisException.invalidArgument("chunking.semantic_config is required for semantic chunking");
+    }
+    final var semantic = spec.getSemanticConfig();
+    final String requested = semantic.hasSemanticEmbeddingModelId()
+        ? semantic.getSemanticEmbeddingModelId() : null;
+    final String modelId = provider.resolveModelId(requested);
+    if (modelId == null || modelId.isBlank()) {
+      throw AnalysisException.invalidArgument(
+          "semantic chunking requires semantic_embedding_model_id or exactly one registered embedding model");
+    }
+    if (!provider.supportsModel(modelId)) {
+      throw AnalysisException.notFound("Unknown semantic embedding model '" + modelId + "'");
+    }
+    return modelId;
+  }
+
+  private static List<ChunkSegment> sentenceChunks(OpenNlpDocument document) {
+    final List<ChunkSegment> chunks = new ArrayList<>();
+    for (int i = 0; i < document.getSentencesCount(); i++) {
+      final AnnotationSpan span = document.getSentences(i).getSentenceSpan();
+      chunks.add(new ChunkSegment(span.getStart(), span.getEnd(), List.of(i)));
+    }
+    return chunks;
+  }
+
+  private static List<ChunkSegment> tokenWindowChunks(OpenNlpDocument document, ChunkingSpec spec) {
+    final int chunkSize = spec.getChunkSize();
+    final int chunkOverlap = spec.getChunkOverlap();
+    if (chunkSize <= 0) {
+      throw AnalysisException.invalidArgument("chunking.chunk_size must be positive for token windows");
+    }
+    if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
+      throw AnalysisException.invalidArgument(
+          "chunking.chunk_overlap must be >= 0 and < chunk_size");
+    }
+
+    final List<FlatToken> flatTokens = flattenTokens(document);
+    if (flatTokens.isEmpty()) {
+      return List.of();
+    }
+
+    final int step = Math.max(1, chunkSize - chunkOverlap);
+    final List<ChunkSegment> chunks = new ArrayList<>();
+    for (int startToken = 0; startToken < flatTokens.size(); startToken += step) {
+      final int endToken = Math.min(startToken + chunkSize, flatTokens.size()) - 1;
+      final FlatToken first = flatTokens.get(startToken);
+      final FlatToken last = flatTokens.get(endToken);
+      chunks.add(new ChunkSegment(
+          first.start(),
+          last.end(),
+          sentenceIndices(flatTokens, startToken, endToken)));
+      if (endToken == flatTokens.size() - 1) {
+        break;
+      }
+    }
+    return chunks;
+  }
+
+  private static List<FlatToken> flattenTokens(OpenNlpDocument document) {
+    final List<FlatToken> tokens = new ArrayList<>();
+    for (int sentenceIndex = 0; sentenceIndex < document.getSentencesCount(); sentenceIndex++) {
+      for (Token token : document.getSentences(sentenceIndex).getTokensList()) {
+        final AnnotationSpan span = token.getAnnotationSpan();
+        tokens.add(new FlatToken(span.getStart(), span.getEnd(), sentenceIndex));
+      }
+    }
+    return tokens;
+  }
+
+  private static List<Integer> sentenceIndices(
+      List<FlatToken> flatTokens, int startToken, int endToken) {
+    final Set<Integer> indices = new LinkedHashSet<>();
+    for (int i = startToken; i <= endToken; i++) {
+      indices.add(flatTokens.get(i).sentenceIndex());
+    }
+    return List.copyOf(indices);
+  }
+
+  private record FlatToken(int start, int end, int sentenceIndex) {
+  }
+}
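The window arithmetic in `tokenWindowChunks` above can be exercised on its own. The sketch below is a stripped-down restatement that works on token indices instead of an `OpenNlpDocument`; `TokenWindowSketch` and `Window` are hypothetical names, and the character-offset mapping done by the real method is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch of the token-window arithmetic used by tokenWindowChunks. */
final class TokenWindowSketch {

  /** Inclusive first/last token indices of one window. */
  record Window(int firstToken, int lastToken) {
  }

  private TokenWindowSketch() {
  }

  static List<Window> windows(int tokenCount, int chunkSize, int chunkOverlap) {
    if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
      throw new IllegalArgumentException("need chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
    }
    final int step = chunkSize - chunkOverlap; // >= 1 given the check above
    final List<Window> result = new ArrayList<>();
    for (int start = 0; start < tokenCount; start += step) {
      final int end = Math.min(start + chunkSize, tokenCount) - 1;
      result.add(new Window(start, end));
      if (end == tokenCount - 1) {
        break; // last token covered; further windows would only repeat the overlap
      }
    }
    return result;
  }
}
```

For the README's `token-chunks` example (`chunk_size` 128, `chunk_overlap` 16), a 300-token document yields three windows covering tokens 0-127, 112-239 and 224-299. `tokenWindowChunks` then turns each window into character offsets using the first and last token spans.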
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/SemanticChunker.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/SemanticChunker.java
new file mode 100644
index 00000000..8f2fa8c9
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/chunk/SemanticChunker.java
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.chunk;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.opennlp.grpc.embedding.EmbeddingProvider;
+import org.apache.opennlp.grpc.processor.AnalysisException;
+import org.apache.opennlp.grpc.v1.AnnotatedSentence;
+import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.SemanticChunkingConfig;
+
+/**
+ * Topic-boundary chunking using consecutive sentence embedding similarity.
+ *
+ * <p>Every sentence is embedded individually and a chunk boundary is placed wherever the
+ * cosine similarity of two consecutive sentences falls below the threshold. The threshold
+ * is, in order of precedence, the {@code percentile_threshold} over the observed
+ * similarities, the explicit {@code similarity_threshold}, or
+ * {@value #DEFAULT_SIMILARITY_THRESHOLD}.</p>
+ *
+ * <p>Size constraints are applied after boundary detection: chunks smaller than
+ * {@code min_chunk_sentences} are merged first, then chunks larger than
+ * {@code max_chunk_sentences} are split. The maximum therefore always holds, while the
+ * minimum may be violated by a split remainder.</p>
+ */
+public final class SemanticChunker {
+
+  static final float DEFAULT_SIMILARITY_THRESHOLD = 0.5f;
+
+  private SemanticChunker() {
+  }
+
+  /**
+   * Chunks the analyzed document at semantic topic boundaries.
+   *
+   * @param rawText           The document text the sentence spans refer to.
+   * @param document          The analyzed document. Sentence spans are required.
+   * @param config            The semantic chunking configuration.
+   * @param embeddingProvider The provider used to embed each sentence.
+   * @param modelId           The id of a registered embedding model.
+   *
+   * @return The chunk segments in document order. Never {@code null}.
+   *
+   * @throws AnalysisException If the configuration is invalid or embedding 
fails.
+   */
+  public static List<SegmentationChunker.ChunkSegment> chunk(
+      String rawText,
+      OpenNlpDocument document,
+      SemanticChunkingConfig config,
+      EmbeddingProvider embeddingProvider,
+      String modelId) {
+    if (document.getSentencesCount() == 0) {
+      return List.of();
+    }
+    if (document.getSentencesCount() == 1) {
+      final AnnotationSpan span = document.getSentences(0).getSentenceSpan();
+      return List.of(new SegmentationChunker.ChunkSegment(span.getStart(), span.getEnd(), List.of(0)));
+    }
+
+    final int sentenceCount = document.getSentencesCount();
+    final float[][] embeddings = new float[sentenceCount][];
+    for (int i = 0; i < sentenceCount; i++) {
+      final AnnotationSpan span = document.getSentences(i).getSentenceSpan();
+      final String sentenceText = rawText.substring(span.getStart(), span.getEnd());
+      embeddings[i] = embeddingProvider.embed(modelId, sentenceText);
+    }
+
+    final float[] similarities = new float[sentenceCount - 1];
+    for (int i = 0; i < similarities.length; i++) {
+      similarities[i] = cosineSimilarity(embeddings[i], embeddings[i + 1]);
+    }
+
+    final float threshold = resolveThreshold(config, similarities);
+    final int minSentences = config.getMinChunkSentences() > 0 ? config.getMinChunkSentences() : 1;
+    final int maxSentences =
+        config.getMaxChunkSentences() > 0 ? config.getMaxChunkSentences() : Integer.MAX_VALUE;
+
+    final List<Integer> starts = new ArrayList<>();
+    starts.add(0);
+    for (int i = 0; i < similarities.length; i++) {
+      if (similarities[i] < threshold) {
+        starts.add(i + 1);
+      }
+    }
+
+    mergeSmallChunks(starts, minSentences, sentenceCount);
+    splitLargeChunks(starts, maxSentences, sentenceCount);
+
+    final List<SegmentationChunker.ChunkSegment> chunks = new ArrayList<>();
+    for (int i = 0; i < starts.size(); i++) {
+      final int startSentence = starts.get(i);
+      final int endSentence = i + 1 < starts.size() ? starts.get(i + 1) - 1 : sentenceCount - 1;
+      chunks.add(toSegment(rawText, document, startSentence, endSentence));
+    }
+    return chunks;
+  }
+
+  private static float resolveThreshold(SemanticChunkingConfig config, float[] similarities) {
+    if (config.getPercentileThreshold() > 0) {
+      if (config.getPercentileThreshold() >= 100) {
+        throw AnalysisException.invalidArgument("semantic_config.percentile_threshold must be < 100");
+      }
+      return percentile(similarities, config.getPercentileThreshold());
+    }
+    if (config.getSimilarityThreshold() > 0f) {
+      return config.getSimilarityThreshold();
+    }
+    return DEFAULT_SIMILARITY_THRESHOLD;
+  }
+
+  private static float percentile(float[] values, int percentile) {
+    final float[] sorted = values.clone();
+    Arrays.sort(sorted);
+    final int index = Math.max(0, Math.min(sorted.length - 1,
+        (int) Math.ceil(percentile / 100.0 * sorted.length) - 1));
+    return sorted[index];
+  }
+
+  /**
+   * Merges chunks smaller than {@code minSentences} into a neighbour. An undersized chunk
+   * absorbs the following chunk; an undersized final chunk is absorbed by the preceding
+   * one. A single chunk covering the whole document is never merged away, so documents
+   * with fewer than {@code minSentences} sentences yield one chunk.
+   */
+  private static void mergeSmallChunks(List<Integer> starts, int minSentences, int sentenceCount) {
+    if (minSentences <= 1) {
+      return;
+    }
+    int index = 0;
+    while (index < starts.size()) {
+      final int chunkStart = starts.get(index);
+      final int chunkEnd = index + 1 < starts.size() ? starts.get(index + 1) - 1 : sentenceCount - 1;
+      if (chunkEnd - chunkStart + 1 >= minSentences) {
+        index++;
+      } else if (index + 1 < starts.size()) {
+        // Absorb the following chunk, then re-check the grown chunk at the same index.
+        starts.remove(index + 1);
+      } else if (index > 0) {
+        // Undersized final chunk: absorb it into the preceding chunk.
+        starts.remove(index);
+      } else {
+        break;
+      }
+    }
+  }
+
+  /**
+   * Splits chunks larger than {@code maxSentences} into consecutive windows of at most
+   * {@code maxSentences} sentences.
+   */
+  private static void splitLargeChunks(List<Integer> starts, int maxSentences, int sentenceCount) {
+    int index = 0;
+    while (index < starts.size()) {
+      final int chunkStart = starts.get(index);
+      final int chunkEnd = index + 1 < starts.size() ? starts.get(index + 1) - 1 : sentenceCount - 1;
+      final int size = chunkEnd - chunkStart + 1;
+      if (size <= maxSentences) {
+        index++;
+        continue;
+      }
+      int splitAt = chunkStart + maxSentences;
+      starts.add(index + 1, splitAt);
+      index++;
+    }
+  }
+
+  private static SegmentationChunker.ChunkSegment toSegment(
+      String rawText,
+      OpenNlpDocument document,
+      int startSentence,
+      int endSentence) {
+    final AnnotatedSentence first = document.getSentences(startSentence);
+    final AnnotatedSentence last = document.getSentences(endSentence);
+    final int start = first.getSentenceSpan().getStart();
+    final int end = last.getSentenceSpan().getEnd();
+    final List<Integer> sentenceIndices = new ArrayList<>();
+    for (int i = startSentence; i <= endSentence; i++) {
+      sentenceIndices.add(i);
+    }
+    return new SegmentationChunker.ChunkSegment(start, end, List.copyOf(sentenceIndices));
+  }
+
+  static float cosineSimilarity(float[] left, float[] right) {
+    if (left.length != right.length) {
+      throw AnalysisException.invalidArgument("Embedding dimension mismatch during semantic chunking");
+    }
+    double dot = 0;
+    double leftNorm = 0;
+    double rightNorm = 0;
+    for (int i = 0; i < left.length; i++) {
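+      // Widen before multiplying so each product is formed in double precision.
+      dot += (double) left[i] * right[i];
+      leftNorm += (double) left[i] * left[i];
+      rightNorm += (double) right[i] * right[i];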
+    }
+    if (leftNorm == 0 || rightNorm == 0) {
+      return 0f;
+    }
+    return (float) (dot / (Math.sqrt(leftNorm) * Math.sqrt(rightNorm)));
+  }
+}
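A minimal stand-alone sketch of the boundary rule SemanticChunker implements, assuming plain float[] sentence vectors in place of an EmbeddingProvider. The class name SemanticBoundarySketch and the toy vectors are hypothetical; the percentile helper reuses the same nearest-rank index arithmetic as the patch, and the min/max size constraints are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-alone sketch of SemanticChunker's boundary rule: a boundary goes
 * wherever the cosine similarity of consecutive sentence vectors drops below a threshold.
 * Min/max chunk size constraints are intentionally omitted.
 */
final class SemanticBoundarySketch {

  /** Cosine similarity; returns 0 when either vector has zero norm. */
  static double cosine(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return normA == 0 || normB == 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  /** Nearest-rank percentile, using the same index arithmetic as the patch. */
  static double percentile(double[] values, int p) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int index = Math.max(0, Math.min(sorted.length - 1,
        (int) Math.ceil(p / 100.0 * sorted.length) - 1));
    return sorted[index];
  }

  /** Returns the first sentence index of every chunk, in document order. */
  static List<Integer> chunkStarts(float[][] sentenceVectors, double threshold) {
    List<Integer> starts = new ArrayList<>();
    starts.add(0);
    for (int i = 0; i + 1 < sentenceVectors.length; i++) {
      if (cosine(sentenceVectors[i], sentenceVectors[i + 1]) < threshold) {
        starts.add(i + 1); // sentence i + 1 opens a new chunk
      }
    }
    return starts;
  }

  public static void main(String[] args) {
    // Toy vectors: sentences 0-1 point roughly along x, sentences 2-3 roughly along y.
    float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}, {0.1f, 0.9f}};
    System.out.println(chunkStarts(vectors, 0.5)); // [0, 2]
    System.out.println(percentile(new double[] {0.2, 0.8, 0.4, 0.6}, 50)); // 0.4
  }
}
```

With a threshold of 0.5 only the x-to-y transition (similarity about 0.11) falls below it, so chunks start at sentences 0 and 2.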
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/AbstractOnnxEmbeddingProvider.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/AbstractOnnxEmbeddingProvider.java
new file mode 100644
index 00000000..13fca998
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/AbstractOnnxEmbeddingProvider.java
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+import ai.onnxruntime.OrtException;
+import org.apache.opennlp.grpc.processor.AnalysisException;
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Base class for ONNX Runtime-backed {@link EmbeddingProvider} implementations.
+ *
+ * <p>Embedding models are declared in the server configuration with one ONNX path and
+ * one vocabulary path per model id:</p>
+ *
+ * <pre>
+ * model.embedder.&lt;model-id&gt;.onnx.path=/models/minilm.onnx
+ * model.embedder.&lt;model-id&gt;.vocab.path=/models/minilm-vocab.txt
+ * model.embedder.default_id=&lt;model-id&gt;          (optional; needed with several models unless every request names one)
+ * model.embedder.gpu_device_id=&lt;ordinal&gt;        (CUDA backends only)
+ * </pre>
+ *
+ * <p>All configured models are loaded eagerly so that misconfiguration fails at server
+ * startup rather than on the first request. Subclasses only declare which
+ * {@link InferenceBackend} values they serve.</p>
+ */
+abstract class AbstractOnnxEmbeddingProvider implements EmbeddingProvider, AutoCloseable {
+
+  private static final Logger logger = LoggerFactory.getLogger(AbstractOnnxEmbeddingProvider.class);
+
+  private static final String KEY_PREFIX = "model.embedder.";
+  private static final String KEY_ONNX_SUFFIX = ".onnx.path";
+  private static final String KEY_VOCAB_SUFFIX = ".vocab.path";
+  private static final String KEY_DEFAULT_ID = "model.embedder.default_id";
+  private static final String KEY_GPU_DEVICE = "model.embedder.gpu_device_id";
+
+  private final Map<String, OnnxSentenceEmbedder> models;
+  private final String defaultModelId;
+
+  /**
+   * Loads all configured embedding models.
+   *
+   * @param configuration The server configuration. Must not be {@code null}.
+   * @param useCuda       Whether models run on the CUDA execution provider.
+   *
+   * @throws AnalysisException If the configuration is inconsistent, a referenced file is
+   *                           missing, or a model fails to load.
+   */
+  AbstractOnnxEmbeddingProvider(Map<String, String> configuration, boolean useCuda) {
+    Objects.requireNonNull(configuration, "configuration must not be null");
+    final int gpuDeviceId = gpuDeviceId(configuration, useCuda);
+    this.models = loadModels(configuration, useCuda, gpuDeviceId);
+    try {
+      this.defaultModelId = resolveDefaultModelId(configuration, models);
+    } catch (RuntimeException e) {
+      // An unknown default_id is only detected after the models load; release their sessions.
+      close();
+      throw e;
+    }
+  }
+
+  /**
+   * @return The {@link InferenceBackend} values this provider serves, in addition to
+   *         {@code UNSPECIFIED} and {@code OPENNLP_ME} which every provider accepts.
+   */
+  abstract Set<InferenceBackend> supportedBackends();
+
+  @Override
+  public boolean isAvailable() {
+    return !models.isEmpty();
+  }
+
+  @Override
+  public Set<String> registeredModelIds() {
+    return models.keySet();
+  }
+
+  @Override
+  public boolean supportsModel(String modelId) {
+    return modelId != null && !modelId.isBlank() && models.containsKey(modelId);
+  }
+
+  @Override
+  public int embeddingDimension(String modelId) {
+    return requireModel(modelId).embeddingDimension();
+  }
+
+  @Override
+  public float[] embed(String modelId, String text) {
+    Objects.requireNonNull(text, "text must not be null");
+    final OnnxSentenceEmbedder embedder = requireModel(modelId);
+    try {
+      return embedder.embed(text);
+    } catch (OrtException e) {
+      throw AnalysisException.internal("Embedding inference failed for model '" + modelId + "'", e);
+    }
+  }
+
+  @Override
+  public String resolveModelId(String requestedModelId) {
+    if (requestedModelId != null && !requestedModelId.isBlank()) {
+      return requestedModelId;
+    }
+    // resolveDefaultModelId already falls back to the only model when exactly one is loaded.
+    return defaultModelId;
+  }
+
+  @Override
+  public boolean supportsInferenceBackend(InferenceBackend backend) {
+    return backend == InferenceBackend.INFERENCE_BACKEND_UNSPECIFIED
+        || backend == InferenceBackend.INFERENCE_BACKEND_OPENNLP_ME
+        || supportedBackends().contains(backend);
+  }
+
+  /**
+   * Closes all loaded ONNX sessions. Failures are logged and do not abort the shutdown
+   * of the remaining models.
+   */
+  @Override
+  public void close() {
+    for (Map.Entry<String, OnnxSentenceEmbedder> entry : models.entrySet()) {
+      try {
+        entry.getValue().close();
+      } catch (OrtException e) {
+        logger.warn("Failed to close embedding model '{}'", entry.getKey(), e);
+      }
+    }
+  }
+
+  private OnnxSentenceEmbedder requireModel(String modelId) {
+    if (modelId == null || modelId.isBlank()) {
+      throw AnalysisException.invalidArgument("embedding model id is required");
+    }
+    final OnnxSentenceEmbedder embedder = models.get(modelId);
+    if (embedder == null) {
+      throw AnalysisException.notFound("Unknown embedding model '" + modelId + "'");
+    }
+    return embedder;
+  }
+
+  private static int gpuDeviceId(Map<String, String> configuration, boolean useCuda) {
+    final String configured = configuration.get(KEY_GPU_DEVICE);
+    if (configured == null || configured.isBlank()) {
+      return 0;
+    }
+    if (!useCuda) {
+      throw AnalysisException.invalidArgument(
+          KEY_GPU_DEVICE + " requires model.embedder.backend=cuda");
+    }
+    try {
+      return Integer.parseInt(configured.trim());
+    } catch (NumberFormatException e) {
+      throw AnalysisException.invalidArgument(
+          KEY_GPU_DEVICE + " must be an integer: " + configured);
+    }
+  }
+
+  private static Map<String, OnnxSentenceEmbedder> loadModels(
+      Map<String, String> configuration, boolean useCuda, int gpuDeviceId) {
+    final Map<String, String> onnxPaths = new HashMap<>();
+    final Map<String, String> vocabPaths = new HashMap<>();
+
+    for (Map.Entry<String, String> entry : configuration.entrySet()) {
+      final String key = entry.getKey();
+      if (!key.startsWith(KEY_PREFIX) || key.equals(KEY_DEFAULT_ID) || key.equals(KEY_GPU_DEVICE)) {
+        continue;
+      }
+      final String suffix;
+      if (key.endsWith(KEY_ONNX_SUFFIX)) {
+        suffix = KEY_ONNX_SUFFIX;
+      } else if (key.endsWith(KEY_VOCAB_SUFFIX)) {
+        suffix = KEY_VOCAB_SUFFIX;
+      } else {
+        continue;
+      }
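+      final int idEnd = key.length() - suffix.length();
+      if (idEnd <= KEY_PREFIX.length()) {
+        // Keys such as "model.embedder.onnx.path" carry no model id; substring would throw.
+        continue;
+      }
+      final String modelId = key.substring(KEY_PREFIX.length(), idEnd);
+      final String path = entry.getValue();
+      if (modelId.isBlank() || path == null || path.isBlank()) {
+        continue;
+      }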
+      if (suffix.equals(KEY_ONNX_SUFFIX)) {
+        onnxPaths.put(modelId, path);
+      } else {
+        vocabPaths.put(modelId, path);
+      }
+    }
+
+    final Map<String, OnnxSentenceEmbedder> loaded = new HashMap<>();
+    try {
+      for (Map.Entry<String, String> entry : onnxPaths.entrySet()) {
+        final String modelId = entry.getKey();
+        final String vocabPath = vocabPaths.get(modelId);
+        if (vocabPath == null) {
+          throw AnalysisException.invalidArgument(
+              KEY_PREFIX + modelId + KEY_VOCAB_SUFFIX
+                  + " is required when an ONNX path is configured");
+        }
+        loaded.put(modelId, loadModel(modelId, entry.getValue(), vocabPath, useCuda, gpuDeviceId));
+      }
+    } catch (RuntimeException e) {
+      for (OnnxSentenceEmbedder embedder : loaded.values()) {
+        try {
+          embedder.close();
+        } catch (OrtException closeFailure) {
+          e.addSuppressed(closeFailure);
+        }
+      }
+      throw e;
+    }
+    return Map.copyOf(loaded);
+  }
+
+  private static OnnxSentenceEmbedder loadModel(
+      String modelId, String onnxPath, String vocabPath, boolean useCuda, int gpuDeviceId) {
+    final File onnxFile = new File(onnxPath);
+    final File vocabFile = new File(vocabPath);
+    if (!onnxFile.isFile()) {
+      throw AnalysisException.notFound(
+          "ONNX embedding model file not found for '" + modelId + "': " + onnxFile.getAbsolutePath());
+    }
+    if (!vocabFile.isFile()) {
+      throw AnalysisException.notFound(
+          "Embedding vocabulary file not found for '" + modelId + "': " + vocabFile.getAbsolutePath());
+    }
+    try {
+      final OnnxSentenceEmbedder embedder =
+          new OnnxSentenceEmbedder(onnxFile, vocabFile, useCuda, gpuDeviceId);
+      logger.info("Loaded embedding model '{}' (dimension={}, backend={})",
+          modelId, embedder.embeddingDimension(), useCuda ? "CUDA" : "ONNX Runtime CPU");
+      return embedder;
+    } catch (OrtException | IOException e) {
+      final String backend = useCuda ? "CUDA" : "ONNX Runtime CPU";
+      throw AnalysisException.internal(
+          "Failed to load embedding model '" + modelId + "' on " + backend, e);
+    }
+  }
+
+  private static String resolveDefaultModelId(
+      Map<String, String> configuration, Map<String, OnnxSentenceEmbedder> models) {
+    final String configured = configuration.get(KEY_DEFAULT_ID);
+    if (configured != null && !configured.isBlank()) {
+      if (!models.containsKey(configured)) {
+        throw AnalysisException.notFound(
+            KEY_DEFAULT_ID + " '" + configured + "' is not registered");
+      }
+      return configured;
+    }
+    return models.size() == 1 ? models.keySet().iterator().next() : null;
+  }
+}
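The key layout in the class Javadoc above can be illustrated with a small stand-alone parser. This is a hypothetical sketch (EmbedderConfigSketch is not part of the patch) that groups model.embedder.&lt;id&gt;.onnx.path and .vocab.path keys by model id and rejects an ONNX path that has no vocabulary. Keys with no id between prefix and suffix, such as model.embedder.onnx.path, are skipped here: passing them straight to String.substring would throw, because the computed end index falls before the start index.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of how per-model embedder keys can be grouped by model id.
 * Keys look like model.embedder.<id>.onnx.path and model.embedder.<id>.vocab.path.
 */
final class EmbedderConfigSketch {

  private static final String PREFIX = "model.embedder.";
  private static final String ONNX_SUFFIX = ".onnx.path";
  private static final String VOCAB_SUFFIX = ".vocab.path";

  /** Returns the model id embedded in the key, or null if the key has no usable id. */
  static String modelId(String key, String suffix) {
    if (!key.startsWith(PREFIX) || !key.endsWith(suffix)) {
      return null;
    }
    int idEnd = key.length() - suffix.length();
    // "model.embedder.onnx.path" has no id; without this guard substring would throw.
    if (idEnd <= PREFIX.length()) {
      return null;
    }
    return key.substring(PREFIX.length(), idEnd);
  }

  /** Maps each model id to its ONNX path; fails if an id has no vocabulary path. */
  static Map<String, String> onnxPathsWithVocab(Map<String, String> config) {
    Map<String, String> onnx = new TreeMap<>();
    Map<String, String> vocab = new TreeMap<>();
    for (Map.Entry<String, String> e : config.entrySet()) {
      String onnxId = modelId(e.getKey(), ONNX_SUFFIX);
      String vocabId = modelId(e.getKey(), VOCAB_SUFFIX);
      if (onnxId != null) {
        onnx.put(onnxId, e.getValue());
      } else if (vocabId != null) {
        vocab.put(vocabId, e.getValue());
      }
    }
    for (String id : onnx.keySet()) {
      if (!vocab.containsKey(id)) {
        throw new IllegalArgumentException(PREFIX + id + VOCAB_SUFFIX + " is required");
      }
    }
    return onnx;
  }

  public static void main(String[] args) {
    Map<String, String> config = Map.of(
        "model.embedder.backend", "onnx",
        "model.embedder.minilm.onnx.path", "/models/minilm.onnx",
        "model.embedder.minilm.vocab.path", "/models/minilm-vocab.txt",
        "model.embedder.onnx.path", "/models/no-id.onnx");
    System.out.println(onnxPathsWithVocab(config)); // {minilm=/models/minilm.onnx}
  }
}
```

Model ids may themselves contain dots (model.embedder.a.b.onnx.path yields the id a.b), since only the fixed prefix and suffix are stripped.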
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/CudaEmbeddingProvider.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/CudaEmbeddingProvider.java
new file mode 100644
index 00000000..204db27c
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/CudaEmbeddingProvider.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+
+/**
+ * ONNX Runtime embedding provider running on the CUDA execution provider.
+ *
+ * <p>Serves {@code INFERENCE_BACKEND_CUDA} and {@code INFERENCE_BACKEND_ONNX_RUNTIME_GPU}
+ * requests. Requires a server built with the {@code gpu} Maven profile, which replaces
+ * the {@code onnxruntime} jar with {@code onnxruntime_gpu}, and a CUDA-capable device at
+ * runtime. The device is selected with {@code model.embedder.gpu_device_id}. See
+ * {@link AbstractOnnxEmbeddingProvider} for the model configuration keys.</p>
+ */
+public final class CudaEmbeddingProvider extends AbstractOnnxEmbeddingProvider {
+
+  /**
+   * Loads all configured embedding models on the CUDA device.
+   *
+   * @param configuration The server configuration. Must not be {@code null}.
+   */
+  public CudaEmbeddingProvider(Map<String, String> configuration) {
+    super(configuration, true);
+  }
+
+  @Override
+  Set<InferenceBackend> supportedBackends() {
+    return Set.of(
+        InferenceBackend.INFERENCE_BACKEND_CUDA,
+        InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME_GPU);
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/EmbeddingProvider.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/EmbeddingProvider.java
new file mode 100644
index 00000000..9e750191
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/EmbeddingProvider.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.util.Set;
+
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+
+/**
+ * Local embedding backend for the {@code PIPELINE_STEP_EMBED} pipeline step.
+ *
+ * <p>Implementations own their model lifecycle: models are registered at construction
+ * time and identified by a stable model id. Implementations that hold native resources
+ * should also implement {@link AutoCloseable}; the server closes such providers on
+ * shutdown.</p>
+ */
+public interface EmbeddingProvider {
+
+  /**
+   * @return {@code true} when at least one embedding model is registered.
+   */
+  boolean isAvailable();
+
+  /**
+   * @return The ids of all registered embedding models. Never {@code null}.
+   */
+  Set<String> registeredModelIds();
+
+  /**
+   * @param modelId The model id to check. May be {@code null} or blank.
+   *
+   * @return {@code true} when the given id refers to a registered embedding model.
+   */
+  boolean supportsModel(String modelId);
+
+  /**
+   * @param modelId The id of a registered embedding model.
+   *
+   * @return The dimension of the vectors produced by the model.
+   */
+  int embeddingDimension(String modelId);
+
+  /**
+   * Embeds the given text.
+   *
+   * @param modelId The id of a registered embedding model.
+   * @param text    The text to embed. Must not be {@code null}.
+   *
+   * @return The embedding vector of length {@link #embeddingDimension(String)}.
+   */
+  float[] embed(String modelId, String text);
+
+  /**
+   * Resolves the effective model id from an optional client override.
+   *
+   * @param requestedModelId The model id requested by the client. May be {@code null}
+   *                         or blank when the client wants the server default.
+   *
+   * @return The model id to use, or {@code null} when no default can be determined.
+   */
+  default String resolveModelId(String requestedModelId) {
+    if (requestedModelId != null && !requestedModelId.isBlank()) {
+      return requestedModelId;
+    }
+    if (registeredModelIds().size() == 1) {
+      return registeredModelIds().iterator().next();
+    }
+    return null;
+  }
+
+  /**
+   * @param backend The inference backend requested by the client.
+   *
+   * @return {@code true} when the provider can serve the requested inference backend.
+   *         {@code UNSPECIFIED} and {@code OPENNLP_ME} are always accepted because they
+   *         do not constrain the embedding backend.
+   */
+  default boolean supportsInferenceBackend(InferenceBackend backend) {
+    return backend == InferenceBackend.INFERENCE_BACKEND_UNSPECIFIED
+        || backend == InferenceBackend.INFERENCE_BACKEND_OPENNLP_ME
+        || backend == InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME;
+  }
+}
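The fallback order that AbstractOnnxEmbeddingProvider.resolveModelId applies (explicit request, then model.embedder.default_id, then the only registered model, else null) can be sketched without the gRPC types. ModelIdResolutionSketch and its parameters are hypothetical names for illustration; the interface default above covers only the first and last steps.

```java
import java.util.Set;

/** Hypothetical sketch of the model-id fallback order; not part of the patch. */
final class ModelIdResolutionSketch {

  /**
   * Returns the requested id when it is non-blank, else the configured default, else the
   * only registered model, else null (the caller then has no model to use).
   */
  static String resolve(String requested, String configuredDefault, Set<String> registered) {
    if (requested != null && !requested.isBlank()) {
      return requested; // not checked here; an unknown id is rejected when it is looked up
    }
    if (configuredDefault != null && !configuredDefault.isBlank()) {
      return configuredDefault;
    }
    return registered.size() == 1 ? registered.iterator().next() : null;
  }

  public static void main(String[] args) {
    Set<String> two = Set.of("minilm", "e5");
    System.out.println(resolve(null, null, two));              // null (ambiguous)
    System.out.println(resolve("  ", "e5", two));              // e5
    System.out.println(resolve(null, null, Set.of("minilm"))); // minilm
    System.out.println(resolve("custom", "e5", two));          // custom
  }
}
```

A null result is what a request without a model id gets when several models are configured and no default_id is set.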
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/EmbeddingProviderFactory.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/EmbeddingProviderFactory.java
new file mode 100644
index 00000000..98fe9195
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/EmbeddingProviderFactory.java
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.util.Locale;
+import java.util.Map;
+
+import org.apache.opennlp.grpc.processor.AnalysisException;
+
+/**
+ * Creates the configured {@link EmbeddingProvider} for the gRPC server.
+ *
+ * <p>The backend is selected with the {@code model.embedder.backend} configuration key.
+ * Supported values are {@value #BACKEND_ONNX} (the default, ONNX Runtime on CPU) and
+ * {@value #BACKEND_CUDA} (ONNX Runtime with the CUDA execution provider; requires a
+ * server built with the {@code gpu} Maven profile). Values are trimmed and compared
+ * case-insensitively; any other value is rejected.</p>
+ */
+public final class EmbeddingProviderFactory {
+
+  static final String KEY_BACKEND = "model.embedder.backend";
+  static final String BACKEND_ONNX = "onnx";
+  static final String BACKEND_CUDA = "cuda";
+
+  private EmbeddingProviderFactory() {
+  }
+
+  /**
+   * Creates the embedding provider declared by the server configuration.
+   *
+   * @param configuration The server configuration. Must not be {@code null}.
+   *
+   * @return The configured provider. Never {@code null}.
+   *
+   * @throws AnalysisException If the configured backend is unknown or the provider's
+   *                           model configuration is invalid.
+   */
+  public static EmbeddingProvider create(Map<String, String> configuration) {
+    final String backend =
+        configuration.getOrDefault(KEY_BACKEND, BACKEND_ONNX).trim().toLowerCase(Locale.ROOT);
+    return switch (backend) {
+      case BACKEND_ONNX -> new OnnxRuntimeEmbeddingProvider(configuration);
+      case BACKEND_CUDA -> new CudaEmbeddingProvider(configuration);
+      default -> throw AnalysisException.invalidArgument(
+          KEY_BACKEND + " '" + backend + "' is not supported; expected one of: "
+              + BACKEND_ONNX + ", " + BACKEND_CUDA);
+    };
+  }
+}
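The strict selection rule can be exercised in isolation. This is a hypothetical sketch: BackendSelectionSketch and its local Backend enum stand in for EmbeddingProviderFactory and the two provider classes. It shows the value being trimmed and lower-cased with Locale.ROOT (so matching does not depend on the JVM default locale) and anything else failing fast.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the strict model.embedder.backend selection; not part of the patch. */
final class BackendSelectionSketch {

  /** Stand-in for OnnxRuntimeEmbeddingProvider (ONNX) and CudaEmbeddingProvider (CUDA). */
  enum Backend { ONNX, CUDA }

  static Backend select(Map<String, String> config) {
    // A missing key means the CPU default; the value is normalized before matching.
    String backend = config.getOrDefault("model.embedder.backend", "onnx")
        .trim()
        .toLowerCase(Locale.ROOT);
    return switch (backend) {
      case "onnx" -> Backend.ONNX;
      case "cuda" -> Backend.CUDA;
      default -> throw new IllegalArgumentException("model.embedder.backend '" + backend
          + "' is not supported; expected one of: onnx, cuda");
    };
  }

  public static void main(String[] args) {
    System.out.println(select(Map.of()));                                   // ONNX
    System.out.println(select(Map.of("model.embedder.backend", " CUDA "))); // CUDA
    try {
      select(Map.of("model.embedder.backend", "tensorrt"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

As in the factory above, a blank value trims to the empty string and is rejected rather than falling back to onnx.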
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/OnnxRuntimeEmbeddingProvider.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/OnnxRuntimeEmbeddingProvider.java
new file mode 100644
index 00000000..ce2f6375
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/OnnxRuntimeEmbeddingProvider.java
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+
+/**
+ * ONNX Runtime embedding provider running on the CPU execution provider.
+ *
+ * <p>Serves {@code INFERENCE_BACKEND_ONNX_RUNTIME} requests. See
+ * {@link AbstractOnnxEmbeddingProvider} for the model configuration keys.</p>
+ */
+public final class OnnxRuntimeEmbeddingProvider extends AbstractOnnxEmbeddingProvider {
+
+  /**
+   * Loads all configured embedding models on the CPU.
+   *
+   * @param configuration The server configuration. Must not be {@code null}.
+   */
+  public OnnxRuntimeEmbeddingProvider(Map<String, String> configuration) {
+    super(configuration, false);
+  }
+
+  @Override
+  Set<InferenceBackend> supportedBackends() {
+    return Set.of(InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME);
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/OnnxSentenceEmbedder.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/OnnxSentenceEmbedder.java
new file mode 100644
index 00000000..f6710539
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/embedding/OnnxSentenceEmbedder.java
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.LongBuffer;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+import ai.onnxruntime.NodeInfo;
+import ai.onnxruntime.OnnxTensor;
+import ai.onnxruntime.OrtEnvironment;
+import ai.onnxruntime.OrtException;
+import ai.onnxruntime.OrtSession;
+import ai.onnxruntime.TensorInfo;
+import opennlp.dl.AbstractDL;
+import opennlp.tools.tokenize.WordpieceTokenizer;
+
+/**
+ * Computes sentence embeddings with a BERT-style ONNX model and a wordpiece vocabulary.
+ *
+ * <p>This embedder is the inference core behind {@link AbstractOnnxEmbeddingProvider}. It
+ * reuses the vocabulary loading and wordpiece tokenizer selection of {@link AbstractDL}
+ * (BERT or RoBERTa special tokens, chosen from the vocabulary contents) and adds the
+ * pieces {@code opennlp-dl}'s {@code SentenceVectorsDL} does not offer: an optional CUDA
+ * execution provider, session-metadata based dimension discovery and deterministic native
+ * resource management.</p>
+ *
+ * <p>Model input conventions follow the standard single-segment BERT encoding:
+ * {@code attention_mask} is {@code 1} for every real token and {@code token_type_ids}
+ * is {@code 0} throughout. Inputs the model does not declare (many sentence-transformers
+ * exports omit {@code token_type_ids}) are not sent. The embedding is the hidden state of
+ * the leading classification token ({@code [CLS]} / {@code <s>}).</p>
+ *
+ * <p>Token sequences are truncated to {@link #MAX_SEQUENCE_TOKENS} wordpieces (the
+ * trailing separator token is preserved) so that inputs never exceed the positional range
+ * of BERT-style encoders.</p>
+ */
+final class OnnxSentenceEmbedder extends AbstractDL {
+
+  /** Maximum wordpiece sequence length accepted by BERT-style encoders. */
+  static final int MAX_SEQUENCE_TOKENS = 512;
+
+  private final Set<String> declaredInputs;
+  private final long unknownTokenId;
+  private final int embeddingDimension;
+
+  /**
+   * Loads the ONNX model and vocabulary and prepares an inference session.
+   *
+   * @param model       The ONNX model file. Must exist.
+   * @param vocabulary  The wordpiece vocabulary file matching the model. Must exist.
+   * @param useCuda     Whether to register the CUDA execution provider.
+   * @param gpuDeviceId The CUDA device ordinal; ignored when {@code useCuda} is {@code false}.
+   *
+   * @throws OrtException If the ONNX session cannot be created or the model does not
+   *                      declare a static embedding dimension.
+   * @throws IOException  If the vocabulary cannot be read or lacks the special tokens
+   *                      required by the wordpiece tokenizer.
+   */
+  OnnxSentenceEmbedder(File model, File vocabulary, boolean useCuda, int gpuDeviceId)
+      throws OrtException, IOException {
+    env = OrtEnvironment.getEnvironment();
+    try (OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions()) {
+      if (useCuda) {
+        sessionOptions.addCUDA(gpuDeviceId);
+      }
+      session = env.createSession(model.getPath(), sessionOptions);
+    }
+    try {
+      vocab = loadVocab(vocabulary);
+      tokenizer = createTokenizer(vocab);
+      unknownTokenId = requireSpecialTokens(vocab);
+      declaredInputs = Set.copyOf(session.getInputNames());
+      embeddingDimension = readEmbeddingDimension(session, model);
+    } catch (OrtException | IOException | RuntimeException e) {
+      try {
+        session.close();
+      } catch (OrtException closeFailure) {
+        e.addSuppressed(closeFailure);
+      }
+      throw e;
+    }
+  }
+
+  /**
+   * @return The embedding dimension declared by the model's output metadata.
+   */
+  int embeddingDimension() {
+    return embeddingDimension;
+  }
+
+  /**
+   * Embeds the given text.
+   *
+   * @param text The text to embed. Must not be {@code null}.
+   *
+   * @return The embedding vector of length {@link #embeddingDimension()}.
+   *
+   * @throws OrtException If inference fails.
+   */
+  float[] embed(String text) throws OrtException {
+    final long[] ids = tokenIds(text);
+    final long[] mask = new long[ids.length];
+    Arrays.fill(mask, 1);
+    final long[] types = new long[ids.length];
+    final long[] shape = {1, ids.length};
+
+    final Map<String, OnnxTensor> inputs = new HashMap<>();
+    try {
+      inputs.put(INPUT_IDS, OnnxTensor.createTensor(env, LongBuffer.wrap(ids), shape));
+      if (declaredInputs.contains(ATTENTION_MASK)) {
+        inputs.put(ATTENTION_MASK, OnnxTensor.createTensor(env, LongBuffer.wrap(mask), shape));
+      }
+      if (declaredInputs.contains(TOKEN_TYPE_IDS)) {
+        inputs.put(TOKEN_TYPE_IDS, OnnxTensor.createTensor(env, LongBuffer.wrap(types), shape));
+      }
+      try (OrtSession.Result result = session.run(inputs)) {
+        // getValue() copies the tensor into Java arrays, so the result can be closed safely.
+        final float[][][] hiddenStates = (float[][][]) result.get(0).getValue();
+        return hiddenStates[0][0];
+      }
+    } finally {
+      inputs.values().forEach(OnnxTensor::close);
+    }
+  }
+
+  /**
+   * Closes the inference session. The shared {@link OrtEnvironment} singleton is left
+   * open intentionally because other models may still be using it.
+   */
+  @Override
+  public void close() throws OrtException {
+    session.close();
+  }
+
+  private long[] tokenIds(String text) {
+    String[] tokens = tokenizer.tokenize(text);
+    if (tokens.length > MAX_SEQUENCE_TOKENS) {
+      final String separator = tokens[tokens.length - 1];
+      tokens = Arrays.copyOf(tokens, MAX_SEQUENCE_TOKENS);
+      tokens[MAX_SEQUENCE_TOKENS - 1] = separator;
+    }
+    final long[] ids = new long[tokens.length];
+    for (int i = 0; i < tokens.length; i++) {
+      final Integer id = vocab.get(tokens[i]);
+      ids[i] = id != null ? id : unknownTokenId;
+    }
+    return ids;
+  }
+
+  /**
+   * Verifies that the special tokens selected by {@link AbstractDL#createTokenizer(Map)}
+   * are present in the vocabulary, so that every tokenizer output can be mapped to an id.
+   *
+   * @return The id of the unknown token.
+   */
+  private static long requireSpecialTokens(Map<String, Integer> vocab) throws IOException {
+    final boolean roberta = vocab.containsKey(WordpieceTokenizer.ROBERTA_CLS_TOKEN);
+    final String cls = roberta
+        ? WordpieceTokenizer.ROBERTA_CLS_TOKEN : WordpieceTokenizer.BERT_CLS_TOKEN;
+    final String sep = roberta
+        ? WordpieceTokenizer.ROBERTA_SEP_TOKEN : WordpieceTokenizer.BERT_SEP_TOKEN;
+    final String unk = roberta
+        ? WordpieceTokenizer.ROBERTA_UNK_TOKEN : WordpieceTokenizer.BERT_UNK_TOKEN;
+    for (String token : new String[] {cls, sep, unk}) {
+      if (!vocab.containsKey(token)) {
+        throw new IOException("Embedding vocabulary does not define the special token '"
+            + token + "'; the vocabulary file does not match the model");
+      }
+    }
+    return vocab.get(unk);
+  }
+
+  /**
+   * Reads the embedding dimension from the last axis of the model's first output tensor.
+   */
+  private static int readEmbeddingDimension(OrtSession session, File model) throws OrtException {
+    final NodeInfo output = session.getOutputInfo().values().iterator().next();
+    if (!(output.getInfo() instanceof TensorInfo tensorInfo)) {
+      throw new OrtException("Embedding model output '" + output.getName()
+          + "' of " + model.getName() + " is not a tensor");
+    }
+    final long[] shape = tensorInfo.getShape();
+    final long dimension = shape.length > 0 ? shape[shape.length - 1] : -1;
+    if (dimension <= 0 || dimension > Integer.MAX_VALUE) {
+      throw new OrtException("Embedding model " + model.getName()
+          + " does not declare a static embedding dimension (output shape: "
+          + Arrays.toString(shape) + ")");
+    }
+    return (int) dimension;
+  }
+}
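The input encoding performed by tokenIds() and embed() in OnnxSentenceEmbedder can be exercised without a model. The following is a minimal, dependency-free sketch, not the committed code: BertInputSketch, its Encoded record and encode() are hypothetical names, the wordpiece tokens are passed in directly instead of coming from WordpieceTokenizer, and maxTokens is a parameter (the embedder fixes it at MAX_SEQUENCE_TOKENS = 512) so truncation is visible on a short input.

```java
import java.util.Arrays;
import java.util.Map;

/**
 * Hypothetical sketch of the single-segment BERT encoding: truncate while keeping the
 * trailing separator, map OOV wordpieces to the unknown id, and build the all-ones
 * attention mask and all-zeros token type ids.
 */
public final class BertInputSketch {

  /** The three parallel input arrays a BERT-style ONNX model may declare. */
  public record Encoded(long[] inputIds, long[] attentionMask, long[] tokenTypeIds) { }

  public static Encoded encode(
      String[] tokens, Map<String, Integer> vocab, long unknownId, int maxTokens) {
    String[] kept = tokens;
    if (kept.length > maxTokens) {
      // Keep the first maxTokens pieces, then overwrite the last slot with the original
      // trailing separator so the sequence still ends with [SEP] / </s>.
      final String separator = kept[kept.length - 1];
      kept = Arrays.copyOf(kept, maxTokens);
      kept[maxTokens - 1] = separator;
    }
    final long[] ids = new long[kept.length];
    for (int i = 0; i < kept.length; i++) {
      final Integer id = vocab.get(kept[i]);
      ids[i] = id != null ? id : unknownId; // out-of-vocabulary pieces fall back to [UNK]
    }
    final long[] mask = new long[ids.length];
    Arrays.fill(mask, 1L); // every position is a real token; no padding is used
    final long[] types = new long[ids.length]; // single segment: all zeros
    return new Encoded(ids, mask, types);
  }

  public static void main(String[] args) {
    final Map<String, Integer> vocab =
        Map.of("[CLS]", 101, "[SEP]", 102, "[UNK]", 100, "hello", 7592);
    final Encoded e = encode(
        new String[] {"[CLS]", "hello", "wrld", "hello", "[SEP]"}, vocab, 100L, 4);
    System.out.println(Arrays.toString(e.inputIds())); // prints [101, 7592, 100, 102]
  }
}
```

With maxTokens = 4 the trailing [SEP] replaces the fourth wordpiece, and the out-of-vocabulary piece "wrld" maps to the [UNK] id. The vocabulary ids are illustrative only.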
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/model/ModelBundleCache.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/model/ModelBundleCache.java
index 0b48e1f7..64169ab9 100644
--- a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/model/ModelBundleCache.java
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/model/ModelBundleCache.java
@@ -33,12 +33,16 @@ import opennlp.tools.sentdetect.SentenceDetectorME;
 import opennlp.tools.sentdetect.SentenceModel;
 import opennlp.tools.tokenize.TokenizerME;
 import opennlp.tools.tokenize.TokenizerModel;
+import org.apache.opennlp.grpc.embedding.EmbeddingProvider;
+import org.apache.opennlp.grpc.embedding.EmbeddingProviderFactory;
 import org.apache.opennlp.grpc.profile.ProfileRegistry;
 import org.apache.opennlp.grpc.processor.AnalysisException;
 import org.apache.opennlp.grpc.v1.ComponentType;
 import org.apache.opennlp.grpc.v1.ModelBundleInfo;
 import org.apache.opennlp.grpc.v1.ModelDescriptor;
 import org.apache.opennlp.grpc.v1.PipelineStep;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Loads shared thread-safe {@code *ME} singletons once at startup.
@@ -48,6 +52,8 @@ import org.apache.opennlp.grpc.v1.PipelineStep;
  */
 public final class ModelBundleCache {
 
+  private static final Logger logger = LoggerFactory.getLogger(ModelBundleCache.class);
+
   private static final String DEFAULT_LANGUAGE = "en";
   private static final String KEY_SENTDETECT_PATH = "model.sentence_detector.path";
   private static final String KEY_TOKENIZER_PATH = "model.tokenizer.path";
@@ -56,12 +62,14 @@ public final class ModelBundleCache {
   private final Map<String, ModelBundleInfo> bundles;
   private final SentenceDetectorME sentenceDetector;
   private final TokenizerME tokenizer;
+  private final EmbeddingProvider embeddingProvider;
 
   public ModelBundleCache(Map<String, String> configuration) {
     Objects.requireNonNull(configuration, "configuration");
     this.modelProvider = new DefaultClassPathModelProvider();
     this.sentenceDetector = loadSentenceDetector(configuration);
     this.tokenizer = loadTokenizer(configuration);
+    this.embeddingProvider = EmbeddingProviderFactory.create(configuration);
     this.bundles = buildBundleCatalog();
   }
 
@@ -77,6 +85,24 @@ public final class ModelBundleCache {
     return new ArrayList<>(bundles.values());
   }
 
+  public EmbeddingProvider getEmbeddingProvider() {
+    return embeddingProvider;
+  }
+
+  /**
+   * Releases resources held by the embedding provider. Failures are logged so that the
+   * remaining server shutdown is not interrupted.
+   */
+  public void close() {
+    if (embeddingProvider instanceof AutoCloseable closeable) {
+      try {
+        closeable.close();
+      } catch (Exception e) {
+        logger.warn("Failed to close embedding provider", e);
+      }
+    }
+  }
+
   private SentenceDetectorME loadSentenceDetector(Map<String, String> configuration) {
     try {
       final String configuredPath = configuration.get(KEY_SENTDETECT_PATH);
@@ -120,8 +146,7 @@ public final class ModelBundleCache {
   }
 
   private Map<String, ModelBundleInfo> buildBundleCatalog() {
-    final Map<String, ModelBundleInfo> catalog = new HashMap<>();
-    catalog.put(ProfileRegistry.DEFAULT_BUNDLE_ID, ModelBundleInfo.newBuilder()
+    final ModelBundleInfo.Builder bundle = ModelBundleInfo.newBuilder()
         .setBundleId(ProfileRegistry.DEFAULT_BUNDLE_ID)
         .addSupportedLanguages(DEFAULT_LANGUAGE)
         .addSupportedSteps(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)
@@ -137,8 +162,21 @@ public final class ModelBundleCache {
             .setLocale(DEFAULT_LANGUAGE)
             .setComponentType(ComponentType.COMPONENT_TYPE_TOKENIZER)
             .addLanguages(DEFAULT_LANGUAGE)
-            .build())
-        .build());
+            .build());
+    if (embeddingProvider.isAvailable()) {
+      bundle.addSupportedSteps(PipelineStep.PIPELINE_STEP_EMBED);
+      for (String modelId : embeddingProvider.registeredModelIds()) {
+        bundle.addModels(ModelDescriptor.newBuilder()
+            .setName(modelId)
+            .setLocale(DEFAULT_LANGUAGE)
+            .setComponentType(ComponentType.COMPONENT_TYPE_EMBEDDER)
+            .addLanguages(DEFAULT_LANGUAGE)
+            .setEmbeddingDimension(embeddingProvider.embeddingDimension(modelId))
+            .build());
+      }
+    }
+    final Map<String, ModelBundleInfo> catalog = new HashMap<>();
+    catalog.put(ProfileRegistry.DEFAULT_BUNDLE_ID, bundle.build());
     return catalog;
   }
 }
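The catalog entry above advertises each embedder's size through setEmbeddingDimension, which the commit description says is read from ONNX session metadata (readEmbeddingDimension in OnnxSentenceEmbedder). The shape check itself is plain array logic, so this sketch isolates it under a hypothetical class name. It assumes, as the embedder's own check does, that dynamic axes show up as non-positive sizes (ONNX Runtime's Java TensorInfo reports them as -1).

```java
import java.util.Arrays;

/** Hypothetical sketch of the static-dimension check in readEmbeddingDimension. */
public final class EmbeddingDimensionSketch {

  /**
   * Returns the last axis of an output tensor shape, rejecting dynamic (non-positive)
   * or out-of-int-range sizes.
   */
  public static int staticEmbeddingDimension(long[] shape) {
    final long dimension = shape.length > 0 ? shape[shape.length - 1] : -1;
    if (dimension <= 0 || dimension > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "no static embedding dimension in output shape " + Arrays.toString(shape));
    }
    return (int) dimension;
  }

  public static void main(String[] args) {
    // Batch and sequence axes are dynamic (-1); the hidden size is fixed.
    System.out.println(staticEmbeddingDimension(new long[] {-1, -1, 384})); // prints 384
  }
}
```

Rejecting a dynamic last axis at startup means a mis-exported model fails while the cache is being built, instead of advertising a dimension of -1 to clients.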
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzer.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzer.java
index 2df3f089..a957dcdb 100644
--- a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzer.java
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzer.java
@@ -18,13 +18,17 @@
 package org.apache.opennlp.grpc.processor;
 
 import java.util.ArrayList;
+import java.util.LinkedHashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Objects;
+import java.util.Set;
 
 import opennlp.tools.sentdetect.SentenceDetectorME;
 import opennlp.tools.tokenize.TokenizerME;
 import opennlp.tools.util.Span;
+import org.apache.opennlp.grpc.chunk.ChunkEmbedProcessor;
+import org.apache.opennlp.grpc.embedding.EmbeddingProvider;
 import org.apache.opennlp.grpc.model.ModelBundleCache;
 import org.apache.opennlp.grpc.profile.ProfileRegistry;
 import org.apache.opennlp.grpc.profile.ProfileResolver;
@@ -34,16 +38,19 @@ import org.apache.opennlp.grpc.v1.AnalyzeDocumentRequest;
 import org.apache.opennlp.grpc.v1.AnalyzeDocumentResponse;
 import org.apache.opennlp.grpc.v1.AnnotatedSentence;
 import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.Chunk;
 import org.apache.opennlp.grpc.v1.ChunkEmbedConfigEntry;
+import org.apache.opennlp.grpc.v1.ChunkEmbeddingGroup;
 import org.apache.opennlp.grpc.v1.CoordinateSpace;
 import org.apache.opennlp.grpc.v1.DiagnosticSeverity;
+import org.apache.opennlp.grpc.v1.EmbeddingGranularity;
+import org.apache.opennlp.grpc.v1.EmbeddingResult;
 import org.apache.opennlp.grpc.v1.InferenceBackend;
 import org.apache.opennlp.grpc.v1.ModelBundleRef;
 import org.apache.opennlp.grpc.v1.OffsetEncoding;
 import org.apache.opennlp.grpc.v1.OpenNlpDocument;
 import org.apache.opennlp.grpc.v1.PipelineStep;
 import org.apache.opennlp.grpc.v1.ProcessingDiagnostic;
-import org.apache.opennlp.grpc.v1.SemanticChunkingConfig;
 import org.apache.opennlp.grpc.v1.Token;
 
 /**
@@ -56,16 +63,26 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
 
   private final ProfileResolver profileResolver;
   private final ModelBundleCache modelBundleCache;
+  private final EmbeddingProvider embeddingProvider;
 
   public BasicDocumentAnalyzer(Map<String, String> configuration) {
     this(ProfileRegistry.createDefault(), new ModelBundleCache(configuration));
   }
 
   public BasicDocumentAnalyzer(ProfileRegistry profileRegistry, ModelBundleCache modelBundleCache) {
+    this(profileRegistry, modelBundleCache, modelBundleCache.getEmbeddingProvider());
+  }
+
+  public BasicDocumentAnalyzer(
+      ProfileRegistry profileRegistry,
+      ModelBundleCache modelBundleCache,
+      EmbeddingProvider embeddingProvider) {
     Objects.requireNonNull(profileRegistry, "profileRegistry");
     Objects.requireNonNull(modelBundleCache, "modelBundleCache");
+    Objects.requireNonNull(embeddingProvider, "embeddingProvider");
     this.profileResolver = new ProfileResolver(profileRegistry);
     this.modelBundleCache = modelBundleCache;
+    this.embeddingProvider = embeddingProvider;
   }
 
   @Override
@@ -95,7 +112,7 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
       document.setMetadata(input.getMetadata());
     }
 
-    if (PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)) {
+    if (shouldRunStep(request, profile, PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)) {
       runStep(
           PipelineStep.PIPELINE_STEP_SENTENCE_DETECT,
           diagnostics,
@@ -104,7 +121,7 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
       addSkippedDiagnostic(diagnostics, PipelineStep.PIPELINE_STEP_SENTENCE_DETECT);
     }
 
-    if (PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_TOKENIZE)) {
+    if (shouldRunStep(request, profile, PipelineStep.PIPELINE_STEP_TOKENIZE)) {
       if (document.getSentencesCount() == 0) {
         throw AnalysisException.failedPrecondition(
             PipelineStep.PIPELINE_STEP_TOKENIZE.name()
@@ -119,6 +136,36 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
       addSkippedDiagnostic(diagnostics, PipelineStep.PIPELINE_STEP_TOKENIZE);
     }
 
+    final String embeddingModelId = resolveEmbeddingModelId(request, profile);
+    if (PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_EMBED)) {
+      if (document.getSentencesCount() == 0) {
+        throw AnalysisException.failedPrecondition(
+            PipelineStep.PIPELINE_STEP_EMBED.name()
+                + " requires "
+                + PipelineStep.PIPELINE_STEP_SENTENCE_DETECT.name());
+      }
+      runStep(
+          PipelineStep.PIPELINE_STEP_EMBED,
+          diagnostics,
+          () -> runEmbedding(rawText, document, embeddingModelId, diagnostics));
+    } else {
+      addSkippedDiagnostic(diagnostics, PipelineStep.PIPELINE_STEP_EMBED);
+    }
+
+    if (request.getChunkEmbedConfigsCount() > 0) {
+      runStep(
+          PipelineStep.PIPELINE_STEP_CHUNK,
+          diagnostics,
+          () -> runChunkEmbedConfigs(rawText, document, request, diagnostics));
+    } else if (shouldRunStep(request, profile, PipelineStep.PIPELINE_STEP_CHUNK)) {
+      runStep(
+          PipelineStep.PIPELINE_STEP_CHUNK,
+          diagnostics,
+          () -> runProfileChunking(rawText, document, diagnostics));
+    } else {
+      addSkippedDiagnostic(diagnostics, PipelineStep.PIPELINE_STEP_CHUNK);
+    }
+
     final OffsetEncoding requestedEncoding = request.hasOptions()
         ? request.getOptions().getOffsetEncoding()
         : OffsetEncoding.OFFSET_ENCODING_UNSPECIFIED;
@@ -130,7 +177,7 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
         .build();
   }
 
-  private static void validateSupportedRequest(
+  private void validateSupportedRequest(
       AnalyzeDocumentRequest request, AnalysisProfile profile, String rawText) {
     for (PipelineStep step : profile.getStepsList()) {
       if (step == PipelineStep.PIPELINE_STEP_UNSPECIFIED) {
@@ -141,32 +188,71 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
       }
     }
 
-    validateOptions(request, rawText);
+    validateOptions(request, profile, rawText);
     validateModelBundle(profile);
+    validateEmbeddingRequest(request, profile);
+    validateChunkEmbedConfigs(request);
+  }
 
+  private void validateChunkEmbedConfigs(AnalyzeDocumentRequest request) {
     if (request.getChunkEmbedConfigsCount() == 0) {
       return;
     }
     for (ChunkEmbedConfigEntry entry : request.getChunkEmbedConfigsList()) {
-      validateSemanticChunking(entry);
+      ChunkEmbedProcessor.validateEntry(entry, embeddingProvider);
+    }
+  }
+
+  private Set<PipelineStep> resolveEffectiveSteps(
+      AnalyzeDocumentRequest request, AnalysisProfile profile) {
+    final LinkedHashSet<PipelineStep> steps = new LinkedHashSet<>(profile.getStepsList());
+    if (PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_EMBED)) {
+      steps.add(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT);
     }
-    throw AnalysisException.unimplemented("chunk_embed_configs are not implemented on this server");
+    if (request.getChunkEmbedConfigsCount() > 0) {
+      steps.add(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT);
+      for (ChunkEmbedConfigEntry entry : request.getChunkEmbedConfigsList()) {
+        if (entry.hasChunking() && "token".equals(entry.getChunking().getAlgorithm())) {
+          steps.add(PipelineStep.PIPELINE_STEP_TOKENIZE);
+        }
+      }
+    }
+    if (PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_CHUNK)
+        && request.getChunkEmbedConfigsCount() == 0) {
+      steps.add(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT);
+    }
+    return steps;
+  }
+
+  private boolean shouldRunStep(
+      AnalyzeDocumentRequest request, AnalysisProfile profile, PipelineStep step) {
+    return resolveEffectiveSteps(request, profile).contains(step);
   }
 
-  private static void validateOptions(AnalyzeDocumentRequest request, String rawText) {
+  private void validateOptions(
+      AnalyzeDocumentRequest request, AnalysisProfile profile, String rawText) {
     if (!request.hasOptions()) {
       return;
     }
     final AnalysisOptions options = request.getOptions();
     final InferenceBackend backend = options.getInferenceBackend();
+    final boolean embedRequested =
+        PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_EMBED);
+    final boolean chunkEmbedsRequested = request.getChunkEmbedConfigsList().stream()
+        .anyMatch(entry -> entry.getEmbeddingModelIdsCount() > 0);
+    final boolean dlRequested = embedRequested || chunkEmbedsRequested;
     if (backend != InferenceBackend.INFERENCE_BACKEND_UNSPECIFIED
-        && backend != InferenceBackend.INFERENCE_BACKEND_OPENNLP_ME) {
+        && backend != InferenceBackend.INFERENCE_BACKEND_OPENNLP_ME
+        && !(dlRequested && embeddingProvider.supportsInferenceBackend(backend))) {
       throw AnalysisException.unimplemented(
-          "inference_backend " + backend.name() + " is not implemented; only OPENNLP_ME is supported");
+          "inference_backend " + backend.name()
+              + " is not implemented for the configured embedding provider");
     }
     if (options.hasOnnxEmbeddingModelId() && !options.getOnnxEmbeddingModelId().isBlank()) {
-      throw AnalysisException.unimplemented(
-          "onnx_embedding_model_id is not implemented (no EMBED step on this server)");
+      if (!embedRequested) {
+        throw AnalysisException.invalidArgument(
+            "onnx_embedding_model_id requires PIPELINE_STEP_EMBED in the analysis profile");
+      }
     }
     if (options.hasMaxTextLength()
         && options.getMaxTextLength() > 0
@@ -176,6 +262,81 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
     }
   }
 
+  private void validateEmbeddingRequest(AnalyzeDocumentRequest request, AnalysisProfile profile) {
+    if (!PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_EMBED)) {
+      return;
+    }
+    if (!embeddingProvider.isAvailable()) {
+      throw AnalysisException.notFound(
+          "PIPELINE_STEP_EMBED requested but no embedding models are configured on this server");
+    }
+    final String modelId = resolveEmbeddingModelId(request, profile);
+    if (modelId == null || modelId.isBlank()) {
+      throw AnalysisException.invalidArgument(
+          "onnx_embedding_model_id is required when multiple embedding models are configured");
+    }
+    if (!embeddingProvider.supportsModel(modelId)) {
+      throw AnalysisException.notFound("Unknown embedding model '" + modelId + "'");
+    }
+  }
+
+  private String resolveEmbeddingModelId(AnalyzeDocumentRequest request, AnalysisProfile profile) {
+    if (!PipelineStepPolicy.shouldRun(profile, PipelineStep.PIPELINE_STEP_EMBED)) {
+      return null;
+    }
+    String requested = null;
+    if (request.hasOptions() && request.getOptions().hasOnnxEmbeddingModelId()) {
+      requested = request.getOptions().getOnnxEmbeddingModelId();
+    }
+    return embeddingProvider.resolveModelId(requested);
+  }
+
+  private void runChunkEmbedConfigs(
+      String rawText,
+      OpenNlpDocument.Builder document,
+      AnalyzeDocumentRequest request,
+      List<ProcessingDiagnostic> diagnostics) {
+    if (document.getSentencesCount() == 0) {
+      throw AnalysisException.failedPrecondition(
+          "chunk_embed_configs requires sentence detection backbone");
+    }
+    for (ChunkEmbedConfigEntry entry : request.getChunkEmbedConfigsList()) {
+      if ("token".equals(entry.getChunking().getAlgorithm())) {
+        ensureTokenized(document);
+      }
+      final ChunkEmbeddingGroup group =
+          ChunkEmbedProcessor.buildGroup(rawText, document.build(), entry, embeddingProvider);
+      document.addChunkEmbeddingGroups(group);
+      diagnostics.add(ChunkEmbedProcessor.successDiagnostic(
+          entry.getConfigId(), group.getChunksCount()));
+    }
+  }
+
+  private void runProfileChunking(
+      String rawText,
+      OpenNlpDocument.Builder document,
+      List<ProcessingDiagnostic> diagnostics) {
+    if (document.getSentencesCount() == 0) {
+      throw AnalysisException.failedPrecondition(
+          PipelineStep.PIPELINE_STEP_CHUNK.name()
+              + " requires "
+              + PipelineStep.PIPELINE_STEP_SENTENCE_DETECT.name());
+    }
+    final ChunkEmbeddingGroup group =
+        ChunkEmbedProcessor.buildSentenceGroup(rawText, document.build(), "profile-chunk");
+    document.addChunkEmbeddingGroups(group);
+    diagnostics.add(ChunkEmbedProcessor.successDiagnostic("profile-chunk", group.getChunksCount()));
+  }
+
+  private static void ensureTokenized(OpenNlpDocument.Builder document) {
+    for (AnnotatedSentence sentence : document.getSentencesList()) {
+      if (sentence.getTokensCount() == 0) {
+        throw AnalysisException.failedPrecondition(
+            "token chunking requires " + PipelineStep.PIPELINE_STEP_TOKENIZE.name());
+      }
+    }
+  }
+
   private static void validateModelBundle(AnalysisProfile profile) {
     if (!profile.hasModelBundle()) {
       return;
@@ -193,21 +354,6 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
     }
   }
 
-  private static void validateSemanticChunking(ChunkEmbedConfigEntry entry) {
-    if (!entry.hasChunking() || !entry.getChunking().hasSemanticConfig()) {
-      return;
-    }
-    final SemanticChunkingConfig semantic = entry.getChunking().getSemanticConfig();
-    if (semantic.hasSemanticEmbeddingModelId() && !semantic.getSemanticEmbeddingModelId().isBlank()) {
-    }
-    if (entry.getEmbeddingModelIdsCount() == 1) {
-      return;
-    }
-    throw AnalysisException.invalidArgument(
-        "semantic chunking requires semantic_embedding_model_id or exactly one embedding_model_id");
-  }
-
   private void runStep(
       PipelineStep step,
       List<ProcessingDiagnostic> diagnostics,
@@ -281,6 +427,40 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
         .build());
   }
 
+  private void runEmbedding(
+      String rawText,
+      OpenNlpDocument.Builder document,
+      String modelId,
+      List<ProcessingDiagnostic> diagnostics) {
+    int embeddingCount = 0;
+    for (AnnotatedSentence sentence : document.getSentencesList()) {
+      final AnnotationSpan sentenceSpan = sentence.getSentenceSpan();
+      final String sentenceText = rawText.substring(sentenceSpan.getStart(), sentenceSpan.getEnd());
+      final float[] vector = embeddingProvider.embed(modelId, sentenceText);
+      document.addEmbeddings(EmbeddingResult.newBuilder()
+          .setModelId(modelId)
+          .addAllVector(toFloatList(vector))
+          .setSourceSpan(sentenceSpan)
+          .setGranularity(EmbeddingGranularity.EMBEDDING_GRANULARITY_SENTENCE)
+          .build());
+      embeddingCount++;
+    }
+    diagnostics.add(ProcessingDiagnostic.newBuilder()
+        .setStep(PipelineStep.PIPELINE_STEP_EMBED)
+        .setSeverity(DiagnosticSeverity.DIAGNOSTIC_SEVERITY_INFO)
+        .setMessage("Generated " + embeddingCount + " sentence embedding(s) with model '"
+            + modelId + "'")
+        .build());
+  }
+
+  private static List<Float> toFloatList(float[] vector) {
+    final List<Float> values = new ArrayList<>(vector.length);
+    for (float value : vector) {
+      values.add(value);
+    }
+    return values;
+  }
+
   /**
    * Converts every span in the document from Java UTF-16 indices to the requested
    * {@link OffsetEncoding} and records the chosen encoding on the document.
@@ -298,6 +478,27 @@ public class BasicDocumentAnalyzer implements DocumentAnalyzer {
       }
       document.setSentences(i, sentence.build());
     }
+    for (int e = 0; e < document.getEmbeddingsCount(); e++) {
+      final EmbeddingResult embedding = document.getEmbeddings(e);
+      document.setEmbeddings(e, embedding.toBuilder()
+          .setSourceSpan(remap(embedding.getSourceSpan(), mapper))
+          .build());
+    }
+    for (int g = 0; g < document.getChunkEmbeddingGroupsCount(); g++) {
+      final ChunkEmbeddingGroup.Builder group = document.getChunkEmbeddingGroups(g).toBuilder();
+      for (int c = 0; c < group.getChunksCount(); c++) {
+        final Chunk.Builder chunk = group.getChunks(c).toBuilder();
+        chunk.setAnnotationSpan(remap(chunk.getAnnotationSpan(), mapper));
+        for (int e = 0; e < chunk.getEmbeddingsCount(); e++) {
+          final EmbeddingResult embedding = chunk.getEmbeddings(e);
+          chunk.setEmbeddings(e, embedding.toBuilder()
+              .setSourceSpan(remap(embedding.getSourceSpan(), mapper))
+              .build());
+        }
+        group.setChunks(c, chunk.build());
+      }
+      document.setChunkEmbeddingGroups(g, group.build());
+    }
     document.setOffsetEncoding(mapper.encoding());
   }
 
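resolveEffectiveSteps() in BasicDocumentAnalyzer adds implicit prerequisites to the profile's steps before each shouldRunStep() check. A simplified sketch of that expansion follows. It assumes PipelineStepPolicy.shouldRun(profile, step) amounts to "the profile lists this step" and represents each chunk_embed_configs entry only by its algorithm string; the Step enum and EffectiveStepsSketch are hypothetical stand-ins for the generated PipelineStep enum and the analyzer.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the implicit-dependency expansion in resolveEffectiveSteps. */
public final class EffectiveStepsSketch {

  public enum Step { SENTENCE_DETECT, TOKENIZE, CHUNK, EMBED }

  /**
   * @param profileSteps    steps listed in the analysis profile
   * @param chunkAlgorithms algorithm of each chunk_embed_configs entry (empty if none)
   */
  public static Set<Step> resolve(List<Step> profileSteps, List<String> chunkAlgorithms) {
    final Set<Step> steps = new LinkedHashSet<>(profileSteps);
    if (profileSteps.contains(Step.EMBED)) {
      steps.add(Step.SENTENCE_DETECT); // sentence embeddings need sentence spans
    }
    if (!chunkAlgorithms.isEmpty()) {
      steps.add(Step.SENTENCE_DETECT); // every chunker starts from sentences
      if (chunkAlgorithms.contains("token")) {
        steps.add(Step.TOKENIZE); // token windows need token spans
      }
    } else if (profileSteps.contains(Step.CHUNK)) {
      steps.add(Step.SENTENCE_DETECT); // profile chunking groups whole sentences
    }
    return steps;
  }

  public static void main(String[] args) {
    // prints [EMBED, SENTENCE_DETECT, TOKENIZE]
    System.out.println(resolve(List.of(Step.EMBED), List.of("token")));
  }
}
```

The set only answers membership questions. Execution order stays fixed by the analyzer (sentence detection, tokenization, embedding, then chunking), so the insertion order seen in the printed set has no effect on scheduling.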
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/PipelineStepPolicy.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/PipelineStepPolicy.java
index bbf5954c..0cc8836a 100644
--- a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/PipelineStepPolicy.java
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/processor/PipelineStepPolicy.java
@@ -31,7 +31,9 @@ public final class PipelineStepPolicy {
   /** Steps implemented by the current processor, in execution order. */
   private static final List<PipelineStep> IMPLEMENTED_STEPS = List.of(
       PipelineStep.PIPELINE_STEP_SENTENCE_DETECT,
-      PipelineStep.PIPELINE_STEP_TOKENIZE);
+      PipelineStep.PIPELINE_STEP_TOKENIZE,
+      PipelineStep.PIPELINE_STEP_CHUNK,
+      PipelineStep.PIPELINE_STEP_EMBED);
 
   private static final Set<PipelineStep> IMPLEMENTED_STEP_SET = Set.copyOf(IMPLEMENTED_STEPS);
 
diff --git a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/server/OpenNlpGrpcServer.java b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/server/OpenNlpGrpcServer.java
index 111a66bb..dd83f85b 100644
--- a/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/server/OpenNlpGrpcServer.java
+++ b/opennlp-grpc/opennlp-grpc-service/src/main/java/org/apache/opennlp/grpc/server/OpenNlpGrpcServer.java
@@ -114,7 +114,7 @@ public class OpenNlpGrpcServer implements Callable<Integer> {
     this.server.start();
     logger.info("Started OpenNlpGrpcServer on port {}", server.getPort());
 
-    registerShutdownHook();
+    registerShutdownHook(modelBundleCache);
   }
 
   public void awaitTermination() throws InterruptedException {
@@ -149,13 +149,14 @@ public class OpenNlpGrpcServer implements Callable<Integer> {
     return configuration;
   }
 
-  private void registerShutdownHook() {
+  private void registerShutdownHook(ModelBundleCache modelBundleCache) {
     Runtime.getRuntime()
         .addShutdownHook(
             new Thread(
                 () -> {
                   try {
                     stop();
+                    modelBundleCache.close();
                   } catch (Exception e) {
                     logger.error(
                         "Error when trying to shutdown a lifecycle component: {}",
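ChunkEmbedProcessorSemanticTest, which follows, expects the three sentences "Aa.", "Ab." and "Bc." to form two chunks at a similarity threshold of 0.9. The commit description says semantic chunking places boundaries on consecutive-sentence cosine similarity with percentile or fixed thresholds and min/max size constraints. The sketch below covers only the fixed-threshold case, omits the size constraints, and assumes a boundary opens when similarity falls strictly below the threshold; SemanticBoundarySketch and its methods are hypothetical and do not reproduce ChunkEmbedProcessor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: fixed-threshold semantic boundaries over consecutive sentences. */
public final class SemanticBoundarySketch {

  static double cosine(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    // Treat a zero vector as dissimilar to everything instead of dividing by zero.
    return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  /**
   * Groups sentence indices into chunks, starting a new chunk wherever the cosine
   * similarity between sentence i - 1 and sentence i drops below the threshold.
   */
  public static List<List<Integer>> chunk(List<float[]> sentenceVectors, double threshold) {
    final List<List<Integer>> chunks = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (int i = 0; i < sentenceVectors.size(); i++) {
      if (i > 0 && cosine(sentenceVectors.get(i - 1), sentenceVectors.get(i)) < threshold) {
        chunks.add(current);
        current = new ArrayList<>();
      }
      current.add(i);
    }
    if (!current.isEmpty()) {
      chunks.add(current);
    }
    return chunks;
  }

  public static void main(String[] args) {
    final float[] topicA = {1f, 0f, 0f};
    final float[] topicB = {0f, 1f, 0f};
    // Same vectors as the test's stub provider: two topic-A sentences, then topic B.
    System.out.println(chunk(List.of(topicA, topicA, topicB), 0.9)); // prints [[0, 1], [2]]
  }
}
```

Identical topic-A vectors have similarity 1.0, so no boundary opens between the first two sentences; the orthogonal topic-B vector has similarity 0.0, which falls below 0.9 and starts the second chunk.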
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/ChunkEmbedProcessorSemanticTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/ChunkEmbedProcessorSemanticTest.java
new file mode 100644
index 00000000..18b783e4
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/ChunkEmbedProcessorSemanticTest.java
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.chunk;
+
+import java.util.Map;
+
+import org.apache.opennlp.grpc.embedding.StubEmbeddingProvider;
+import org.apache.opennlp.grpc.v1.AnnotatedSentence;
+import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.ChunkEmbedConfigEntry;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.CoordinateSpace;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.SemanticChunkingConfig;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+class ChunkEmbedProcessorSemanticTest {
+
+  private static final float[] TOPIC_A = {1f, 0f, 0f};
+  private static final float[] TOPIC_B = {0f, 1f, 0f};
+
+  private final StubEmbeddingProvider provider = new StubEmbeddingProvider(
+      Map.of("minilm", 3),
+      (modelId, text) -> text.startsWith("A") ? TOPIC_A : TOPIC_B,
+      java.util.Set.of());
+
+  @Test
+  void buildsSemanticGroupWithEmbeddings() {
+    final String rawText = "Aa.Ab.Bc.";
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText(rawText)
+        .addSentences(sentence(0, 3))
+        .addSentences(sentence(3, 6))
+        .addSentences(sentence(6, 9))
+        .build();
+    final ChunkEmbedConfigEntry entry = ChunkEmbedConfigEntry.newBuilder()
+        .setConfigId("semantic-topics")
+        .setChunking(ChunkingSpec.newBuilder()
+            .setAlgorithm("semantic")
+            .setSemanticConfig(SemanticChunkingConfig.newBuilder()
+                .setSimilarityThreshold(0.9f)
+                .setSemanticEmbeddingModelId("minilm")
+                .build())
+            .build())
+        .addEmbeddingModelIds("minilm")
+        .build();
+
+    final var group = ChunkEmbedProcessor.buildGroup(rawText, document, entry, provider);
+
+    assertEquals(2, group.getChunksCount());
+    assertEquals(1, group.getChunks(0).getEmbeddingsCount());
+  }
+
+  private static AnnotatedSentence sentence(int start, int end) {
+    return AnnotatedSentence.newBuilder()
+        .setSentenceSpan(AnnotationSpan.newBuilder()
+            .setStart(start)
+            .setEnd(end)
+            .setSpace(CoordinateSpace.COORDINATE_SPACE_CHAR_DOCUMENT)
+            .build())
+        .build();
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/SegmentationChunkerTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/SegmentationChunkerTest.java
new file mode 100644
index 00000000..8ef4a67e
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/SegmentationChunkerTest.java
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.chunk;
+
+import java.util.Map;
+
+import org.apache.opennlp.grpc.embedding.EmbeddingProvider;
+import org.apache.opennlp.grpc.embedding.StubEmbeddingProvider;
+import org.apache.opennlp.grpc.v1.AnnotatedSentence;
+import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.CoordinateSpace;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.Token;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+class SegmentationChunkerTest {
+
+  private static final EmbeddingProvider NO_MODELS = new StubEmbeddingProvider(Map.of());
+
+  @Test
+  void sentenceAlgorithmCreatesOneChunkPerSentence() {
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText("One. Two!")
+        .addSentences(sentence(0, 4))
+        .addSentences(sentence(5, 9))
+        .build();
+
+    final var chunks = SegmentationChunker.segment(document.getRawText(), document,
+        ChunkingSpec.newBuilder().setAlgorithm("sentence").build(), NO_MODELS);
+
+    assertEquals(2, chunks.size());
+    assertEquals(0, chunks.get(0).start());
+    assertEquals(4, chunks.get(0).end());
+    assertEquals(1, chunks.get(1).sentenceIndices().size());
+  }
+
+  @Test
+  void tokenAlgorithmCreatesOverlappingWindows() {
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText("a b c d e")
+        .addSentences(AnnotatedSentence.newBuilder()
+            .setSentenceSpan(span(0, 9))
+            .addTokens(token("a", 0, 1))
+            .addTokens(token("b", 2, 3))
+            .addTokens(token("c", 4, 5))
+            .addTokens(token("d", 6, 7))
+            .addTokens(token("e", 8, 9))
+            .build())
+        .build();
+
+    final var chunks = SegmentationChunker.segment(document.getRawText(), document,
+        ChunkingSpec.newBuilder()
+            .setAlgorithm("token")
+            .setChunkSize(3)
+            .setChunkOverlap(1)
+            .build(),
+        NO_MODELS);
+
+    assertEquals(2, chunks.size());
+    assertEquals(0, chunks.get(0).start());
+    assertEquals(5, chunks.get(0).end());
+    assertEquals(4, chunks.get(1).start());
+    assertEquals(9, chunks.get(1).end());
+  }
+
+  private static AnnotatedSentence sentence(int start, int end) {
+    return AnnotatedSentence.newBuilder()
+        .setSentenceSpan(span(start, end))
+        .build();
+  }
+
+  private static Token token(String text, int start, int end) {
+    return Token.newBuilder()
+        .setText(text)
+        .setAnnotationSpan(span(start, end))
+        .build();
+  }
+
+  private static AnnotationSpan span(int start, int end) {
+    return AnnotationSpan.newBuilder()
+        .setStart(start)
+        .setEnd(end)
+        .setSpace(CoordinateSpace.COORDINATE_SPACE_CHAR_DOCUMENT)
+        .build();
+  }
+}
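The token-window test above expects windows of `chunkSize` tokens that advance by `chunkSize - chunkOverlap`, with the last window allowed to be shorter (five tokens, size 3, overlap 1 gives tokens a..c and c..e, i.e. characters 0-5 and 4-9). A minimal sketch of that windowing over token indices, assuming half-open `[start, end)` ranges and that each chunk's character span is then taken from its first and last token; `TokenWindowSketch` and `windows` are hypothetical names, not the `SegmentationChunker` implementation:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: overlapping token windows as half-open [start, end) index ranges. */
final class TokenWindowSketch {

  private TokenWindowSketch() {}

  static List<int[]> windows(int tokenCount, int chunkSize, int chunkOverlap) {
    if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
      // Overlap >= size would never advance the window.
      throw new IllegalArgumentException("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
    }
    List<int[]> result = new ArrayList<>();
    int stride = chunkSize - chunkOverlap; // how far each window moves forward
    for (int start = 0; start < tokenCount; start += stride) {
      int end = Math.min(start + chunkSize, tokenCount);
      result.add(new int[] {start, end});
      if (end == tokenCount) {
        break; // this window already reaches the last token
      }
    }
    return result;
  }
}
```

With size 2 and overlap 0 the same five tokens yield three windows, the last holding a single token, which matches the `tokenChunkingAutoRunsTokenizationBackbone` expectation further down.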
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/SemanticChunkerTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/SemanticChunkerTest.java
new file mode 100644
index 00000000..bb0a7147
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/chunk/SemanticChunkerTest.java
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.chunk;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.opennlp.grpc.embedding.StubEmbeddingProvider;
+import org.apache.opennlp.grpc.v1.AnnotatedSentence;
+import org.apache.opennlp.grpc.v1.AnnotationSpan;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.CoordinateSpace;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.SemanticChunkingConfig;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+class SemanticChunkerTest {
+
+  private static final float[] TOPIC_A = {1f, 0f, 0f};
+  private static final float[] TOPIC_B = {0f, 1f, 0f};
+
+  private final StubEmbeddingProvider provider = new StubEmbeddingProvider(
+      Map.of("minilm", 3),
+      (modelId, text) -> text.startsWith("A") ? TOPIC_A : TOPIC_B,
+      Set.of());
+
+  @Test
+  void splitsWhenAdjacentSentenceSimilarityIsLow() {
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText("Aa.Ab.Bc.")
+        .addSentences(sentence(0, 3))
+        .addSentences(sentence(3, 6))
+        .addSentences(sentence(6, 9))
+        .build();
+
+    final var chunks = SemanticChunker.chunk(
+        document.getRawText(),
+        document,
+        SemanticChunkingConfig.newBuilder().setSimilarityThreshold(0.9f).build(),
+        provider,
+        "minilm");
+
+    assertEquals(2, chunks.size());
+    assertEquals(List.of(0, 1), chunks.get(0).sentenceIndices());
+    assertEquals(List.of(2), chunks.get(1).sentenceIndices());
+  }
+
+  @Test
+  void mergesUndersizedTrailingChunkIntoPrecedingChunk() {
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText("Aa.Ab.Bc.")
+        .addSentences(sentence(0, 3))
+        .addSentences(sentence(3, 6))
+        .addSentences(sentence(6, 9))
+        .build();
+
+    final var chunks = SemanticChunker.chunk(
+        document.getRawText(),
+        document,
+        SemanticChunkingConfig.newBuilder()
+            .setSimilarityThreshold(0.9f)
+            .setMinChunkSentences(2)
+            .build(),
+        provider,
+        "minilm");
+
+    assertEquals(1, chunks.size());
+    assertEquals(List.of(0, 1, 2), chunks.get(0).sentenceIndices());
+  }
+
+  @Test
+  void mergesUndersizedLeadingChunkWithFollowingChunk() {
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText("Ba.Ab.Ac.")
+        .addSentences(sentence(0, 3))
+        .addSentences(sentence(3, 6))
+        .addSentences(sentence(6, 9))
+        .build();
+
+    final var chunks = SemanticChunker.chunk(
+        document.getRawText(),
+        document,
+        SemanticChunkingConfig.newBuilder()
+            .setSimilarityThreshold(0.9f)
+            .setMinChunkSentences(2)
+            .build(),
+        provider,
+        "minilm");
+
+    assertEquals(1, chunks.size());
+    assertEquals(List.of(0, 1, 2), chunks.get(0).sentenceIndices());
+  }
+
+  @Test
+  void splitsChunksLargerThanMaxChunkSentences() {
+    final OpenNlpDocument document = OpenNlpDocument.newBuilder()
+        .setRawText("Aa.Ab.Ac.Ad.")
+        .addSentences(sentence(0, 3))
+        .addSentences(sentence(3, 6))
+        .addSentences(sentence(6, 9))
+        .addSentences(sentence(9, 12))
+        .build();
+
+    final var chunks = SemanticChunker.chunk(
+        document.getRawText(),
+        document,
+        SemanticChunkingConfig.newBuilder()
+            .setSimilarityThreshold(0.9f)
+            .setMaxChunkSentences(2)
+            .build(),
+        provider,
+        "minilm");
+
+    assertEquals(2, chunks.size());
+    assertEquals(List.of(0, 1), chunks.get(0).sentenceIndices());
+    assertEquals(List.of(2, 3), chunks.get(1).sentenceIndices());
+  }
+
+  @Test
+  void cosineSimilarityIsOneForIdenticalVectors() {
+    assertEquals(1f, SemanticChunker.cosineSimilarity(TOPIC_A, TOPIC_A), 0.0001f);
+  }
+
+  private static AnnotatedSentence sentence(int start, int end) {
+    return AnnotatedSentence.newBuilder()
+        .setSentenceSpan(AnnotationSpan.newBuilder()
+            .setStart(start)
+            .setEnd(end)
+            .setSpace(CoordinateSpace.COORDINATE_SPACE_CHAR_DOCUMENT)
+            .build())
+        .build();
+  }
+}
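The semantic tests pin down the fixed-threshold behaviour: a boundary goes between consecutive sentences whose embedding cosine similarity falls below the threshold, undersized groups are merged into a neighbour, and oversized groups are split. A minimal sketch that reproduces those four expectations, assuming min-merging runs before max-splitting and that 0 means "no constraint"; the percentile threshold mode is not sketched, and `SemanticBoundarySketch` is a hypothetical name rather than the server's `SemanticChunker`:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of fixed-threshold semantic boundary placement. */
final class SemanticBoundarySketch {

  private SemanticBoundarySketch() {}

  /** Cosine similarity of equal-length vectors; 0 if either vector has zero norm. */
  static float cosineSimilarity(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0f;
    }
    return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
  }

  /** Groups sentence indices; minSentences/maxSentences of 0 disable that constraint. */
  static List<List<Integer>> chunk(
      List<float[]> sentenceEmbeddings, float threshold, int minSentences, int maxSentences) {
    // 1. Open a new group wherever adjacent sentences are less similar than the threshold.
    List<List<Integer>> groups = new ArrayList<>();
    for (int i = 0; i < sentenceEmbeddings.size(); i++) {
      boolean boundary = i == 0
          || cosineSimilarity(sentenceEmbeddings.get(i - 1), sentenceEmbeddings.get(i)) < threshold;
      if (boundary) {
        groups.add(new ArrayList<>());
      }
      groups.get(groups.size() - 1).add(i);
    }
    // 2. Append a group to its predecessor when either is below minSentences,
    //    so an undersized leading group absorbs the group that follows it.
    List<List<Integer>> merged = new ArrayList<>();
    for (List<Integer> group : groups) {
      List<Integer> last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && (group.size() < minSentences || last.size() < minSentences)) {
        last.addAll(group);
      } else {
        merged.add(group);
      }
    }
    if (maxSentences <= 0) {
      return merged;
    }
    // 3. Split oversized groups into consecutive pieces of at most maxSentences.
    List<List<Integer>> result = new ArrayList<>();
    for (List<Integer> group : merged) {
      for (int from = 0; from < group.size(); from += maxSentences) {
        int to = Math.min(from + maxSentences, group.size());
        result.add(new ArrayList<>(group.subList(from, to)));
      }
    }
    return result;
  }
}
```

Applying min before max means a max-split can still leave a short tail (five sentences with max 2 gives 2, 2, 1); how the real chunker resolves that conflict is not visible in this diff.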
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/embedding/EmbeddingProviderFactoryTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/embedding/EmbeddingProviderFactoryTest.java
new file mode 100644
index 00000000..167e02dd
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/embedding/EmbeddingProviderFactoryTest.java
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.util.Map;
+
+import org.apache.opennlp.grpc.processor.AnalysisException;
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertInstanceOf;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class EmbeddingProviderFactoryTest {
+
+  @Test
+  void defaultsToCpuProvider() {
+    final EmbeddingProvider provider = EmbeddingProviderFactory.create(Map.of());
+    assertInstanceOf(OnnxRuntimeEmbeddingProvider.class, provider);
+    assertTrue(provider.supportsInferenceBackend(InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME));
+    assertFalse(provider.supportsInferenceBackend(InferenceBackend.INFERENCE_BACKEND_CUDA));
+  }
+
+  @Test
+  void selectsCudaProviderFromConfig() {
+    final EmbeddingProvider provider =
+        EmbeddingProviderFactory.create(Map.of("model.embedder.backend", "cuda"));
+    assertInstanceOf(CudaEmbeddingProvider.class, provider);
+    assertTrue(provider.supportsInferenceBackend(InferenceBackend.INFERENCE_BACKEND_CUDA));
+    assertTrue(provider.supportsInferenceBackend(InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME_GPU));
+  }
+
+  @Test
+  void rejectsUnknownBackend() {
+    final AnalysisException e = assertThrows(AnalysisException.class,
+        () -> EmbeddingProviderFactory.create(Map.of("model.embedder.backend", "openvino")));
+    assertEquals(AnalysisException.FailureType.INVALID_ARGUMENT, e.getFailureType());
+  }
+
+  @Test
+  void rejectsGpuDeviceIdWithoutCudaBackend() {
+    final AnalysisException e = assertThrows(AnalysisException.class,
+        () -> EmbeddingProviderFactory.create(Map.of("model.embedder.gpu_device_id", "1")));
+    assertEquals(AnalysisException.FailureType.INVALID_ARGUMENT, e.getFailureType());
+  }
+}
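The factory tests describe a strict resolution of `model.embedder.backend`: no key means the CPU ONNX Runtime provider, `cuda` selects the CUDA provider, any other value is rejected, and `model.embedder.gpu_device_id` is only accepted alongside the CUDA backend. A minimal sketch of just that validation step, assuming plain string config values; `EmbedderBackendSketch` and `Backend` are hypothetical, and `IllegalArgumentException` stands in for the project's `AnalysisException` with `INVALID_ARGUMENT`:

```java
import java.util.Map;

/** Hypothetical sketch of strict embedder-backend resolution from server config. */
final class EmbedderBackendSketch {

  enum Backend { ONNX, CUDA }

  private EmbedderBackendSketch() {}

  static Backend resolve(Map<String, String> config) {
    // A missing key falls back to the CPU ONNX Runtime backend.
    String value = config.getOrDefault("model.embedder.backend", "onnx");
    Backend backend;
    if ("onnx".equals(value)) {
      backend = Backend.ONNX;
    } else if ("cuda".equals(value)) {
      backend = Backend.CUDA;
    } else {
      // Strict: unknown values fail fast instead of silently running on CPU.
      throw new IllegalArgumentException("Unsupported model.embedder.backend: " + value);
    }
    // A GPU device id is meaningless unless the CUDA backend was selected.
    if (backend != Backend.CUDA && config.containsKey("model.embedder.gpu_device_id")) {
      throw new IllegalArgumentException(
          "model.embedder.gpu_device_id requires model.embedder.backend=cuda");
    }
    return backend;
  }
}
```

Failing on a stray `gpu_device_id` rather than ignoring it surfaces a misconfigured GPU deployment at startup instead of leaving it quietly on CPU.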
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/embedding/StubEmbeddingProvider.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/embedding/StubEmbeddingProvider.java
new file mode 100644
index 00000000..8e3ed858
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/embedding/StubEmbeddingProvider.java
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.embedding;
+
+import java.util.Map;
+import java.util.Set;
+import java.util.function.BiFunction;
+
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+
+/**
+ * Test double returning deterministic or caller-supplied vectors.
+ */
+public final class StubEmbeddingProvider implements EmbeddingProvider {
+
+  private final Map<String, Integer> dimensions;
+  private final BiFunction<String, String, float[]> embedFn;
+  private final Set<InferenceBackend> backends;
+
+  public StubEmbeddingProvider(Map<String, Integer> dimensions) {
+    this(dimensions, null, Set.of(InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME));
+  }
+
+  public StubEmbeddingProvider(
+      Map<String, Integer> dimensions,
+      BiFunction<String, String, float[]> embedFn,
+      Set<InferenceBackend> backends) {
+    this.dimensions = Map.copyOf(dimensions);
+    this.embedFn = embedFn;
+    this.backends = Set.copyOf(backends);
+  }
+
+  @Override
+  public boolean isAvailable() {
+    return !dimensions.isEmpty();
+  }
+
+  @Override
+  public Set<String> registeredModelIds() {
+    return dimensions.keySet();
+  }
+
+  @Override
+  public boolean supportsModel(String modelId) {
+    return dimensions.containsKey(modelId);
+  }
+
+  @Override
+  public int embeddingDimension(String modelId) {
+    return dimensions.getOrDefault(modelId, 0);
+  }
+
+  @Override
+  public float[] embed(String modelId, String text) {
+    if (embedFn != null) {
+      return embedFn.apply(modelId, text);
+    }
+    final int dimension = embeddingDimension(modelId);
+    final float[] vector = new float[dimension];
+    final int seed = (modelId + ":" + text).hashCode();
+    for (int i = 0; i < dimension; i++) {
+      vector[i] = (seed + i) * 0.001f;
+    }
+    return vector;
+  }
+
+  @Override
+  public boolean supportsInferenceBackend(InferenceBackend backend) {
+    return backend == InferenceBackend.INFERENCE_BACKEND_UNSPECIFIED
+        || backend == InferenceBackend.INFERENCE_BACKEND_OPENNLP_ME
+        || backend == InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME
+        || backends.contains(backend);
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerChunkEmbedTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerChunkEmbedTest.java
new file mode 100644
index 00000000..45ed0391
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerChunkEmbedTest.java
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.processor;
+
+import java.util.Map;
+
+import org.apache.opennlp.grpc.embedding.StubEmbeddingProvider;
+import org.apache.opennlp.grpc.model.ModelBundleCache;
+import org.apache.opennlp.grpc.profile.ProfileRegistry;
+import org.apache.opennlp.grpc.v1.AnalysisProfile;
+import org.apache.opennlp.grpc.v1.AnalyzeDocumentRequest;
+import org.apache.opennlp.grpc.v1.ChunkEmbedConfigEntry;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.EmbeddingGranularity;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.PipelineStep;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class BasicDocumentAnalyzerChunkEmbedTest {
+
+  private static final String TEXT = "First sentence. Second sentence!";
+
+  private final ModelBundleCache modelBundleCache = new ModelBundleCache(Map.of());
+  private final StubEmbeddingProvider embeddingProvider =
+      new StubEmbeddingProvider(Map.of("minilm", 3, "e5", 3));
+  private final BasicDocumentAnalyzer analyzer = new BasicDocumentAnalyzer(
+      ProfileRegistry.createDefault(), modelBundleCache, embeddingProvider);
+
+  @Test
+  void chunkEmbedConfigsProduceGroupsWithEmbeddings() {
+    final var response = analyzer.analyze(AnalyzeDocumentRequest.newBuilder()
+        .setDocument(OpenNlpDocument.newBuilder().setRawText(TEXT).build())
+        .addChunkEmbedConfigs(ChunkEmbedConfigEntry.newBuilder()
+            .setConfigId("sentence-chunks")
+            .setChunking(ChunkingSpec.newBuilder().setAlgorithm("sentence").build())
+            .addEmbeddingModelIds("minilm")
+            .addEmbeddingModelIds("e5")
+            .build())
+        .build());
+
+    assertEquals(2, response.getDocument().getSentencesCount());
+    assertEquals(1, response.getDocument().getChunkEmbeddingGroupsCount());
+    final var group = response.getDocument().getChunkEmbeddingGroups(0);
+    assertEquals("sentence-chunks", group.getGroupId());
+    assertEquals(2, group.getChunksCount());
+    assertEquals(2, group.getChunks(0).getEmbeddingsCount());
+    assertEquals("minilm", group.getChunks(0).getEmbeddings(0).getModelId());
+    assertEquals(
+        EmbeddingGranularity.EMBEDDING_GRANULARITY_CHUNK_LEVEL,
+        group.getChunks(0).getEmbeddings(0).getGranularity());
+    assertTrue(group.getStats().getChunkCount() > 0);
+  }
+
+  @Test
+  void profileChunkStepProducesSentenceGroupsWithoutEmbeddings() {
+    final var response = analyzer.analyze(AnalyzeDocumentRequest.newBuilder()
+        .setDocument(OpenNlpDocument.newBuilder().setRawText(TEXT).build())
+        .setProfile(AnalysisProfile.newBuilder()
+            .setProfileId("chunk-only")
+            .addSteps(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)
+            .addSteps(PipelineStep.PIPELINE_STEP_CHUNK)
+            .build())
+        .build());
+
+    assertEquals(1, response.getDocument().getChunkEmbeddingGroupsCount());
+    assertEquals(2, response.getDocument().getChunkEmbeddingGroups(0).getChunksCount());
+    assertEquals(0, response.getDocument().getChunkEmbeddingGroups(0).getChunks(0).getEmbeddingsCount());
+  }
+
+  @Test
+  void tokenChunkingAutoRunsTokenizationBackbone() {
+    final var response = analyzer.analyze(AnalyzeDocumentRequest.newBuilder()
+        .setDocument(OpenNlpDocument.newBuilder().setRawText("one two three four five").build())
+        .addChunkEmbedConfigs(ChunkEmbedConfigEntry.newBuilder()
+            .setConfigId("token-chunks")
+            .setChunking(ChunkingSpec.newBuilder()
+                .setAlgorithm("token")
+                .setChunkSize(2)
+                .setChunkOverlap(0)
+                .build())
+            .addEmbeddingModelIds("minilm")
+            .build())
+        .build());
+
+    assertTrue(response.getDocument().getSentences(0).getTokensCount() > 0);
+    assertEquals(3, response.getDocument().getChunkEmbeddingGroups(0).getChunksCount());
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerEmbeddingTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerEmbeddingTest.java
new file mode 100644
index 00000000..69f25d27
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerEmbeddingTest.java
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.processor;
+
+import java.util.Map;
+
+import org.apache.opennlp.grpc.embedding.StubEmbeddingProvider;
+import org.apache.opennlp.grpc.model.ModelBundleCache;
+import org.apache.opennlp.grpc.profile.ProfileRegistry;
+import org.apache.opennlp.grpc.v1.AnalysisOptions;
+import org.apache.opennlp.grpc.v1.AnalysisProfile;
+import org.apache.opennlp.grpc.v1.AnalyzeDocumentRequest;
+import org.apache.opennlp.grpc.v1.EmbeddingGranularity;
+import org.apache.opennlp.grpc.v1.InferenceBackend;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.PipelineStep;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class BasicDocumentAnalyzerEmbeddingTest {
+
+  private static final String TEXT = "One sentence. Two sentences!";
+
+  private final ModelBundleCache modelBundleCache = new ModelBundleCache(Map.of());
+  private final StubEmbeddingProvider embeddingProvider =
+      new StubEmbeddingProvider(Map.of("minilm", 4));
+  private final BasicDocumentAnalyzer analyzer = new BasicDocumentAnalyzer(
+      ProfileRegistry.createDefault(), modelBundleCache, embeddingProvider);
+
+  @Test
+  void generatesSentenceEmbeddingsWhenEmbedStepRequested() {
+    final var response = analyzer.analyze(AnalyzeDocumentRequest.newBuilder()
+        .setDocument(OpenNlpDocument.newBuilder().setRawText(TEXT).build())
+        .setProfile(AnalysisProfile.newBuilder()
+            .setProfileId("with-embed")
+            .addSteps(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)
+            .addSteps(PipelineStep.PIPELINE_STEP_TOKENIZE)
+            .addSteps(PipelineStep.PIPELINE_STEP_EMBED)
+            .build())
+        .setOptions(AnalysisOptions.newBuilder()
+            .setOnnxEmbeddingModelId("minilm")
+            .setInferenceBackend(InferenceBackend.INFERENCE_BACKEND_ONNX_RUNTIME)
+            .build())
+        .build());
+
+    assertEquals(2, response.getDocument().getSentencesCount());
+    assertEquals(2, response.getDocument().getEmbeddingsCount());
+    assertEquals("minilm", response.getDocument().getEmbeddings(0).getModelId());
+    assertEquals(4, response.getDocument().getEmbeddings(0).getVectorCount());
+    assertEquals(
+        EmbeddingGranularity.EMBEDDING_GRANULARITY_SENTENCE,
+        response.getDocument().getEmbeddings(0).getGranularity());
+    assertTrue(response.getDiagnosticsList().stream()
+        .anyMatch(d -> d.getStep() == PipelineStep.PIPELINE_STEP_EMBED));
+  }
+
+  @Test
+  void rejectsUnknownEmbeddingModel() {
+    final AnalysisException error = assertThrows(AnalysisException.class, () -> analyzer.analyze(
+        AnalyzeDocumentRequest.newBuilder()
+            .setDocument(OpenNlpDocument.newBuilder().setRawText(TEXT).build())
+            .setProfile(AnalysisProfile.newBuilder()
+                .setProfileId("with-embed")
+                .addSteps(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)
+                .addSteps(PipelineStep.PIPELINE_STEP_TOKENIZE)
+                .addSteps(PipelineStep.PIPELINE_STEP_EMBED)
+                .build())
+            .setOptions(AnalysisOptions.newBuilder().setOnnxEmbeddingModelId("missing").build())
+            .build()));
+
+    assertEquals(AnalysisException.FailureType.NOT_FOUND, error.getFailureType());
+  }
+}
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerPolicyTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerPolicyTest.java
index da40e5ec..d5096aca 100644
--- a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerPolicyTest.java
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerPolicyTest.java
@@ -80,18 +80,26 @@ class BasicDocumentAnalyzerPolicyTest {
   }
 
   @Test
-  void rejectsChunkEmbedConfigs() {
+  void rejectsSemanticChunkEmbedConfigsWithoutEmbeddingModel() {
     final BasicDocumentAnalyzer analyzer = new BasicDocumentAnalyzer(Map.of());
 
     final AnalysisException error = assertThrows(AnalysisException.class, () -> analyzer.analyze(
         AnalyzeDocumentRequest.newBuilder()
             .setDocument(OpenNlpDocument.newBuilder().setRawText("Hello world.").build())
             .addChunkEmbedConfigs(ChunkEmbedConfigEntry.newBuilder()
-                .setConfigId("token-chunks")
+                .setConfigId("semantic")
+                .setChunking(ChunkingSpec.newBuilder()
+                    .setAlgorithm("semantic")
+                    .setSemanticConfig(SemanticChunkingConfig.newBuilder()
+                        .setSimilarityThreshold(0.5f)
+                        .build())
+                    .build())
+                .addEmbeddingModelIds("minilm")
+                .addEmbeddingModelIds("e5")
                 .build())
             .build()));
 
-    assertEquals(AnalysisException.FailureType.UNIMPLEMENTED, error.getFailureType());
+    assertEquals(AnalysisException.FailureType.INVALID_ARGUMENT, error.getFailureType());
   }
 
   @Test
@@ -143,7 +151,7 @@ class BasicDocumentAnalyzerPolicyTest {
   }
 
   @Test
-  void rejectsOnnxEmbeddingModelId() {
+  void rejectsOnnxEmbeddingModelIdWithoutEmbedStep() {
     final BasicDocumentAnalyzer analyzer = new BasicDocumentAnalyzer(Map.of());
 
     final AnalysisException error = assertThrows(AnalysisException.class, () -> analyzer.analyze(
@@ -152,7 +160,25 @@ class BasicDocumentAnalyzerPolicyTest {
             .setOptions(AnalysisOptions.newBuilder().setOnnxEmbeddingModelId("minilm").build())
             .build()));
 
-    assertEquals(AnalysisException.FailureType.UNIMPLEMENTED, error.getFailureType());
+    assertEquals(AnalysisException.FailureType.INVALID_ARGUMENT, error.getFailureType());
+  }
+
+  @Test
+  void rejectsEmbedStepWhenNoModelsConfigured() {
+    final BasicDocumentAnalyzer analyzer = new BasicDocumentAnalyzer(Map.of());
+
+    final AnalysisException error = assertThrows(AnalysisException.class, () -> analyzer.analyze(
+        AnalyzeDocumentRequest.newBuilder()
+            .setDocument(OpenNlpDocument.newBuilder().setRawText("Hello world.").build())
+            .setProfile(AnalysisProfile.newBuilder()
+                .setProfileId("with-embed")
+                .addSteps(PipelineStep.PIPELINE_STEP_SENTENCE_DETECT)
+                .addSteps(PipelineStep.PIPELINE_STEP_TOKENIZE)
+                .addSteps(PipelineStep.PIPELINE_STEP_EMBED)
+                .build())
+            .build()));
+
+    assertEquals(AnalysisException.FailureType.NOT_FOUND, error.getFailureType());
   }
 
   @Test
diff --git a/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerSemanticChunkTest.java b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerSemanticChunkTest.java
new file mode 100644
index 00000000..adb8be38
--- /dev/null
+++ b/opennlp-grpc/opennlp-grpc-service/src/test/java/org/apache/opennlp/grpc/processor/BasicDocumentAnalyzerSemanticChunkTest.java
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the specific
+ * language governing permissions and limitations under the License.
+ */
+package org.apache.opennlp.grpc.processor;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.opennlp.grpc.embedding.StubEmbeddingProvider;
+import org.apache.opennlp.grpc.model.ModelBundleCache;
+import org.apache.opennlp.grpc.profile.ProfileRegistry;
+import org.apache.opennlp.grpc.v1.AnalyzeDocumentRequest;
+import org.apache.opennlp.grpc.v1.ChunkEmbedConfigEntry;
+import org.apache.opennlp.grpc.v1.ChunkEmbeddingGroup;
+import org.apache.opennlp.grpc.v1.ChunkingSpec;
+import org.apache.opennlp.grpc.v1.OpenNlpDocument;
+import org.apache.opennlp.grpc.v1.SemanticChunkingConfig;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+class BasicDocumentAnalyzerSemanticChunkTest {
+
+  private static final List<Float> TOPIC_BUSINESS = List.of(1f, 0f, 0f);
+  private static final List<Float> TOPIC_WEATHER = List.of(0f, 1f, 0f);
+
+  /** Embeds any text mentioning rain as the weather topic and everything else as business. */
+  private final StubEmbeddingProvider embeddingProvider = new StubEmbeddingProvider(
+      Map.of("minilm", 3),
+      (modelId, text) -> text.contains("rain")
+          ? new float[] {0f, 1f, 0f} : new float[] {1f, 0f, 0f},
+      Set.of());
+
+  private final BasicDocumentAnalyzer analyzer = new BasicDocumentAnalyzer(
+      ProfileRegistry.createDefault(),
+      new ModelBundleCache(Map.of()),
+      embeddingProvider);
+
+  @Test
+  void semanticChunkEmbedConfigSplitsAtTopicBoundary() {
+    final var response = analyzer.analyze(AnalyzeDocumentRequest.newBuilder()
+        .setDocument(OpenNlpDocument.newBuilder()
+            .setRawText("The merger closed on Monday. The shareholders approved the deal. "
+                + "Heavy rain flooded the valley.")
+            .build())
+        .addChunkEmbedConfigs(ChunkEmbedConfigEntry.newBuilder()
+            .setConfigId("semantic-topics")
+            .setChunking(ChunkingSpec.newBuilder()
+                .setAlgorithm("semantic")
+                .setSemanticConfig(SemanticChunkingConfig.newBuilder()
+                    .setSimilarityThreshold(0.5f)
+                    .setSemanticEmbeddingModelId("minilm")
+                    .build())
+                .build())
+            .addEmbeddingModelIds("minilm")
+            .build())
+        .build());
+
+    assertEquals(3, response.getDocument().getSentencesCount());
+    assertEquals(1, response.getDocument().getChunkEmbeddingGroupsCount());
+
+    final ChunkEmbeddingGroup group = response.getDocument().getChunkEmbeddingGroups(0);
+    assertEquals(2, group.getChunksCount());
+    assertEquals(List.of(0, 1), group.getChunks(0).getContainedSentenceIndicesList());
+    assertEquals(List.of(2), group.getChunks(1).getContainedSentenceIndicesList());
+
+    assertEquals(1, group.getChunks(0).getEmbeddingsCount());
+    assertEquals(1, group.getChunks(1).getEmbeddingsCount());
+    assertEquals("minilm", group.getChunks(0).getEmbeddings(0).getModelId());
+    assertEquals(TOPIC_BUSINESS, group.getChunks(0).getEmbeddings(0).getVectorList());
+    assertEquals(TOPIC_WEATHER, group.getChunks(1).getEmbeddings(0).getVectorList());
+  }
+}
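
The `semanticChunkEmbedConfigSplitsAtTopicBoundary` test relies on the boundary rule described in the commit message: a new chunk starts where the cosine similarity between consecutive sentence embeddings falls below the fixed threshold. The following is a minimal, hypothetical sketch of that rule in plain Java. It assumes a strict `<` comparison and omits the min/max size constraints and the percentile mode. `SemanticChunkSketch`, `cosine` and `chunk` are illustrative names, not the server's chunker API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of fixed-threshold semantic chunking (not the server's chunker).
 * Assumption: a chunk boundary is placed wherever similarity(i - 1, i) < threshold.
 * Min/max chunk size constraints and percentile thresholds are intentionally omitted.
 */
public final class SemanticChunkSketch {

  private SemanticChunkSketch() {}

  /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
  static double cosine(float[] a, float[] b) {
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  /** Groups sentence indices into chunks, starting a new chunk when similarity drops below threshold. */
  static List<List<Integer>> chunk(List<float[]> sentenceEmbeddings, double threshold) {
    final List<List<Integer>> chunks = new ArrayList<>();
    if (sentenceEmbeddings.isEmpty()) {
      return chunks;
    }
    List<Integer> current = new ArrayList<>();
    current.add(0);
    for (int i = 1; i < sentenceEmbeddings.size(); i++) {
      // Compare each sentence with its predecessor only (consecutive-sentence similarity).
      if (cosine(sentenceEmbeddings.get(i - 1), sentenceEmbeddings.get(i)) < threshold) {
        chunks.add(current);
        current = new ArrayList<>();
      }
      current.add(i);
    }
    chunks.add(current);
    return chunks;
  }

  public static void main(String[] args) {
    final float[] business = {1f, 0f, 0f};
    final float[] weather = {0f, 1f, 0f};
    // Same stub vectors as the test: two business sentences, then one weather sentence.
    System.out.println(chunk(List.of(business, business, weather), 0.5));
  }
}
```

With the stub vectors from the test, the two business sentences have similarity 1 and the business/weather pair has similarity 0, so `main` prints `[[0, 1], [2]]`, matching the contained sentence indices the test asserts for the two chunks.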
