This is an automated email from the ASF dual-hosted git repository. acosentino pushed a commit to branch rag-docling-serve in repository https://gitbox.apache.org/repos/asf/camel-jbang-examples.git
commit 15b4930be49ecaaa8f2cc247757c7682ba3a7a25 Author: Andrea Cosentino <[email protected]> AuthorDate: Tue Oct 14 14:35:52 2025 +0200 Added an example of usage of docling-serve, ollama and langchain4j Signed-off-by: Andrea Cosentino <[email protected]> --- docling-langchain4j-rag/.gitignore | 11 + docling-langchain4j-rag/README.adoc | 651 +++++++++++++++++++++ docling-langchain4j-rag/application.properties | 39 ++ docling-langchain4j-rag/compose.yaml | 45 ++ .../docling-langchain4j-rag.yaml | 378 ++++++++++++ docling-langchain4j-rag/run.sh | 27 + docling-langchain4j-rag/sample.md | 65 ++ 7 files changed, 1216 insertions(+) diff --git a/docling-langchain4j-rag/.gitignore b/docling-langchain4j-rag/.gitignore new file mode 100644 index 0000000..408ca46 --- /dev/null +++ b/docling-langchain4j-rag/.gitignore @@ -0,0 +1,11 @@ +# Camel JBang working directory +.camel-jbang/ + +# Output directories +output/ + +# All documents (users copy sample.md manually for testing) +documents/* + +# Logs +*.log diff --git a/docling-langchain4j-rag/README.adoc b/docling-langchain4j-rag/README.adoc new file mode 100644 index 0000000..326b535 --- /dev/null +++ b/docling-langchain4j-rag/README.adoc @@ -0,0 +1,651 @@ += Document Analysis with Docling and LangChain4j RAG + +This example demonstrates a complete RAG (Retrieval Augmented Generation) workflow using Apache Camel, combining: + +* **Docling** - AI-powered document conversion (PDF, Word, PowerPoint → Markdown/JSON) +* **LangChain4j** - Integration with Large Language Models +* **Ollama** - Local LLM inference + +== Overview + +This application provides intelligent document processing capabilities: + +* **Automatic Document Conversion** - Convert various document formats to Markdown using Docling +* **AI-Powered Analysis** - Analyze documents using LLMs via LangChain4j +* **Interactive Q&A** - Ask questions about your documents through REST API +* **Batch Processing** - Summarize multiple documents automatically +* **Structured 
Data Extraction** - Extract tables and structured information from documents + +== Architecture + +=== Components + +[source,text] +---- +Documents → Docling (Convert) → Markdown → LangChain4j → Ollama (LLM) → Analysis +---- + +**Docling-Serve**: Python-based document conversion service running in Docker + +**Ollama**: Local LLM server running models like Llama 3.2 + +**Camel Routes**: Orchestrate the workflow between components + +=== Features + +* **Document Format Support**: PDF, DOCX, PPTX, HTML, Markdown +* **Multiple Operations**: Analysis, Q&A, Summarization, Data Extraction +* **Docker-based**: All services run in containers +* **REST API**: HTTP endpoints for interaction +* **Automatic Processing**: File watcher for automatic document processing + +== Prerequisites + +* JBang installed (https://www.jbang.dev) +* Java 17 or later (required by Apache Camel 4) +* Docker and Docker Compose + +== Project Structure + +[source,text] +---- +docling-langchain4j-rag/ +├── docling-langchain4j-rag.yaml # Main YAML configuration +├── application.properties # Configuration settings +├── compose.yaml # Docker Compose for services +├── run.sh # Convenience run script +├── sample.md # Sample document (copy to documents/ for testing) +├── README.adoc # This file +├── documents/ # Input directory (files auto-deleted after processing) +└── output/ # Analysis reports output +---- + +== Setup + +=== Step 1: Start Required Services + +You have three options for running the required services: + +==== Option A: Using Docker Compose (Recommended) + +Start both Docling and Ollama services: + +[source,sh] +---- +$ docker compose up -d +---- + +Pull the Ollama model (first time only): + +[source,sh] +---- +$ docker exec -it ollama ollama pull orca-mini +---- + +Verify services are running: + +[source,sh] +---- +$ curl http://localhost:5001/ # Docling +$ curl http://localhost:11434/ # Ollama +---- + +==== Option B: Using Camel Infra Commands (If Available) + +[source,sh] +---- +# Start Docling (if camel infra
supports it) +$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel infra run docling + +# Start Ollama (if camel infra supports it) +$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel infra run ollama +---- + +==== Option C: Manual Docker Commands + +[source,sh] +---- +# Start Docling-Serve +$ docker run -d -p 5001:5001 --name docling-serve ghcr.io/docling-project/docling-serve:latest + +# Start Ollama +$ docker run -d -p 11434:11434 --name ollama ollama/ollama:latest + +# Pull Ollama model +$ docker exec -it ollama ollama pull orca-mini +---- + +=== Step 2: Create Required Directories + +The `documents/` and `output/` directories will be created automatically when needed, but you can create them manually: + +[source,sh] +---- +$ mkdir -p documents output +---- + +**Note:** Files placed in `documents/` will be automatically processed and then **deleted** after analysis is complete. + +=== Step 3: Run the Camel Application + +[source,sh] +---- +$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel run \ + --fresh \ + --dep=camel:docling \ + --dep=camel:langchain4j-chat \ + --dep=camel:platform-http \ + --dep=dev.langchain4j:langchain4j:1.6.0 \ + --dep=dev.langchain4j:langchain4j-ollama:1.6.0 \ + --properties=application.properties \ + docling-langchain4j-rag.yaml +---- + +The application will start and listen on port 8080. + +== Usage + +=== 1. Automatic Document Analysis + +Copy a document to the `documents/` directory for processing: + +[source,sh] +---- +# Using the provided sample +$ cp sample.md documents/ + +# Or use your own document +$ cp /path/to/your/document.pdf documents/ +---- + +The system will: + +1. Detect the new file +2. Convert it to Markdown using Docling +3. Analyze it with the LLM +4. Generate a comprehensive analysis report in `output/` +5. 
**Automatically delete the source file** from `documents/` after processing + +**Example Output** (`output/sample.md_analysis.md`): + +[source,markdown] +---- +# Document Analysis Report + +**File:** sample.md +**Date:** 2025-10-14 12:30:45 + +--- + +## AI Analysis + +**Summary:** This document discusses the implementation of RAG systems... + +**Key Topics:** +- Document processing pipelines +- LLM integration patterns +- Vector embeddings and similarity search + +**Important Findings:** +- RAG improves LLM accuracy by 40% +- Hybrid search outperforms pure vector search +... + +--- + +## Full Document Content (Markdown) + +[Full converted markdown content here] +---- + +=== 2. Interactive Q&A + +Ask questions about your documents via HTTP API: + +[source,sh] +---- +$ curl -X POST http://localhost:8080/api/ask \ + -H "Content-Type: text/plain" \ + -d "What are the main topics discussed in the document?" +---- + +**Response:** + +[source,text] +---- +The document discusses three main topics: +1. RAG (Retrieval Augmented Generation) architecture +2. Document processing with Docling +3. Integration with LangChain4j for LLM orchestration +---- + +=== 3. Structured Data Extraction + +Extract tables and structured data: + +[source,sh] +---- +$ curl -X POST http://localhost:8080/api/extract \ + -H "Content-Type: application/octet-stream" \ + --data-binary "@documents/report.pdf" +---- + +**Response:** + +[source,text] +---- +**Document Type:** Financial Report + +**Key Data Fields:** +- Revenue: $1.2M (Table 1, Row 3) +- Expenses: $800K (Table 1, Row 5) +- Net Profit: $400K (calculated) + +**Tables Identified:** +1. Quarterly Financial Summary (5 rows, 4 columns) +2. Department Breakdown (8 rows, 3 columns) +... +---- + +=== 4. Health Check + +Check system status: + +[source,sh] +---- +$ curl http://localhost:8080/api/health +---- + +**Response:** + +[source,json] +---- +{ + "status": "healthy", + "components": { + "docling": { + "url": "http://localhost:5001", + "status": "configured" + }, + "ollama": { + "url": "http://localhost:11434", + "model": "orca-mini", + "status": "configured" + } + }, + "directories": { + "documents": "documents", + "output": "output" + } +} +---- + +== Configuration + +=== application.properties + +[source,properties] +---- +# Directories +documents.directory=documents +output.directory=output + +# Docling-Serve URL +docling.serve.url=http://localhost:5001 + +# Ollama Configuration +ollama.base.url=http://localhost:11434 +ollama.model.name=orca-mini + +# Server Port +camel.server.port=8080 +---- + +=== Using Different Ollama Models + +Available models: + +* **orca-mini** (this example's default) - Small, fast model, well suited to local testing +* **llama3.2** - Latest Llama model, good balance of speed and quality +* **llama3.2:1b** - Smaller, faster model +* **mistral** - Alternative high-quality model +* **phi3** - Microsoft's efficient model +* **gemma2** - Google's Gemma model + +To use a different model: + +1. Pull the model: + +[source,sh] +---- +$ docker exec -it ollama ollama pull mistral +---- + +2. Update `application.properties`: + +[source,properties] +---- +ollama.model.name=mistral +---- + +3. Restart the Camel application + +=== Using Remote Ollama Instance + +To use Ollama running on a different machine: + +[source,properties] +---- +ollama.base.url=http://remote-server:11434 +---- + +== Routes Explanation + +=== Route 1: document-analysis-workflow + +**Trigger:** New file in `documents/` directory + +**Flow:** + +1. Detect new document +2. Convert to Markdown via Docling +3. Send to LLM for analysis +4. Generate comprehensive report +5. Save to `output/` directory + +**Supported Formats:** PDF, DOCX, PPTX, HTML, MD + +=== Route 2: document-qa-api + +**Endpoint:** `POST /api/ask` + +**Description:** Answer questions about the most recent document + +**Input:** Plain text question + +**Output:** AI-generated answer based on document content + +=== Route 3: batch-summarization + +**Trigger:** Timer (configurable) + +**Description:** Process all documents in batch and generate summaries + +**Configuration:** Set `batch.delay` in application.properties (this example ships with `batch.delay=10000`; set it to -1 to disable) + +=== Route 4: health-check + +**Endpoint:** `GET /api/health` + +**Description:** System health and configuration status + +=== Route 5: extract-structured-data + +**Endpoint:** `POST /api/extract` + +**Description:** Extract tables and structured data from uploaded documents + +**Input:** Binary document data + +**Output:** AI analysis of extracted structured data + +== Advanced Usage + +=== Batch Processing + +Enable automatic batch summarization: + +[source,properties] +---- +# Run every 1 hour (3600000 ms) +batch.delay=3600000 +---- + +All documents in the `documents/` directory will be summarized periodically.
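The document selection used by the batch route (and by the Q&A route's "latest document" lookup) boils down to filtering the `documents/` directory by extension. Purely as an illustration, here is a Python sketch of that behaviour with hypothetical helper names; in the example itself this logic lives in the Groovy scripts inside `docling-langchain4j-rag.yaml`:

```python
import re
from pathlib import Path

# Extension filter mirroring the routes' include pattern: .*\.(pdf|docx|pptx|html|md)
SUPPORTED = re.compile(r".*\.(pdf|docx|pptx|html|md)$")

def find_batch_candidates(doc_dir: str) -> list:
    """Return supported documents in doc_dir, oldest first (hypothetical helper)."""
    root = Path(doc_dir)
    if not root.exists():
        return []
    files = [p for p in root.iterdir() if p.is_file() and SUPPORTED.match(p.name)]
    # the Q&A route picks the newest by modification time; the batch route takes them all
    return sorted(files, key=lambda p: p.stat().st_mtime)

def summary_prompt(markdown: str) -> str:
    """Build the same style of summarization prompt the batch route sends to the LLM."""
    return ("Please provide a concise 3-sentence summary of the following document:\n\n"
            + markdown)
```

The last entry returned by `find_batch_candidates(...)` corresponds to the document the `/api/ask` route would answer questions about.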
+ +=== Custom Document Processing + +You can extend the routes to add custom processing logic: + +[source,yaml] +---- +- route: + id: custom-processing + from: + uri: file:documents + parameters: + include: ".*\\.pdf" + steps: + # Your custom processing here + - to: docling:CONVERT_TO_HTML + - to: langchain4j-chat:custom +---- + +=== Integration with Vector Stores + +For production RAG, consider adding vector embeddings: + +[source,yaml] +---- +# Add after document conversion +- to: langchain4j-embeddings:embed +- to: your-vector-store +---- + +== Troubleshooting + +=== Docling Not Responding + +**Check Docling service:** + +[source,sh] +---- +$ docker logs docling-serve +$ curl http://localhost:5001/ +---- + +**Restart service:** + +[source,sh] +---- +$ docker restart docling-serve +---- + +=== Ollama Model Not Found + +**Pull the model:** + +[source,sh] +---- +$ docker exec -it ollama ollama pull orca-mini +---- + +**Check available models:** + +[source,sh] +---- +$ docker exec -it ollama ollama list +---- + +=== Slow Document Processing + +**Causes:** + +* Large documents (>100 pages) +* Complex layouts with many images +* Limited CPU/memory + +**Solutions:** + +* Increase the model timeout in the `chatModel` bean in `docling-langchain4j-rag.yaml` (the example hardcodes `timeout(ofSeconds(120))`) +* Use a smaller/faster model (llama3.2:1b) +* Process smaller documents first + +=== Out of Memory + +**Increase Docker memory:** + +[source,sh] +---- +# In Docker Desktop: Settings → Resources → Memory +# Recommended: 8GB or more for LLMs +---- + +== Performance Considerations + +=== Document Conversion + +* **PDF**: 1-5 seconds per page (depends on complexity) +* **DOCX**: 0.5-2 seconds per page +* **OCR-required**: 5-10 seconds per page (scanned PDFs) + +=== LLM Inference + +* **llama3.2 (3B)**: 5-15 seconds per response +* **llama3.2:1b**: 2-5 seconds per response +* **Speed depends on**: Prompt length, context size, hardware + +=== Recommended Hardware + +* **Minimum**: 8GB RAM, 4 CPU cores +* **Recommended**: 16GB RAM, 8 CPU cores, GPU (optional) + +== Security Considerations + +=== Current Implementation + +* **Development Setup** - Not production-ready +* **No Authentication** - Open HTTP endpoints +* **Local Processing** - Data stays on your machine + +=== Production Recommendations + +**1. Authentication & Authorization** + +[source,yaml] +---- +# Add to routes (use the simple language so the env placeholder is resolved) +- setHeader: + name: Authorization + simple: "Bearer ${env:API_TOKEN}" +---- + +**2. Input Validation** + +* Validate file sizes +* Check file types +* Scan for malware + +**3. Rate Limiting** + +* Implement request throttling +* Add queue management + +**4. Data Privacy** + +* Encrypt sensitive documents +* Secure API endpoints with TLS +* Implement access logging + +== Production Deployment + +=== Using Kubernetes + +[source,yaml] +---- +# See k8s-deployment.yaml (example) +apiVersion: apps/v1 +kind: Deployment +metadata: + name: docling-langchain4j-rag +spec: + replicas: 3 + ... +---- + +=== Scaling Considerations + +* **Horizontal**: Multiple Camel instances with load balancer +* **Vertical**: Increase memory/CPU for Ollama container +* **Caching**: Cache frequent document conversions + +== Cleanup + +Stop all services: + +[source,sh] +---- +# Docker Compose +$ docker compose down + +# Or manual cleanup +$ docker stop docling-serve ollama +$ docker rm docling-serve ollama +---- + +Remove volumes (optional): + +[source,sh] +---- +$ docker volume rm docling-langchain4j-rag_ollama_data +---- + +== Alternative Configurations + +=== Using OpenAI Instead of Ollama + +[source,properties] +---- +# application.properties +openai.api.key=sk-your-api-key-here +---- + +[source,yaml] +---- +# Update bean configuration (also add --dep=dev.langchain4j:langchain4j-open-ai:1.6.0) +- name: chatModel + type: "#class:dev.langchain4j.model.openai.OpenAiChatModel" + scriptLanguage: groovy + script: | + import dev.langchain4j.model.openai.OpenAiChatModel + + return OpenAiChatModel.builder() + .apiKey(context.resolvePropertyPlaceholders("{{openai.api.key}}")) + .modelName("gpt-4") + .temperature(0.3) + .build() +---- + +=== Using Cloud Docling Service + +If you have a cloud-hosted Docling service: + +[source,properties] +---- +docling.serve.url=https://your-docling-service.com +docling.auth.token=your-auth-token +---- + +== References + +* **Docling**: https://github.com/docling-project/docling +* **LangChain4j**: https://github.com/langchain4j/langchain4j +* **Ollama**: https://ollama.ai +* **Apache Camel**: https://camel.apache.org +* **Camel Components Reference**: https://camel.apache.org/components/ + +== Help and Contributions + +If you hit any problem using Camel or have some feedback, then please +https://camel.apache.org/community/support/[let us know]. + +We also love contributors, so +https://camel.apache.org/community/contributing/[get involved] :-) + +The Camel riders! diff --git a/docling-langchain4j-rag/application.properties b/docling-langchain4j-rag/application.properties new file mode 100644 index 0000000..1c416dd --- /dev/null +++ b/docling-langchain4j-rag/application.properties @@ -0,0 +1,39 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License.
+ +# Application Configuration +camel.main.name = DoclingLangChain4jRAG + +# Directory Configuration +documents.directory=documents +output.directory=output + +# Docling-Serve Configuration +# Can be run via: docker run -p 5001:5001 ghcr.io/docling-project/docling-serve:latest +# Or via: camel infra run docling (if available) +docling.serve.url=http://localhost:5001 + +# Ollama Configuration +ollama.base.url=http://localhost:11434 +ollama.model.name=orca-mini + +# Batch Processing Configuration +# Set to -1 to disable batch processing +batch.delay=10000 + +# HTTP Server Configuration +camel.server.port=8080 diff --git a/docling-langchain4j-rag/compose.yaml b/docling-langchain4j-rag/compose.yaml new file mode 100644 index 0000000..56f4d91 --- /dev/null +++ b/docling-langchain4j-rag/compose.yaml @@ -0,0 +1,45 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +services: + docling-serve: + image: ghcr.io/docling-project/docling-serve:latest + container_name: docling-serve + ports: + - "5001:5001" + restart: unless-stopped + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:5001/"] + interval: 30s + timeout: 10s + retries: 3 + + ollama: + image: ollama/ollama:latest + # named "ollama" and published on 11434 to match the README commands and application.properties + container_name: ollama + ports: + - "11434:11434" + volumes: + - ollama_data:/root/.ollama + restart: unless-stopped + healthcheck: + # the ollama image does not ship curl, so probe with the ollama CLI instead + test: ["CMD", "ollama", "list"] + interval: 30s + timeout: 10s + retries: 3 + +volumes: + ollama_data: + driver: local diff --git a/docling-langchain4j-rag/docling-langchain4j-rag.yaml b/docling-langchain4j-rag/docling-langchain4j-rag.yaml new file mode 100644 index 0000000..8004956 --- /dev/null +++ b/docling-langchain4j-rag/docling-langchain4j-rag.yaml @@ -0,0 +1,378 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Document Processing with Docling and AI Analysis with LangChain4j +# This example demonstrates RAG (Retrieval Augmented Generation) using: +# - Docling for document conversion (PDF, Word, etc.
to Markdown) +# - LangChain4j with Ollama for AI-powered document analysis + +# Bean Definitions +- beans: + # Configure Ollama Chat Model + - name: chatModel + type: "#class:dev.langchain4j.model.ollama.OllamaChatModel" + scriptLanguage: groovy + script: | + import dev.langchain4j.model.ollama.OllamaChatModel + import static java.time.Duration.ofSeconds + + return OllamaChatModel.builder() + .baseUrl("{{ollama.base.url}}") + .modelName("{{ollama.model.name}}") + .temperature(0.3) + .timeout(ofSeconds(120)) + .build() + +# Route Definitions + +# Route 1: Main RAG workflow - Convert document and analyze with AI +- route: + id: document-analysis-workflow + from: + uri: file:{{documents.directory}} + parameters: + include: ".*\\.(pdf|docx|pptx|html|md)" + noop: true + idempotent: true + steps: + - log: "Processing document: ${header.CamelFileName}" + - setProperty: + name: originalFileName + simple: "${header.CamelFileName}" + + # Convert GenericFile to file path + - setBody: + simple: "${body.file.absolutePath}" + + # Step 1: Convert document to Markdown using Docling + - log: "Converting document to Markdown with Docling..." + - to: + uri: docling:CONVERT_TO_MARKDOWN + parameters: + useDoclingServe: true + doclingServeUrl: "{{docling.serve.url}}" + contentInBody: true + - log: "Document converted to Markdown successfully" + + # Save the file path for cleanup + - setProperty: + name: sourceFilePath + simple: "${exchangeProperty.originalFileName}" + + # Step 2: Store converted content + - setProperty: + name: convertedMarkdown + simple: "${body}" + + # Step 3: Log the converted content (first 500 chars) + - script: + groovy: | + def markdown = exchange.getProperty("convertedMarkdown", String.class) + def preview = markdown.length() > 500 ? markdown.substring(0, 500) + "..." : markdown + log.info("Converted Markdown preview:\n{}", preview) + + # Step 4: Prepare AI prompt for document analysis + - setBody: + simple: | + You are a helpful document analysis assistant. 
Please analyze the following document and provide: + 1. A brief summary (2-3 sentences) + 2. Key topics and main points + 3. Any important findings or conclusions + + Document content: + ${exchangeProperty.convertedMarkdown} + + # Step 5: Send to LangChain4j Chat for AI analysis + - log: "Analyzing document with AI model..." + - to: + uri: langchain4j-chat:analysis + parameters: + chatModel: "#chatModel" + + # Step 6: Store AI analysis result + - setProperty: + name: aiAnalysis + simple: "${body}" + - log: "AI analysis completed" + + # Step 7: Create combined result (markdown + analysis) + - script: + groovy: | + def fileName = exchange.getProperty("originalFileName") + def markdown = exchange.getProperty("convertedMarkdown", String.class) + def analysis = exchange.getProperty("aiAnalysis", String.class) + def dateStr = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new java.util.Date()) + + def result = "# Document Analysis Report\n\n" + + "**File:** ${fileName}\n" + + "**Date:** ${dateStr}\n\n" + + "---\n\n" + + "## AI Analysis\n\n" + + "${analysis}\n\n" + + "---\n\n" + + "## Full Document Content (Markdown)\n\n" + + "${markdown}" + + exchange.message.setBody(result) + + # Step 8: Save combined result + - setHeader: + name: CamelFileName + simple: "${exchangeProperty.originalFileName}_analysis.md" + - to: + uri: file:{{output.directory}} + - log: "Analysis report saved: ${header.CamelFileName}" + + # Step 9: Clean up - delete the processed file from documents directory + - script: + groovy: | + import java.nio.file.Files + import java.nio.file.Paths + + def docDir = camelContext.resolvePropertyPlaceholders("{{documents.directory}}") + def fileName = exchange.getProperty("sourceFilePath") + def filePath = Paths.get(docDir, fileName) + + if (Files.exists(filePath)) { + Files.delete(filePath) + log.info("Cleaned up source file: {}", filePath) + } + - log: "Processing complete for: ${exchangeProperty.originalFileName}" + +# Route 2: Interactive Q&A 
about documents +- route: + id: document-qa-api + from: + uri: platform-http:/api/ask + steps: + - log: "Received question: ${body}" + - setProperty: + name: userQuestion + simple: "${body}" + + # Read the most recent document from documents directory + - script: + groovy: | + import java.nio.file.Files + import java.nio.file.Paths + import java.util.stream.Collectors + + def docDir = camelContext.resolvePropertyPlaceholders("{{documents.directory}}") + def docPath = Paths.get(docDir) + + if (Files.exists(docPath)) { + def latestFile = Files.list(docPath) + .filter { f -> f.toString().matches(".*\\.(pdf|docx|pptx|html|md)") } + .max { f1, f2 -> Files.getLastModifiedTime(f1).compareTo(Files.getLastModifiedTime(f2)) } + .orElse(null) + + if (latestFile != null) { + exchange.message.setBody(latestFile.toFile()) + exchange.setProperty("documentFound", true) + } else { + exchange.setProperty("documentFound", false) + exchange.message.setBody("No documents found in directory") + } + } else { + exchange.setProperty("documentFound", false) + exchange.message.setBody("Documents directory does not exist") + } + + # Convert document to markdown if found + - choice: + when: + - simple: "${exchangeProperty.documentFound} == true" + steps: + - log: "Converting document for Q&A..." + - to: + uri: docling:CONVERT_TO_MARKDOWN + parameters: + useDoclingServe: true + doclingServeUrl: "{{docling.serve.url}}" + contentInBody: true + + - setProperty: + name: documentContent + simple: "${body}" + + # Prepare RAG prompt + - setBody: + simple: | + You are a helpful assistant answering questions about documents. + + Document content: + ${exchangeProperty.documentContent} + + Question: ${exchangeProperty.userQuestion} + + Please provide a clear and concise answer based on the document content above. 
+ + - to: + uri: langchain4j-chat:qa + parameters: + chatModel: "#chatModel" + + - setHeader: + name: Content-Type + constant: "text/plain" + otherwise: + steps: + - setBody: + simple: "Error: ${body}" + - setHeader: + name: Content-Type + constant: "text/plain" + +# Route 3: Batch document summarization +- route: + id: batch-summarization + from: + uri: timer:batchSummarize + parameters: + delay: "{{batch.delay}}" + repeatCount: 0 + steps: + - log: "Starting batch document summarization..." + - script: + groovy: | + import java.nio.file.Files + import java.nio.file.Paths + + def docDir = camelContext.resolvePropertyPlaceholders("{{documents.directory}}") + def docPath = Paths.get(docDir) + + if (Files.exists(docPath)) { + def files = Files.list(docPath) + .filter { f -> f.toString().matches(".*\\.(pdf|docx|pptx|html|md)") } + .collect { it.toFile() } + + exchange.message.setBody(files) + } else { + exchange.message.setBody([]) + } + + - split: + simple: "${body}" + steps: + - log: "Summarizing: ${body}" + - setProperty: + name: currentFile + simple: "${body}" + + # Convert to markdown + - to: + uri: docling:CONVERT_TO_MARKDOWN + parameters: + useDoclingServe: true + doclingServeUrl: "{{docling.serve.url}}" + contentInBody: true + + # Generate summary + - setBody: + simple: | + Please provide a concise 3-sentence summary of the following document: + + ${body} + + - to: + uri: langchain4j-chat:summary + parameters: + chatModel: "#chatModel" + + - log: "Summary: ${body}" + +# Route 4: Health check endpoint +- route: + id: health-check + from: + uri: platform-http:/api/health + steps: + - log: "Health check requested" + - script: + groovy: | + import groovy.json.JsonOutput + + def doclingUrl = camelContext.resolvePropertyPlaceholders("{{docling.serve.url}}") + def ollamaUrl = camelContext.resolvePropertyPlaceholders("{{ollama.base.url}}") + + def health = [ + status: "healthy", + components: [ + docling: [ + url: doclingUrl, + status: "configured" + ], + ollama: [ + 
url: ollamaUrl, + model: camelContext.resolvePropertyPlaceholders("{{ollama.model.name}}"), + status: "configured" + ] + ], + directories: [ + documents: camelContext.resolvePropertyPlaceholders("{{documents.directory}}"), + output: camelContext.resolvePropertyPlaceholders("{{output.directory}}") + ] + ] + + exchange.message.setBody(JsonOutput.toJson(health)) + - setHeader: + name: Content-Type + constant: "application/json" + +# Route 5: Extract structured data from documents +- route: + id: extract-structured-data + from: + uri: platform-http:/api/extract + parameters: + httpMethodRestrict: "POST" + steps: + - log: "Extracting structured data from uploaded document" + - setProperty: + name: uploadedContent + simple: "${body}" + + # Extract as JSON with Docling + - to: + uri: docling:EXTRACT_STRUCTURED_DATA + parameters: + useDoclingServe: true + doclingServeUrl: "{{docling.serve.url}}" + outputFormat: "json" + contentInBody: true + + - setProperty: + name: structuredData + simple: "${body}" + + # Ask AI to analyze the structured data + - setBody: + simple: | + Please analyze this structured document data and identify: + 1. Document type and structure + 2. Key data fields and their values + 3. Any tables or structured information + + Structured data: + ${exchangeProperty.structuredData} + + - to: + uri: langchain4j-chat:extract + parameters: + chatModel: "#chatModel" + + - setHeader: + name: Content-Type + constant: "text/plain" diff --git a/docling-langchain4j-rag/run.sh b/docling-langchain4j-rag/run.sh new file mode 100755 index 0000000..1c43df8 --- /dev/null +++ b/docling-langchain4j-rag/run.sh @@ -0,0 +1,27 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Run the Docling + LangChain4j RAG example + +jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel run \ + --fresh \ + --dep=camel:docling \ + --dep=camel:langchain4j-chat \ + --dep=camel:platform-http \ + --dep=dev.langchain4j:langchain4j:1.6.0 \ + --dep=dev.langchain4j:langchain4j-ollama:1.6.0 \ + --properties=application.properties \ + docling-langchain4j-rag.yaml diff --git a/docling-langchain4j-rag/sample.md b/docling-langchain4j-rag/sample.md new file mode 100644 index 0000000..5ad4111 --- /dev/null +++ b/docling-langchain4j-rag/sample.md @@ -0,0 +1,65 @@ +# Sample Document for RAG Analysis + +## Introduction to Apache Camel + +Apache Camel is an open-source integration framework based on known Enterprise Integration Patterns (EIPs). It provides a rule-based routing and mediation engine that allows developers to define routing and mediation rules in various domain-specific languages. + +## Key Features + +### 1. Routing and Mediation Engine + +Camel supports routing and mediation rules in various DSLs including: + +- Java DSL +- XML Configuration +- YAML DSL +- Groovy DSL + +### 2. Extensive Component Library + +Camel provides over 300 components for integrating with: + +- Messaging systems (JMS, Kafka, AMQP) +- Databases (JDBC, MongoDB, Cassandra) +- Cloud services (AWS, Azure, Google Cloud) +- APIs (REST, SOAP, GraphQL) + +### 3. 
Enterprise Integration Patterns + +Camel implements all EIPs from the famous book by Gregor Hohpe and Bobby Woolf: + +- Content-Based Router +- Message Filter +- Splitter and Aggregator +- Dead Letter Channel +- Wire Tap + +## AI Integration + +Camel now includes AI components for modern integration needs: + +### LangChain4j Components + +- **langchain4j-chat**: Integrate with Large Language Models +- **langchain4j-embeddings**: Generate vector embeddings +- **langchain4j-tools**: Create AI tools and agents + +### Docling Component + +The Docling component enables document processing: + +- Convert PDF, Word, PowerPoint to Markdown +- Extract structured data from documents +- Support for OCR and table extraction +- Integration with AI models for document analysis + +## Use Cases + +1. **Document Processing Pipeline**: Convert documents and analyze with AI +2. **RAG Systems**: Retrieval Augmented Generation with vector stores +3. **Intelligent Routing**: Use LLMs to make routing decisions +4. **Data Extraction**: Extract and transform unstructured data + +## Conclusion + +Apache Camel continues to evolve, now bridging traditional integration patterns with modern AI capabilities, making it an ideal choice for building intelligent integration solutions.
