This is an automated email from the ASF dual-hosted git repository.

ndipiazza pushed a commit to branch gpg-signed-release-support
in repository https://gitbox.apache.org/repos/asf/tika-grpc-docker.git

commit 28643fa8dbf071a3eaba2357a01b6259c282d97e
Author: Nicholas DiPiazza <[email protected]>
AuthorDate: Fri Dec 19 09:04:21 2025 -0600

    Add support for building from development branches with Ignite ConfigStore

    - Created build-from-branch.sh script to build Docker images from Git branches
    - Added Dockerfile.ignite for building with Ignite ConfigStore support
    - Added sample Ignite configuration and documentation
    - Updated main README with build-from-branch instructions

    This enables testing development features (like TIKA-4583 Ignite ConfigStore) before they are officially released, without needing to modify the main Tika repository per PR #2462.

    Usage:
      ./build-from-branch.sh -b TIKA-4583-ignite-config-store -i

    Features:
    - Builds from any Git branch or tag
    - Optional Ignite ConfigStore plugin inclusion
    - Supports custom repositories (forks)
    - Automatic testing after build
    - Optional push to registry

    Related to: apache/tika#2462, TIKA-4583
---
 README.md                                     |  48 +++++++
 build-from-branch.sh                          | 191 ++++++++++++++++++++++++++
 full/Dockerfile.ignite                        |  86 ++++++++++++
 sample-configs/ignite/README.md               | 117 ++++++++++++++++
 sample-configs/ignite/tika-config-ignite.json |  29 ++++
 5 files changed, 471 insertions(+)

diff --git a/README.md b/README.md
index 5097817..8c1f12a 100644
--- a/README.md
+++ b/README.md
@@ -114,6 +114,54 @@ For our current release process, visit [tika-docker Release Process](https://cwi
 ## Authors
 
 Apache Tika Dev Team ([email protected])
+
+## Building from Development Branches
+
+For testing unreleased features or development branches, you can build Docker images directly from source:
+
+### Building with Ignite ConfigStore Support
+
+```bash
+./build-from-branch.sh -b TIKA-4583-ignite-config-store -i -t ignite-test
+```
+
+This will:
+1. Clone the specified branch from the Apache Tika repository
+2. Build tika-grpc and tika-ignite-config-store
+3. Create a Docker image with both components
+4. Run basic tests to verify the image works
+
+### Build Script Options
+
+```bash
+./build-from-branch.sh [OPTIONS]
+
+Options:
+  -b BRANCH    Git branch or tag to build from (default: main)
+  -r REPO      Git repository URL (default: https://github.com/apache/tika.git)
+  -t TAG       Docker image tag (default: branch-name)
+  -i           Include Ignite ConfigStore plugin
+  -p           Push to Docker registry after building
+  -h           Display help message
+```
+
+### Examples
+
+Build from a specific branch:
+```bash
+./build-from-branch.sh -b TIKA-4583-ignite-config-store -i
+```
+
+Build from a fork and push:
+```bash
+./build-from-branch.sh \
+  -r https://github.com/yourusername/tika.git \
+  -b my-feature \
+  -t myregistry/tika-grpc:my-feature \
+  -p
+```
+
+See [sample-configs/ignite/README.md](sample-configs/ignite/README.md) for detailed instructions on running clustered deployments with Ignite.
 
 ## Contributors
diff --git a/build-from-branch.sh b/build-from-branch.sh
new file mode 100755
index 0000000..a997f72
--- /dev/null
+++ b/build-from-branch.sh
@@ -0,0 +1,191 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Build Docker image from a specific Tika branch
+# This is useful for testing development branches before they are released
+
+die() {
+    echo "$*" >&2
+    exit 1
+}
+
+print_usage() {
+    echo "Usage: $0 [OPTIONS]"
+    echo ""
+    echo "Build Apache Tika gRPC Docker image from source branch"
+    echo ""
+    echo "Options:"
+    echo "  -b BRANCH    Git branch or tag to build from (default: main)"
+    echo "  -r REPO      Git repository URL (default: https://github.com/apache/tika.git)"
+    echo "  -t TAG       Docker image tag (default: branch-name)"
+    echo "  -i           Include Ignite ConfigStore plugin"
+    echo "  -p           Push to Docker registry after building"
+    echo "  -h           Display this help message"
+    echo ""
+    echo "Examples:"
+    echo "  # Build from TIKA-4583 branch with Ignite support"
+    echo "  $0 -b TIKA-4583-ignite-config-store -i"
+    echo ""
+    echo "  # Build from fork and push to registry"
+    echo "  $0 -r https://github.com/user/tika.git -b feature-branch -t myimage:latest -p"
+}
+
+# Default values
+BRANCH="main"
+REPO="https://github.com/apache/tika.git"
+TAG=""
+INCLUDE_IGNITE=false
+PUSH=false
+
+# Parse command line arguments
+while getopts ":b:r:t:iph" opt; do
+  case ${opt} in
+    b )
+      BRANCH=$OPTARG
+      ;;
+    r )
+      REPO=$OPTARG
+      ;;
+    t )
+      TAG=$OPTARG
+      ;;
+    i )
+      INCLUDE_IGNITE=true
+      ;;
+    p )
+      PUSH=true
+      ;;
+    h )
+      print_usage
+      exit 0
+      ;;
+    \? )
+      echo "Invalid Option: -$OPTARG" 1>&2
+      print_usage
+      exit 1
+      ;;
+    : )
+      echo "Invalid Option: -$OPTARG requires an argument" 1>&2
+      print_usage
+      exit 1
+      ;;
+  esac
+done
+shift $((OPTIND -1))
+
+# Set default tag if not specified
+if [ -z "$TAG" ]; then
+    # Convert branch name to valid Docker tag
+    TAG=$(echo "$BRANCH" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]')
+fi
+
+echo "====================================================================================================="
+echo "Building Apache Tika gRPC Docker Image"
+echo "====================================================================================================="
+echo "Repository: $REPO"
+echo "Branch: $BRANCH"
+echo "Tag: apache/tika-grpc:$TAG"
+echo "Ignite: $INCLUDE_IGNITE"
+echo "Push: $PUSH"
+echo "====================================================================================================="
+
+# Choose Dockerfile based on Ignite flag
+if [ "$INCLUDE_IGNITE" = true ]; then
+    DOCKERFILE="full/Dockerfile.ignite"
+    echo "Using Dockerfile with Ignite ConfigStore support: $DOCKERFILE"
+else
+    DOCKERFILE="full/Dockerfile"
+    echo "Using standard Dockerfile: $DOCKERFILE"
+fi
+
+# Check if Dockerfile exists
+if [ ! -f "$DOCKERFILE" ]; then
+    die "Error: Dockerfile not found at $DOCKERFILE"
+fi
+
+# Build the image
+echo ""
+echo "Building Docker image..."
+docker build \
+    --build-arg TIKA_BRANCH="$BRANCH" \
+    --build-arg GIT_REPO="$REPO" \
+    -t "apache/tika-grpc:$TAG" \
+    -f "$DOCKERFILE" \
+    . || die "Docker build failed"
+
+echo ""
+echo "====================================================================================================="
+echo "Build complete: apache/tika-grpc:$TAG"
+echo "====================================================================================================="
+
+# Test the image
+echo ""
+echo "Testing the image..."
+CONTAINER_NAME="tika-test-$$"
+docker run -d --name "$CONTAINER_NAME" -p 127.0.0.1:50052:50052 "apache/tika-grpc:$TAG" || die "Failed to start container"
+
+# Wait for container to start
+echo "Waiting for container to start..."
+sleep 10
+
+# Check if container is running
+if docker ps | grep -q "$CONTAINER_NAME"; then
+    echo "$(tput setaf 2)✓ Container started successfully$(tput sgr0)"
+else
+    echo "$(tput setaf 1)✗ Container failed to start$(tput sgr0)"
+    docker logs "$CONTAINER_NAME"
+    docker rm -f "$CONTAINER_NAME" 2>/dev/null
+    exit 1
+fi
+
+# Verify user
+USER=$(docker inspect "$CONTAINER_NAME" --format '{{.Config.User}}')
+if [ "$USER" = "35002:35002" ]; then
+    echo "$(tput setaf 2)✓ User configuration correct: $USER$(tput sgr0)"
+else
+    echo "$(tput setaf 1)✗ User configuration incorrect: $USER (expected 35002:35002)$(tput sgr0)"
+    docker rm -f "$CONTAINER_NAME" 2>/dev/null
+    exit 1
+fi
+
+# Clean up test container
+docker rm -f "$CONTAINER_NAME" >/dev/null 2>&1
+echo "$(tput setaf 2)✓ Tests passed$(tput sgr0)"
+
+# Push if requested
+if [ "$PUSH" = true ]; then
+    echo ""
+    echo "Pushing image to registry..."
+    docker push "apache/tika-grpc:$TAG" || die "Failed to push image"
+    echo "$(tput setaf 2)✓ Image pushed successfully$(tput sgr0)"
+fi
+
+echo ""
+echo "====================================================================================================="
+echo "Done! Image ready: apache/tika-grpc:$TAG"
+echo "====================================================================================================="
+echo ""
+echo "To run the container:"
+echo "  docker run -p 50052:50052 apache/tika-grpc:$TAG"
+echo ""
+if [ "$INCLUDE_IGNITE" = true ]; then
+    echo "To run with Ignite configuration:"
+    echo "  docker run -p 50052:50052 -v \$(pwd)/tika-config.json:/config/tika-config.json apache/tika-grpc:$TAG -c /config/tika-config.json"
+    echo ""
+fi
diff --git a/full/Dockerfile.ignite b/full/Dockerfile.ignite
new file mode 100644
index 0000000..d731d70
--- /dev/null
+++ b/full/Dockerfile.ignite
@@ -0,0 +1,86 @@
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not
+# use this file except in compliance with the License. You may obtain a copy of
+# the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations under
+# the License.
+
+# Multi-stage build for tika-grpc with Ignite ConfigStore support
+# This Dockerfile builds tika-grpc from source, including the ignite-config-store plugin
+
+ARG UID_GID="35002:35002"
+
+# Stage 1: Build Tika from source
+FROM maven:3.9-eclipse-temurin-17 AS builder
+
+ARG TIKA_BRANCH=main
+ARG GIT_REPO=https://github.com/apache/tika.git
+
+WORKDIR /build
+
+# Clone the repository and checkout the specified branch
+RUN apt-get update && apt-get install -y git && \
+    git clone --depth 1 --branch ${TIKA_BRANCH} ${GIT_REPO} tika && \
+    cd tika && \
+    git log -1 --format="%H %s"
+
+# Build tika-grpc and tika-ignite-config-store
+WORKDIR /build/tika
+RUN mvn clean install -DskipTests -pl tika-grpc -am && \
+    mvn clean install -DskipTests -pl tika-pipes/tika-ignite-config-store -am
+
+# Extract the built artifacts
+RUN mkdir -p /artifacts && \
+    cp tika-grpc/target/tika-grpc-*.jar /artifacts/tika-grpc.jar && \
+    cp tika-pipes/tika-ignite-config-store/target/tika-ignite-config-store-*.jar /artifacts/tika-ignite-config-store.jar
+
+# Stage 2: Runtime image with full dependencies
+FROM ubuntu:noble AS runtime
+
+ARG UID_GID
+ARG JRE='openjdk-17-jre-headless'
+
+RUN set -eux \
+    && apt-get update \
+    && apt-get install --yes --no-install-recommends gnupg2 software-properties-common \
+    && apt-get update \
+    && DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends $JRE \
+        gdal-bin \
+        tesseract-ocr \
+        tesseract-ocr-eng \
+        tesseract-ocr-ita \
+        tesseract-ocr-fra \
+        tesseract-ocr-spa \
+        tesseract-ocr-deu \
+    && echo ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true | debconf-set-selections \
+    && DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends \
+        xfonts-utils \
+        fonts-freefont-ttf \
+        fonts-liberation \
+        ttf-mscorefonts-installer \
+        wget \
+        cabextract \
+    && apt-get clean -y \
+    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
+
+# Create directory for plugins
+RUN mkdir -p /tika-extras
+
+# Copy the built artifacts from builder
+COPY --from=builder /artifacts/tika-grpc.jar /tika-grpc.jar
+COPY --from=builder /artifacts/tika-ignite-config-store.jar /tika-extras/tika-ignite-config-store.jar
+
+USER $UID_GID
+
+EXPOSE 50052
+
+# Include the plugin in the classpath
+ENTRYPOINT [ "/bin/sh", "-c", "exec java -cp \"/tika-grpc.jar:/tika-extras/*\" org.apache.tika.pipes.grpc.TikaGrpcServer $0 $@"]
+
+LABEL maintainer="Apache Tika Developers [email protected]"
+LABEL description="Apache Tika gRPC Server with Ignite ConfigStore support"
diff --git a/sample-configs/ignite/README.md b/sample-configs/ignite/README.md
new file mode 100644
index 0000000..9530537
--- /dev/null
+++ b/sample-configs/ignite/README.md
@@ -0,0 +1,117 @@
+# Apache Ignite ConfigStore Configuration
+
+This directory contains sample configurations for running tika-grpc with Apache Ignite distributed configuration storage.
+
+## Building the Image
+
+To build a Docker image from the TIKA-4583 branch with Ignite support:
+
+```bash
+./build-from-branch.sh -b TIKA-4583-ignite-config-store -i -t ignite-test
+```
+
+## Running Standalone
+
+Run a single instance with Ignite (useful for testing):
+
+```bash
+docker run -p 50052:50052 \
+  -v $(pwd)/sample-configs/ignite/tika-config-ignite.json:/config/tika-config.json \
+  apache/tika-grpc:ignite-test \
+  -c /config/tika-config.json
+```
+
+## Running in Docker Compose (Clustered)
+
+Create a `docker-compose.yml`:
+
+```yaml
+version: '3.8'
+
+services:
+  tika-grpc-1:
+    image: apache/tika-grpc:ignite-test
+    ports:
+      - "50052:50052"
+    volumes:
+      - ./sample-configs/ignite/tika-config-ignite.json:/config/tika-config.json
+    command: ["-c", "/config/tika-config.json"]
+    networks:
+      - tika-cluster
+
+  tika-grpc-2:
+    image: apache/tika-grpc:ignite-test
+    ports:
+      - "50053:50052"
+    volumes:
+      - ./sample-configs/ignite/tika-config-ignite.json:/config/tika-config.json
+    command: ["-c", "/config/tika-config.json"]
+    networks:
+      - tika-cluster
+
+  tika-grpc-3:
+    image: apache/tika-grpc:ignite-test
+    ports:
+      - "50054:50052"
+    volumes:
+      - ./sample-configs/ignite/tika-config-ignite.json:/config/tika-config.json
+    command: ["-c", "/config/tika-config.json"]
+    networks:
+      - tika-cluster
+
+networks:
+  tika-cluster:
+    driver: bridge
+```
+
+Start the cluster:
+
+```bash
+docker-compose up
+```
+
+## Verifying Cluster Formation
+
+Check the logs to verify Ignite cluster formation:
+
+```bash
+docker-compose logs | grep "Topology snapshot"
+```
+
+You should see output like:
+```
+Topology snapshot [ver=3, servers=3, clients=0, ...]
+```
+
+## Testing Configuration Sharing
+
+1. Create a fetcher on one server:
+```bash
+# Add fetcher to server 1 (port 50052)
+grpcurl -d '{"fetcher_config": "{\"id\":\"shared-fetcher\",\"name\":\"file-system\",\"params\":{\"basePath\":\"/data\"}}"}' \
+  -plaintext localhost:50052 tika.Tika/SaveFetcher
+```
+
+2. Retrieve it from another server:
+```bash
+# Get fetcher from server 2 (port 50053)
+grpcurl -d '{"fetcher_id": "shared-fetcher"}' \
+  -plaintext localhost:50053 tika.Tika/GetFetcher
+```
+
+The fetcher should be available on all servers in the cluster!
+
+## Configuration Options
+
+Edit `tika-config-ignite.json` to customize:
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `cacheName` | Name of the Ignite cache | `tika-config-store` |
+| `cacheMode` | Cache mode (REPLICATED or PARTITIONED) | `REPLICATED` |
+| `igniteInstanceName` | Ignite instance name | `TikaIgniteCluster` |
+| `autoClose` | Auto-close Ignite on shutdown | `true` |
+
+## Kubernetes Deployment
+
+See the main [Ignite ConfigStore README](https://github.com/apache/tika/tree/TIKA-4583-ignite-config-store/tika-pipes/tika-ignite-config-store#kubernetes-deployment) for comprehensive Kubernetes deployment instructions.
diff --git a/sample-configs/ignite/tika-config-ignite.json b/sample-configs/ignite/tika-config-ignite.json
new file mode 100644
index 0000000..fe1fcde
--- /dev/null
+++ b/sample-configs/ignite/tika-config-ignite.json
@@ -0,0 +1,29 @@
+{
+  "pipes": {
+    "configStoreType": "ignite",
+    "configStoreParams": {
+      "cacheName": "tika-config-store",
+      "cacheMode": "REPLICATED",
+      "igniteInstanceName": "TikaIgniteCluster",
+      "autoClose": true
+    }
+  },
+  "fetchers": [
+    {
+      "id": "fs",
+      "name": "file-system",
+      "params": {
+        "basePath": "/data/input"
+      }
+    }
+  ],
+  "emitters": [
+    {
+      "id": "fs",
+      "name": "file-system",
+      "params": {
+        "basePath": "/data/output"
+      }
+    }
+  ]
+}
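
A note on image naming in build-from-branch.sh: the script always builds and pushes `apache/tika-grpc:$TAG`, so the `-t` value is used as a bare tag. With the README's fork example (`-t myregistry/tika-grpc:my-feature`) the resulting reference would be `apache/tika-grpc:myregistry/tika-grpc:my-feature`, which Docker rejects because a tag cannot contain `/` or `:`. The sketch below shows one possible way to handle both forms. It is not part of the commit: the function names `default_tag` and `resolve_image_ref` are hypothetical, and the rule "treat any value containing `/` or `:` as a full image reference" is an assumption. `default_tag` mirrors the `sed`/`tr` pipeline the script already uses.

```bash
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical and do not appear in the commit.

# Mirrors the script's default-tag rule: replace every character outside
# [a-zA-Z0-9._-] with '-', then lowercase the result.
default_tag() {
    printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]'
}

# Assumed policy (not in the commit): a value containing '/' or ':' is taken
# as a complete image reference; anything else is a bare tag under apache/tika-grpc.
resolve_image_ref() {
    case "$1" in
        */*|*:*) printf '%s\n' "$1" ;;
        *)       printf 'apache/tika-grpc:%s\n' "$1" ;;
    esac
}

default_tag "feature/My-Branch"                        # prints feature-my-branch
resolve_image_ref "ignite-test"                        # prints apache/tika-grpc:ignite-test
resolve_image_ref "myregistry/tika-grpc:my-feature"    # prints myregistry/tika-grpc:my-feature
```

If something like this were adopted, the script would compute the reference once (for example `IMAGE=$(resolve_image_ref "$TAG")`) and pass that single variable to `docker build`, `docker run`, and `docker push`.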
