[
https://issues.apache.org/jira/browse/TIKA-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046018#comment-18046018
]
ASF GitHub Bot commented on TIKA-4578:
--------------------------------------
Copilot commented on code in PR #2462:
URL: https://github.com/apache/tika/pull/2462#discussion_r2628822570
##########
tika-grpc/docker-build/README.md:
##########
@@ -0,0 +1,191 @@
+# Tika gRPC Docker Build
+
+This directory contains the Docker build configuration for Apache Tika gRPC server.
+
+## Overview
+
+The Docker image includes:
+- Tika gRPC server JAR
+- All Tika Pipes plugins (fetchers, emitters, iterators)
+- Parser packages (standard, extended, ML)
+- OCR support (Tesseract with multiple languages)
+- GDAL for geospatial formats
+- Common fonts
+
+## Building the Docker Image
+
+### Prerequisites
+
+1. Build Tika from the project root (this builds all modules including plugins):
+```bash
+cd <tika-root>
+mvn clean install -DskipTests
+```
+
+### Build Activation
+
+The Docker build can be activated in two ways:
+
+**Option 1: Using environment variables (recommended)**
+- Set `DOCKER_ID`, `AWS_ACCOUNT_ID`, or `AZURE_REGISTRY_NAME`
+- Maven profiles automatically detect these and enable the build
+- No need for `-Dskip.docker.build=false`
+
+**Option 2: Using Maven property**
+- Add `-Dskip.docker.build=false` to your Maven command
+- Use when you want explicit control or testing
+
+### Building from Tika Root
+
+**Build tika-grpc and dependencies only:**
+```bash
+DOCKER_ID=myusername \
+ mvn clean install -DskipTests -pl :tika-grpc -am
+```
+
+**Build entire project:**
+```bash
+DOCKER_ID=myusername \
+ mvn clean install -DskipTests
+```
+
+### Building from tika-grpc Directory
+
+#### Controlling Docker Build with Environment Variables
+
+All docker-build.sh environment variables are passed through from your shell. When these variables are set, the Maven profiles automatically activate the Docker build.
+
+**Build and push to Docker Hub:**
+```bash
+DOCKER_ID=myusername \
+ mvn package
+```
+
+**Build multi-arch and push to Docker Hub:**
+```bash
+MULTI_ARCH=true DOCKER_ID=myusername \
+ mvn package
+```
+
+**Build and push to AWS ECR:**
+```bash
+AWS_ACCOUNT_ID=123456789012 AWS_REGION=us-east-1 \
+ mvn package
+```
+
+**Build and push to Azure Container Registry:**
+```bash
+AZURE_REGISTRY_NAME=myregistry \
+ mvn package
+```
+
+**Note:** When environment variables are set, you don't need `-Dskip.docker.build=false`. The Maven profiles detect the variables and automatically enable the build.
+
+### Option 2: Run the Docker Build Script Manually
+
+Set the required environment variable and run the script:
+
+```bash
+export TIKA_VERSION=4.0.0-SNAPSHOT
+./tika-grpc/docker-build/docker-build.sh
+```
+
+### Optional Environment Variables
+
+- `TIKA_VERSION`: Maven project version (required)
+- `RELEASE_IMAGE_TAG`: Override the default tag (defaults to TIKA_VERSION without -SNAPSHOT)
+- `DOCKER_ID`: Docker Hub username to push to Docker Hub
+- `AWS_ACCOUNT_ID`: AWS account ID to push to ECR
+- `AWS_REGION`: AWS region for ECR (default: us-west-2)
+- `AZURE_REGISTRY_NAME`: Azure Container Registry name
+- `MULTI_ARCH`: Build for multiple architectures (default: false)
+- `PROJECT_NAME`: Docker image name (default: tika-grpc)
+
+### Examples
+
+**Build with Docker Hub using environment variable:**
+```bash
+DOCKER_ID=myusername \
+ mvn package
+```
+
+**Build multi-arch with Docker Hub:**
+```bash
+MULTI_ARCH=true DOCKER_ID=myusername \
+ mvn package
+```
+
+**Build with AWS ECR:**
+```bash
+AWS_ACCOUNT_ID=123456789012 AWS_REGION=us-east-1 \
+ mvn package
+```
+
+**Build with explicit property (for testing/development):**
+```bash
+mvn package -Dskip.docker.build=false -DDOCKER_ID=myusername
Review Comment:
The command example is incorrect. '-DDOCKER_ID=myusername' sets a Maven
property, not an environment variable, so it has no effect here: the profiles
are activated by environment variables (env.DOCKER_ID), not by Maven properties.
Either use 'DOCKER_ID=myusername mvn package -Dskip.docker.build=false', or
rely on the environment variable alone and drop the property override.
```suggestion
DOCKER_ID=myusername mvn package -Dskip.docker.build=false
```
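A minimal shell sketch of the distinction (not part of the PR): a plain `sh -c` child process stands in for the build script, and the `child` variable and the `unset` placeholder text are assumptions made only for this illustration. It shows that a `-D...` argument never reaches the child's environment, while a `VAR=value` prefix does.

```shell
# Start from a clean state so the result does not depend on the caller's shell.
unset DOCKER_ID

# Hypothetical stand-in for any child process that reads DOCKER_ID from its environment.
child='echo "DOCKER_ID=${DOCKER_ID:-unset}"'

# A -D style argument is only a command-line argument; the environment is untouched.
as_arg=$(sh -c "$child" sh -DDOCKER_ID=myusername)

# A VAR=value prefix places the variable in that one process's environment.
as_env=$(DOCKER_ID=myusername sh -c "$child" sh)

echo "$as_arg"   # DOCKER_ID=unset
echo "$as_env"   # DOCKER_ID=myusername
```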
##########
tika-grpc/docker-build/docker-build.sh:
##########
@@ -0,0 +1,127 @@
+#!/bin/bash
+# This script is intended to be run from Maven exec plugin during the package phase of maven build
+
+# Check if Docker is installed
+if ! command -v docker &> /dev/null; then
+ echo "ERROR: Docker is not installed or not in PATH. Please install Docker first."
+ exit 1
+fi
+
+if [ -z "${TIKA_VERSION}" ]; then
+ echo "Environment variable TIKA_VERSION is required, and should match the maven project version of Tika"
+ exit 1
+fi
+
+SCRIPT_DIR=$( cd
> make changes to add a Dockerfile and build script
> -------------------------------------------------
>
> Key: TIKA-4578
> URL: https://issues.apache.org/jira/browse/TIKA-4578
> Project: Tika
> Issue Type: Sub-task
> Components: build
> Reporter: Nicholas DiPiazza
> Priority: Major
>
> see:
> [https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-grpc/docker-build]
> we need a docker-build file similar to this added to Tika's tika-grpc