This is an automated email from the ASF dual-hosted git repository. ssulav pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/ozone-installer.git
commit f2e225c95daa6308e0d1d937f4b374a6380fc37c
Author: Soumitra Sulav <[email protected]>
AuthorDate: Mon Jan 26 19:29:47 2026 +0530

    HDDS-13870: Initial commit with OM, SCM, Recon, S3G support
---
 .gitignore                                       |   2 +
 README.md                                        | 150 ++++++
 ansible.cfg                                      |  26 +
 callback_plugins/last_failed.py                  |  57 ++
 inventories/dev/group_vars/all.yml               |  41 ++
 inventories/dev/hosts.ini                        |  17 +
 ozone_installer.py                               | 656 +++++++++++++++++++++++
 playbooks/cluster.yml                            |  59 ++
 requirements.txt                                 |   2 +
 roles/cleanup/tasks/main.yml                     |  52 ++
 roles/java/defaults/main.yml                     |  12 +
 roles/java/tasks/main.yml                        |  77 +++
 roles/ozone_config/defaults/main.yml             |   6 +
 roles/ozone_config/tasks/main.yml                |  57 ++
 roles/ozone_config/templates/core-site.xml.j2    |  15 +
 roles/ozone_config/templates/ozone-env.sh.j2     |  53 ++
 roles/ozone_config/templates/ozone-hosts.yaml.j2 |  30 ++
 roles/ozone_config/templates/ozone-site.xml.j2   | 128 +++++
 roles/ozone_config/templates/workers.j2          |   3 +
 roles/ozone_fetch/defaults/main.yml              |   9 +
 roles/ozone_fetch/tasks/main.yml                 | 113 ++++
 roles/ozone_layout/defaults/main.yml             |   3 +
 roles/ozone_layout/tasks/main.yml                |  29 +
 roles/ozone_service/tasks/main.yml               | 100 ++++
 roles/ozone_smoke/tasks/main.yml                 |  91 ++++
 roles/ozone_ui/tasks/main.yml                    |  32 +
 roles/ozone_user/defaults/main.yml               |   6 +
 roles/ozone_user/tasks/main.yml                  |  33 +
 roles/ssh_bootstrap/defaults/main.yml            |  14 +
 roles/ssh_bootstrap/tasks/main.yml               |  72 +++
 30 files changed, 1945 insertions(+)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..7859f36
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+logs/**
+*.pyc
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7f87e73
--- /dev/null
+++ b/README.md
@@ -0,0 +1,150 @@
+# Ozone Installer (Ansible)
+
+## On‑prem quickstart (with ozone-installer)
+
+This installer automates the on‑prem steps described in the official Ozone docs, including SCM/OM initialization and starting services (SCM, OM, Datanodes, Recon).
See the Ozone on‑prem guide for the conceptual background and properties such as `ozone.metadata.dirs`, `ozone.scm.names`, and `ozone.om.address` [Ozone On Premise Installation](https://ozone.apache.org/docs/edge/start/onprem.html).
+
+
+What the installer does (mapped to the on‑prem doc):
+- Initializes SCM and OM once, in the correct order, then starts them
+- Starts Datanodes on all DN hosts
+- Starts Recon on the first Recon host
+- Renders `ozone-site.xml` with addresses derived from inventory (SCM names, OM address/service IDs, replication factor based on DN count)
+
+Ports and service behavior follow Ozone defaults; consult the official documentation for details [Ozone On Premise Installation](https://ozone.apache.org/docs/edge/start/onprem.html).
+
+## Software Requirements
+
+- Controller: Python 3.10–3.12 (prefer 3.11) and pip
+- Ansible Community 10.x (ansible-core 2.17.x)
+- Python packages (installed via `requirements.txt`):
+  - `ansible-core==2.17.*`
+  - `click==8.*` (for nicer interactive prompts; optional but recommended)
+- SSH prerequisites on controller:
+  - `sshpass` (only if using password auth with `-m password`)
+    - Debian/Ubuntu: `sudo apt-get install -y sshpass`
+    - RHEL/CentOS/Rocky: `sudo yum install -y sshpass` or `sudo dnf install -y sshpass`
+    - SUSE: `sudo zypper in -y sshpass`
+
+### Controller node requirements
+- Can be local or remote.
+- Must be on the same network as the target hosts.
+- Requires SSH access (key or password).
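Editorial illustration (not part of the commit): the controller requirements listed above could be checked up front with a small preflight script. This is a minimal sketch; `check_controller` and its parameters are hypothetical names, and the only facts assumed are the documented Python 3.10–3.12 range and the need for `sshpass` when password auth is used.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def check_controller(
    auth_method: str = "key",
    py_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems found on the controller; empty means OK.

    Hypothetical helper: py_version and which are injectable so the check
    can be exercised without touching the real environment.
    """
    problems: List[str] = []
    # README documents Python 3.10-3.12 for the controller.
    if not (3, 10) <= py_version <= (3, 12):
        problems.append(
            f"Python {py_version[0]}.{py_version[1]} is outside the documented 3.10-3.12 range"
        )
    # sshpass is only needed when SSH password auth is selected.
    if auth_method == "password" and which("sshpass") is None:
        problems.append("sshpass not found on PATH (needed for password auth)")
    return problems


if __name__ == "__main__":
    for problem in check_controller(auth_method="password"):
        print(f"WARNING: {problem}")
```

Returning a list of problems rather than raising keeps the check advisory, which seems closer to how the installer treats optional pieces such as `click`.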
+
+### Run on the controller node
+```bash
+pip install -r requirements.txt
+```
+
+## File structure
+
+- `ansible.cfg` (defaults and logging)
+- `playbooks/` (`cluster.yml`)
+- `roles/` (ssh_bootstrap, ozone_user, java, ozone_layout, ozone_fetch, ozone_config, ozone_service, ozone_smoke, cleanup, ozone_ui)
+
+## Usage (two options)
+
+1) Python wrapper (orchestrates Ansible for you)
+
+```bash
+# Non-HA upstream
+python3 ozone_installer.py -H host1.domain -v 2.0.0
+
+# HA upstream (3+ hosts) - mode auto-detected
+python3 ozone_installer.py -H "host{1..3}.domain" -v 2.0.0
+
+# Local snapshot build
+python3 ozone_installer.py -H host1 -v local --local-path /path/to/share/ozone-2.1.0-SNAPSHOT
+
+# Cleanup and reinstall
+python3 ozone_installer.py --clean -H "host{1..3}.domain" -v 2.0.0
+
+# Notes on cleanup
+# - During a normal install, you'll be asked whether to clean up an existing install (if present). Default is No.
+# - Use --clean to clean up without prompting before reinstall.
+```
+
+### Interactive prompts and version selection
+- The installer uses `click` for interactive prompts when available (TTY).
+- Version selection shows a numbered list; you can select by number, type a specific version, or `local`.
+- A summary table of inputs is displayed and logged before execution; confirm to proceed.
+- Use `--yes` to auto-accept defaults (used implicitly during `--resume`).
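Editorial illustration (not part of the commit): the three accepted forms of version input described above (list number, explicit version, or `local`) can be resolved with one small function. This is a sketch under stated assumptions; `resolve_choice` is a hypothetical name, and the version pattern follows the `X.Y.Z` with optional `-SUFFIX` shape that `ozone_installer.py` accepts.

```python
import re
from typing import List, Optional

# X.Y.Z with an optional pre-release suffix such as 2.1.0-RC0.
VERSION_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?$")


def resolve_choice(choice: str, versions: List[str]) -> Optional[str]:
    """Map raw user input to a version string, 'local', or None if invalid.

    Hypothetical helper: versions is the numbered list shown to the user,
    newest first, so input '1' selects versions[0].
    """
    choice = choice.strip()
    if choice.lower() == "local":
        return "local"
    if re.fullmatch(r"[0-9]+", choice):
        index = int(choice)
        # Numbers outside the displayed list are rejected, not clamped.
        return versions[index - 1] if 1 <= index <= len(versions) else None
    if VERSION_RE.match(choice):
        # A well-formed version is accepted even if it is not in the list.
        return choice
    return None


if __name__ == "__main__":
    listed = ["2.0.0", "1.4.1"]
    for raw in ("1", "local", "2.1.0-RC0", "9", "abc"):
        print(repr(raw), "->", resolve_choice(raw, listed))
```

Returning `None` for invalid input leaves the retry loop and messaging to the caller, so the same function works for both the `click` and plain `input` prompt paths.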
+
+### Resume last failed task
+
+```bash
+# Python wrapper (picks task name from logs/last_failed_task.txt)
+python3 ozone_installer.py -H host1.domain -v 2.0.0 --resume
+```
+
+```bash
+# Direct Ansible
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \
+  --start-at-task "$(head -n1 logs/last_failed_task.txt)"
+```
+
+2) Direct Ansible (run playbooks yourself)
+
+```bash
+# Non-HA upstream
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=non-ha"
+
+# HA upstream
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=ha"
+
+# Cleanup only (run just the cleanup role)
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \
+  --tags cleanup -e "do_cleanup=true"
+```
+
+## Inventory
+
+When using the Python wrapper, inventory is built dynamically from `-H/--host` and persisted for reuse at:
+- `logs/last_inventory.ini` (groups: `[om]`, `[scm]`, `[datanodes]`, `[recon]` and optional `[s3g]`)
+- `logs/last_vars.json` (effective variables passed to the play)
+
+For direct Ansible runs, you may edit `inventories/dev/hosts.ini` and `inventories/dev/group_vars/all.yml`, or point to `logs/last_inventory.ini` and `logs/last_vars.json` that the wrapper generated.
+
+## Non-HA
+
+```bash
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=non-ha"
+```
+
+## HA cluster
+
+```bash
+ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=ha"
+```
+
+## Notes
+
+- Idempotent where possible; runtime `ozone` init/start guarded with `creates:`.
+- JAVA_HOME and PATH are persisted for resume; runtime settings are exported via `ozone-env.sh`.
+- Local snapshot mode archives from the controller and uploads/extracts on targets using `unarchive`.
+- Logs are written to a per-run file under `logs/` named:
+  - `ansible-<timestamp>-<hosts_raw_sanitized>.log`
+  - Ansible and the Python wrapper share the same logfile.
+- After a successful run, the wrapper prints where to find process logs on target hosts, e.g. `<install base>/current/logs/ozone-<service-user>-<process>-<host>.log`.
+
+### Directories
+
+- Install base (`install_base`, default `/opt/ozone`): where Ozone binaries and configs live. A `current` symlink points to the active version directory.
+- Data base (`data_base`, default `/data/ozone`): where Ozone writes on‑disk metadata and Datanode data (e.g., `ozone.metadata.dirs`, `hdds.datanode.dir`).
+
+## Components and config mapping
+
+- Components (per the Ozone docs): Ozone Manager (OM), Storage Container Manager (SCM), Datanodes (DN), and Recon. The installer maps:
+  - Non‑HA: first host runs OM+SCM+Recon; all hosts are DNs.
+  - HA: first three hosts serve as OM and SCM sets; all hosts are DNs; first host is Recon.
+- `ozone-site.xml` is rendered from templates based on inventory groups:
+  - `ozone.scm.names`, `ozone.scm.client.address`, `ozone.om.address` or HA service IDs
+  - `ozone.metadata.dirs`, `hdds.datanode.dir`, and related paths map to `data_base`
+  - Replication is set to ONE if DN count < 3, otherwise THREE
+
+## Optional: S3 Gateway (S3G) and smoke
+
+- Define a `[s3g]` group in inventory (commonly the first OM host) to enable S3G properties in `ozone-site.xml` (default HTTP port 9878).
+- The smoke role can optionally install `awscli` on the first S3G host, configure dummy credentials, and create/list a test bucket against `http://localhost:9878` (for simple functional verification).
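Editorial illustration (not part of the commit): the host-to-role mapping and the replication rule described under "Components and config mapping" can be summarized in a few lines. This is a sketch, not the installer's implementation; `plan_roles` and `replication_for` are hypothetical names, and the three-host minimum for HA is taken from the "HA upstream (3+ hosts)" usage note.

```python
from typing import Dict, List


def plan_roles(hosts: List[str], cluster_mode: str) -> Dict[str, List[str]]:
    """Assign hosts to OM, SCM, Datanode and Recon groups (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one host is required")
    if cluster_mode == "ha":
        if len(hosts) < 3:
            raise ValueError("HA needs at least 3 hosts for the OM and SCM sets")
        managers = hosts[:3]  # first three hosts form the OM and SCM sets
    else:
        managers = hosts[:1]  # non-HA: first host runs OM and SCM
    return {
        "om": list(managers),
        "scm": list(managers),
        "datanodes": list(hosts),  # every host is a Datanode
        "recon": hosts[:1],        # Recon always lands on the first host
    }


def replication_for(dn_count: int) -> str:
    """ONE for fewer than three Datanodes, otherwise THREE."""
    return "THREE" if dn_count >= 3 else "ONE"


if __name__ == "__main__":
    roles = plan_roles(["host1", "host2", "host3", "host4"], "ha")
    print(roles)
    print("replication:", replication_for(len(roles["datanodes"])))
```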
+ + diff --git a/ansible.cfg b/ansible.cfg new file mode 100644 index 0000000..5d7f571 --- /dev/null +++ b/ansible.cfg @@ -0,0 +1,26 @@ +[defaults] +inventory = inventories/dev/hosts.ini +stdout_callback = default +retry_files_enabled = False +gathering = smart +forks = 20 +strategy = free +timeout = 30 +roles_path = roles +log_path = logs/ansible.log +bin_ansible_callbacks = True +callback_plugins = callback_plugins +callbacks_enabled = timer, profile_tasks, last_failed ; for execution time profiling and resume hints +deprecation_warnings = False +host_key_checking = False +remote_tmp = /tmp/.ansible-${USER} + +[privilege_escalation] +become = True +become_method = sudo + +[ssh_connection] +pipelining = True +ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null + + diff --git a/callback_plugins/last_failed.py b/callback_plugins/last_failed.py new file mode 100644 index 0000000..93b07da --- /dev/null +++ b/callback_plugins/last_failed.py @@ -0,0 +1,57 @@ +from __future__ import annotations + +import os +from pathlib import Path +from ansible.plugins.callback import CallbackBase + + +class CallbackModule(CallbackBase): + CALLBACK_VERSION = 2.0 + CALLBACK_TYPE = 'notification' + CALLBACK_NAME = 'last_failed' + CALLBACK_NEEDS_WHITELIST = False + + def __init__(self): + super().__init__() + # Write to installer logs dir + self._out_dir = Path(__file__).resolve().parents[1] / "logs" + self._out_file = self._out_dir / "last_failed_task.txt" + try: + os.makedirs(self._out_dir, exist_ok=True) + except Exception: + pass + + def _write_last_failed(self, result): + try: + task_name = result._task.get_name() # noqa + task_path = getattr(result._task, "get_path", lambda: None)() # noqa + lineno = getattr(result._task, "get_lineno", lambda: None)() # noqa + role_name = None + if task_path and "/roles/" in task_path: + try: + role_segment = task_path.split("/roles/")[1] + role_name = role_segment.split("/")[0] + except 
Exception: + role_name = None + host = getattr(result, "_host", None) + host_name = getattr(host, "name", "unknown") if host else "unknown" + line = f"{task_name}\n# host: {host_name}\n" + if task_path: + line += f"# file: {task_path}\n" + if lineno: + line += f"# line: {lineno}\n" + if role_name: + line += f"# role: {role_name}\n" + with open(self._out_file, "w", encoding="utf-8") as f: + f.write(line) + except Exception: + # Best effort only; never break the run + pass + + def v2_runner_on_failed(self, result, ignore_errors=False): + self._write_last_failed(result) + + def v2_runner_on_unreachable(self, result): + self._write_last_failed(result) + + diff --git a/inventories/dev/group_vars/all.yml b/inventories/dev/group_vars/all.yml new file mode 100644 index 0000000..608a457 --- /dev/null +++ b/inventories/dev/group_vars/all.yml @@ -0,0 +1,41 @@ +--- +# Global defaults +cluster_mode: "non-ha" # non-ha | ha + +# Source selection +ozone_version: "2.0.0" # "2.0.0" | "local" +dl_url: "https://dlcdn.apache.org/ozone" + +# Local snapshot settings +local_shared_path: "" +local_ozone_dirname: "" + +# Install and data directories +install_base: "/opt/ozone" +data_base: "/data/ozone" + +# Java settings +jdk_major: 17 +ozone_java_home: "" # autodetected if empty + +# Service user/group +service_user: "ozone" +service_group: "ozone" + +# Runtime and behavior +use_sudo: true +start_after_install: true +ozone_opts: "-Xmx1024m -XX:ParallelGCThreads=8" +service_command_timeout: 300 # seconds for service init/start commands +ansible_remote_tmp: "/tmp/.ansible-{{ ansible_user_id }}" + +# SSH bootstrap +allow_cluster_ssh_key_deploy: false +ssh_public_key_path: "" # optional path on controller to a public key to install +ssh_private_key_path: "" # optional path to private key to copy for cluster identity + +# Markers for profile management +JAVA_MARKER: "Apache Ozone Installer Java Home" +ENV_MARKER: "Apache Ozone Installer Env" + + diff --git a/inventories/dev/hosts.ini 
b/inventories/dev/hosts.ini new file mode 100644 index 0000000..98e7a4b --- /dev/null +++ b/inventories/dev/hosts.ini @@ -0,0 +1,17 @@ +[om] +# om1.example.com + +[scm] +# scm1.example.com + +[datanodes] +# dn1.example.com +# dn2.example.com + +[recon] +# recon1.example.com + +[all:vars] +cluster_mode=non-ha + + diff --git a/ozone_installer.py b/ozone_installer.py new file mode 100755 index 0000000..9246035 --- /dev/null +++ b/ozone_installer.py @@ -0,0 +1,656 @@ +#!/usr/bin/env python3 + +import argparse +import json +import os +import re +import shlex +import subprocess +import sys +import tempfile +import logging +from datetime import datetime +from pathlib import Path +from typing import List, Optional, Tuple + +# Optional nicer interactive prompts (fallback to built-in prompts if unavailable) +try: + import click # type: ignore +except Exception: + click = None # type: ignore + +ANSIBLE_ROOT = Path(__file__).resolve().parent +ANSIBLE_CFG = ANSIBLE_ROOT / "ansible.cfg" +PLAYBOOKS_DIR = ANSIBLE_ROOT / "playbooks" +LOGS_DIR = ANSIBLE_ROOT / "logs" +LAST_FAILED_FILE = LOGS_DIR / "last_failed_task.txt" +LAST_RUN_FILE = LOGS_DIR / "last_run.json" + +DEFAULTS = { + "install_base": "/opt/ozone", + "data_base": "/data/ozone", + "ozone_version": "2.0.0", + "jdk_major": 17, + "service_user": "ozone", + "service_group": "ozone", + "dl_url": "https://dlcdn.apache.org/ozone", + "JAVA_MARKER": "Apache Ozone Installer Java Home", + "ENV_MARKER": "Apache Ozone Installer Env", + "start_after_install": True, + "use_sudo": True, +} + +def get_logger(log_path: Optional[Path] = None) -> logging.Logger: + try: + LOGS_DIR.mkdir(parents=True, exist_ok=True) + except Exception: + pass + logger = logging.getLogger("ozone_installer") + logger.setLevel(logging.INFO) + # Avoid duplicate handlers if re-invoked + if not logger.handlers: + dest = log_path or (LOGS_DIR / "ansible.log") + fh = logging.FileHandler(dest) + fh.setLevel(logging.INFO) + formatter = logging.Formatter("%(asctime)s | 
%(levelname)s | %(message)s") + fh.setFormatter(formatter) + logger.addHandler(fh) + sh = logging.StreamHandler(sys.stdout) + logger.addHandler(sh) + return logger + +def parse_args(argv: List[str]) -> argparse.Namespace: + p = argparse.ArgumentParser( + description="Ozone Ansible Installer (Python trigger) - mirrors bash installer flags" + ) + p.add_argument("-H", "--host", help="Target host(s). Non-HA: host. HA: comma-separated or brace expansion host{1..n}") + p.add_argument("-m", "--auth-method", choices=["password", "key"], default=None) + p.add_argument("-p", "--password", help="SSH password (for --auth-method=password)") + p.add_argument("-k", "--keyfile", help="SSH private key file (for --auth-method=key)") + p.add_argument("-v", "--version", help="Ozone version (e.g., 2.0.0) or 'local'") + p.add_argument("-i", "--install-dir", help=f"Install root (default: {DEFAULTS['install_base']})") + p.add_argument("-d", "--data-dir", help=f"Data root (default: {DEFAULTS['data_base']})") + p.add_argument("-s", "--start", action="store_true", help="Initialize and start after install") + p.add_argument("-M", "--cluster-mode", choices=["non-ha", "ha"], help="Force cluster mode (default: auto by host count)") + p.add_argument("-r", "--role-file", help="Role file (YAML) for HA mapping (optional)") + p.add_argument("-j", "--jdk-version", type=int, choices=[17, 21], help="JDK major version (default: 17)") + p.add_argument("-c", "--config-dir", help="Config dir (optional, templates are used by default)") + p.add_argument("-x", "--clean", action="store_true", help="Clean up an existing install without prompting before installing") + p.add_argument("-l", "--ssh-user", help="SSH username (default: root)") + p.add_argument("-S", "--use-sudo", action="store_true", help="Run remote commands via sudo (default)") + p.add_argument("-u", "--service-user", help="Service user (default: ozone)") + p.add_argument("-g", "--service-group", help="Service group (default: ozone)") + # Local extras +
p.add_argument("--local-path", help="Path to local Ozone build (contains bin/ozone)") + p.add_argument("--dl-url", help="Upstream download base URL") + p.add_argument("--yes", action="store_true", help="Non-interactive; accept defaults for missing values") + p.add_argument("-R", "--resume", action="store_true", help="Resume play at last failed task (if available)") + return p.parse_args(argv) + +def _validate_local_ozone_dir(path: Path) -> bool: + """ + Returns True if 'path/bin/ozone' exists and is executable. + """ + ozone_bin = path / "bin" / "ozone" + try: + return ozone_bin.exists() and os.access(str(ozone_bin), os.X_OK) + except OSError: + return False + +def prompt(prompt_text: str, default: Optional[str] = None, secret: bool = False, yes_mode: bool = False) -> Optional[str]: + if yes_mode: + return default + if click is not None and sys.stdout.isatty(): + try: + display = prompt_text + # logger.info(f"prompt_text: {prompt_text} , default: {default}") + if default: + display = f"{prompt_text} [default={default}]" + if secret: + return click.prompt(display, default=default, hide_input=True, show_default=False) + return click.prompt(display, default=default, show_default=False) + except (EOFError, KeyboardInterrupt): + return default + # Fallback to built-in input/getpass + try: + text = f"{prompt_text}: " + if default: + text = f"{prompt_text} [default={default}]: " + if secret: + import getpass + val = getpass.getpass(text) + else: + val = input(text) + if not val and default is not None: + return default + return val + except EOFError: + return default + +def _semver_key(v: str) -> Tuple[int, int, int, str]: + """ + Convert version like '2.0.0' or '2.1.0-RC0' to a sortable key. + Pre-release suffix sorts before final. 
+ """ + try: + core, *rest = v.split("-", 1) + major, minor, patch = core.split(".") + suffix = rest[0] if rest else "" + return (int(major), int(minor), int(patch), suffix) + except Exception: + return (0, 0, 0, v) + +def _render_table(rows: List[Tuple[str, str]]) -> str: + """ + Returns a simple two-column table string without extra dependencies. + """ + if not rows: + return "" + col1_width = max(len(k) for k, _ in rows) + col2_width = max(len(str(v)) for _, v in rows) + sep = f"+-{'-' * col1_width}-+-{'-' * col2_width}-+" + out = [sep, f"| {'Field'.ljust(col1_width)} | {'Value'.ljust(col2_width)} |", sep] + for k, v in rows: + out.append(f"| {k.ljust(col1_width)} | {str(v).ljust(col2_width)} |") + out.append(sep) + return "\n".join(out) + +def _confirm_summary(rows: List[Tuple[str, str]], yes_mode: bool) -> bool: + """ + Print the input summary table and ask user to continue. Returns True if confirmed. + """ + logger = get_logger() + table = _render_table(rows) + if click is not None: + logger.info(table) + if yes_mode: + return True + return click.confirm("Proceed with these settings?", default=True) + else: + logger.info(table) + if yes_mode: + return True + answer = prompt("Proceed with these settings? (Y/n)", default="Y", yes_mode=False) + return str(answer or "Y").strip().lower() in ("y", "yes") + +def fetch_available_versions(dl_url: str, limit: int = 30) -> List[str]: + """ + Fetch available Ozone versions from the download base. Returns newest-first. 
+ """ + try: + import urllib.request + with urllib.request.urlopen(dl_url, timeout=10) as resp: + html = resp.read().decode("utf-8", errors="ignore") + # Apache directory listing usually has anchors like href="2.0.0/" + candidates = set(m.group(1) for m in re.finditer(r'href="([0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?)\/"', html)) + versions = sorted(candidates, key=_semver_key, reverse=True) + if limit and len(versions) > limit: + versions = versions[:limit] + return versions + except Exception: + return [] + +def choose_version_interactive(versions: List[str], default_version: str, yes_mode: bool) -> Optional[str]: + """ + Present a numbered list and prompt user to choose a version. + Returns selected version string or None if not chosen. + """ + if not versions: + return None + if yes_mode: + return versions[0] + # Use click when available and interactive; otherwise fallback to basic prompt + if click is not None and sys.stdout.isatty(): + click.echo("Available Ozone versions (newest first):") + for idx, ver in enumerate(versions, start=1): + click.echo(f" {idx}) {ver}") + while True: + choice = prompt( + "Select number, type a version (e.g., 2.1.0) or 'local'", + default="1", + yes_mode=False, + ) + if choice is None: + return versions[0] + choice = str(choice).strip() + if choice == "": + return versions[0] + if choice.lower() == "local": + return "local" + if choice.isdigit(): + i = int(choice) + if 1 <= i <= len(versions): + return versions[i - 1] + if re.match(r"^[0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?$", choice): + return choice + click.echo("Invalid selection. 
Enter a number, a valid version, or 'local'.") + else: + logger = get_logger() + logger.info("Available Ozone versions:") + for idx, ver in enumerate(versions, start=1): + logger.info(f" {idx}) {ver}") + while True: + choice = prompt("Select number, type a version (e.g., 2.1.0) or 'local'", default="1", yes_mode=False) + if choice is None or str(choice).strip() == "": + return versions[0] + choice = str(choice).strip() + if choice.lower() == "local": + return "local" + if choice.isdigit(): + i = int(choice) + if 1 <= i <= len(versions): + return versions[i - 1] + # allow typing a specific version not listed + if re.match(r"^[0-9]+\.[0-9]+\.[0-9]+(?:-[A-Za-z0-9]+)?$", choice): + return choice + logger.info("Invalid selection. Please enter a number from the list, a valid version (e.g., 2.1.0) or 'local'.") + +def expand_braces(expr: str) -> List[str]: + # Supports simple pattern like prefix{1..N}suffix + if not expr or "{" not in expr or ".." not in expr or "}" not in expr: + return [expr] + m = re.search(r"(.*)\{(\d+)\.\.(\d+)\}(.*)", expr) + if not m: + return [expr] + pre, a, b, post = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4) + return [f"{pre}{i}{post}" for i in range(a, b + 1)] + +def parse_hosts(hosts_raw: Optional[str]) -> List[dict]: + """ + Accepts comma-separated hosts; each may contain brace expansion. 
+ Returns list of dicts: {host, user, port} + """ + if not hosts_raw: + return [] + out = [] + for token in hosts_raw.split(","): + token = token.strip() + expanded = expand_braces(token) + for item in expanded: + user = None + hostport = item + if "@" in item: + user, hostport = item.split("@", 1) + host = hostport + port = None + if ":" in hostport: + host, port = hostport.split(":", 1) + out.append({"host": host, "user": user, "port": port}) + return out + +def auto_cluster_mode(hosts: List[dict], forced: Optional[str] = None) -> str: + if forced in ("non-ha", "ha"): + return forced + return "ha" if len(hosts) >= 3 else "non-ha" + +def build_inventory(hosts: List[dict], ssh_user: Optional[str] = None, keyfile: Optional[str] = None, password: Optional[str] = None, cluster_mode: str = "non-ha") -> str: + """ + Returns INI inventory text for our groups: [om], [scm], [datanodes], [recon], [s3g] + """ + if not hosts: + return "" + # Non-HA mapping: OM/SCM on first host; all hosts as datanodes; recon on first + if cluster_mode == "non-ha": + h = hosts[0] + return _render_inv_groups( + om=[h], scm=[h], dn=hosts, recon=[h], s3g=[h], + ssh_user=ssh_user, keyfile=keyfile, password=password + ) + # HA: first 3 go to OM and SCM; all to datanodes; recon is first if present + om = hosts[:3] if len(hosts) >= 3 else hosts + scm = hosts[:3] if len(hosts) >= 3 else hosts + dn = hosts + recon = [hosts[0]] + s3g = [hosts[0]] + return _render_inv_groups(om=om, scm=scm, dn=dn, recon=recon, s3g=s3g, + ssh_user=ssh_user, keyfile=keyfile, password=password) + +def _render_inv_groups(om: List[dict], scm: List[dict], dn: List[dict], recon: List[dict], s3g: List[dict], ssh_user: Optional[str] = None, keyfile: Optional[str] = None, password: Optional[str] = None) -> str: + def hostline(hd): + parts = [hd["host"]] + if ssh_user or hd.get("user"): + parts.append(f"ansible_user={(ssh_user or hd.get('user'))}") + if hd.get("port"): + parts.append(f"ansible_port={hd['port']}") + if keyfile: + 
parts.append(f"ansible_ssh_private_key_file={shlex.quote(str(keyfile))}") + if password: + parts.append(f"ansible_password={shlex.quote(password)}") + return " ".join(parts) + + sections = [] + sections.append("[om]") + sections += [hostline(h) for h in om] + sections.append("\n[scm]") + sections += [hostline(h) for h in scm] + sections.append("\n[datanodes]") + sections += [hostline(h) for h in dn] + sections.append("\n[recon]") + sections += [hostline(h) for h in recon] + sections.append("\n[s3g]") + sections += [hostline(h) for h in s3g] + sections.append("\n") + return "\n".join(sections) + +def run_playbook(playbook: Path, inventory_path: Path, extra_vars_path: Path, ask_pass: bool = False, become: bool = True, start_at_task: Optional[str] = None, tags: Optional[List[str]] = None) -> int: + cmd = [ + "ansible-playbook", + "-i", str(inventory_path), + str(playbook), + "-e", f"@{extra_vars_path}", + ] + if ask_pass: + cmd.append("-k") + if become: + cmd.append("--become") + if start_at_task: + cmd += ["--start-at-task", str(start_at_task)] + if tags: + cmd += ["--tags", ",".join(tags)] + env = os.environ.copy() + env["ANSIBLE_CONFIG"] = str(ANSIBLE_CFG) + # Route Ansible logs to the same file as the Python logger + log_path = LOGS_DIR / "ansible.log" + try: + logger = get_logger() + for h in logger.handlers: + if isinstance(h, logging.FileHandler): + # type: ignore[attr-defined] + log_path = Path(getattr(h, "baseFilename")) # type: ignore + break + except Exception: + pass + env["ANSIBLE_LOG_PATH"] = str(log_path) + logger = get_logger() + if start_at_task: + logger.info(f"Resuming from task: {start_at_task}") + if tags: + logger.info(f"Using tags: {','.join(tags)}") + logger.info(f"Running: {' '.join(shlex.quote(c) for c in cmd)}") + return subprocess.call(cmd, env=env) + +def main(argv: List[str]) -> int: + args = parse_args(argv) + # Resume mode: reuse last provided configs and suppress prompts when possible + resuming = bool(getattr(args, "resume", False)) + 
yes = True if resuming else bool(args.yes) + last_cfg = None + if resuming and LAST_RUN_FILE.exists(): + try: + last_cfg = json.loads(LAST_RUN_FILE.read_text(encoding="utf-8")) + except Exception: + last_cfg = None + + # Gather inputs interactively where missing + hosts_raw_default = (last_cfg.get("hosts_raw") if last_cfg else None) + hosts_raw = args.host or hosts_raw_default or prompt("Target host(s) [non-ha: host | HA: h1,h2,h3 or brace expansion]", default="", yes_mode=yes) + hosts = parse_hosts(hosts_raw) if hosts_raw else [] + # Initialize per-run logger as soon as we have hosts_raw + try: + ts = datetime.now().strftime("%Y%m%d-%H%M%S") + raw_hosts_for_name = (hosts_raw or "").strip() + safe_hosts = re.sub(r"[^A-Za-z0-9_.-]+", "-", raw_hosts_for_name)[:80] or "hosts" + run_log_path = LOGS_DIR / f"ansible-{ts}-{safe_hosts}.log" + logger = get_logger(run_log_path) + logger.info(f"Logging to: {run_log_path}") + except Exception: + run_log_path = LOGS_DIR / "ansible.log" + logger = get_logger(run_log_path) + logger.info(f"Logging to: {run_log_path} (fallback)") + + if not hosts: + logger.error("Error: No hosts provided (-H/--host).") + return 2 + # Decide HA vs Non-HA with user input; default depends on host count + resume_cluster_mode = (last_cfg.get("cluster_mode") if last_cfg else None) + if args.cluster_mode: + cluster_mode = args.cluster_mode + elif resume_cluster_mode: + cluster_mode = resume_cluster_mode + else: + default_mode = "ha" if len(hosts) >= 3 else "non-ha" + selected = prompt("Deployment type (ha|non-ha)", default=default_mode, yes_mode=yes) + cluster_mode = (selected or default_mode).strip().lower() + if cluster_mode not in ("ha", "non-ha"): + cluster_mode = default_mode + if cluster_mode == "ha" and len(hosts) < 3: + logger.error("Error: HA requires at least 3 hosts (to map 3 OMs and 3 SCMs).") + return 2 + + # Resolve download base early for version selection + dl_url = args.dl_url or (last_cfg.get("dl_url") if last_cfg else None) or 
DEFAULTS["dl_url"] + ozone_version = args.version or (last_cfg.get("ozone_version") if last_cfg else None) + if not ozone_version: + # Try to fetch available versions from dl_url and offer selection + versions = fetch_available_versions(dl_url or DEFAULTS["dl_url"]) + selected = choose_version_interactive(versions, DEFAULTS["ozone_version"], yes_mode=yes) + if selected: + ozone_version = selected + else: + # Fallback prompt if fetch failed + ozone_version = prompt("Ozone version (e.g., 2.1.0 | local)", default=DEFAULTS["ozone_version"], yes_mode=yes) + jdk_major = args.jdk_version if args.jdk_version is not None else ((last_cfg.get("jdk_major") if last_cfg else None)) + if jdk_major is None: + _jdk_val = prompt("JDK major (17|21)", default=str(DEFAULTS["jdk_major"]), yes_mode=yes) + try: + jdk_major = int(str(_jdk_val)) if _jdk_val is not None else DEFAULTS["jdk_major"] + except Exception: + jdk_major = DEFAULTS["jdk_major"] + install_base = args.install_dir or (last_cfg.get("install_base") if last_cfg else None) \ + or prompt("Install base directory (binaries and configs; e.g., /opt/ozone)", default=DEFAULTS["install_base"], yes_mode=yes) + data_base = args.data_dir or (last_cfg.get("data_base") if last_cfg else None) \ + or prompt("Data base directory (metadata and DN data; e.g., /data/ozone)", default=DEFAULTS["data_base"], yes_mode=yes) + + # Auth (before service user/group) + auth_method = args.auth_method or (last_cfg.get("auth_method") if last_cfg else None) \ + or prompt("Auth method (key|password)", default="password", yes_mode=yes) + if auth_method not in ("key", "password"): + auth_method = "password" + ssh_user = args.ssh_user or (last_cfg.get("ssh_user") if last_cfg else None) \ + or prompt("SSH username", default="root", yes_mode=yes) + password = args.password or ((last_cfg.get("password") if last_cfg else None)) # persisted for resume on request + keyfile = args.keyfile or (last_cfg.get("keyfile") if last_cfg else None) + if auth_method == 
"password" and not password: + password = prompt("SSH password", default="", secret=True, yes_mode=yes) + if auth_method == "key" and not keyfile: + keyfile = prompt("Path to SSH private key", default=str(Path.home() / ".ssh" / "id_ed25519"), yes_mode=yes) + # Ensure we don't mix methods + if auth_method == "password": + keyfile = None + elif auth_method == "key": + password = None + service_user = args.service_user or (last_cfg.get("service_user") if last_cfg else None) \ + or prompt("Service user", default=DEFAULTS["service_user"], yes_mode=yes) + service_group = args.service_group or (last_cfg.get("service_group") if last_cfg else None) \ + or prompt("Service group", default=DEFAULTS["service_group"], yes_mode=yes) + dl_url = args.dl_url or (last_cfg.get("dl_url") if last_cfg else None) or DEFAULTS["dl_url"] + start_after_install = (args.start or (last_cfg.get("start_after_install") if last_cfg else None) + or DEFAULTS["start_after_install"]) + use_sudo = (args.use_sudo or (last_cfg.get("use_sudo") if last_cfg else None) + or DEFAULTS["use_sudo"]) + + # Local specifics (single path to local build) + local_path = (getattr(args, "local_path", None) or (last_cfg.get("local_path") if last_cfg else None)) + local_shared_path = None + local_oz_dir = None + if ozone_version and ozone_version.lower() == "local": + # Accept a direct path to the ozone build dir (relative or absolute) and validate it. + # Backward-compat: if only legacy split values were saved previously, resolve them. 
+ candidate = None + if local_path: + candidate = Path(local_path).expanduser().resolve() + else: + legacy_shared = (last_cfg.get("local_shared_path") if last_cfg else None) + legacy_dir = (last_cfg.get("local_ozone_dirname") if last_cfg else None) + if legacy_shared and legacy_dir: + candidate = Path(legacy_shared).expanduser().resolve() / legacy_dir + + def ask_for_path(): + val = prompt("Path to local Ozone build", default="", yes_mode=yes) + return Path(val).expanduser().resolve() if val else None + + if candidate is None or not _validate_local_ozone_dir(candidate): + if yes: + logger.error("Error: For -v local, a valid Ozone build path containing bin/ozone is required.") + return 2 + while True: + maybe = ask_for_path() + if maybe and _validate_local_ozone_dir(maybe): + candidate = maybe + break + logger.warning("Invalid path. Expected an Ozone build directory with bin/ozone. Please try again.") + + # Normalize back to shared path + dirname for Ansible vars and persistable single path + local_shared_path = str(candidate.parent) + local_oz_dir = candidate.name + local_path = str(candidate) + + # Build a human-friendly summary table of inputs before continuing + host_list_display = str(hosts_raw or "") + summary_rows: List[Tuple[str, str]] = [ + ("Hosts", host_list_display), + ("Cluster mode", cluster_mode), + ("Ozone version", str(ozone_version)), + ("JDK major", str(jdk_major)), + ("Install base", str(install_base)), + ("Data base", str(data_base)), + ("SSH user", str(ssh_user)), + ("Auth method", str(auth_method)) + ] + if keyfile: + summary_rows.append(("Key file", str(keyfile))) + summary_rows.extend([("Use sudo", str(bool(use_sudo))), + ("Service user", str(service_user)), + ("Service group", str(service_group)), + ("Start after install", str(bool(start_after_install)))]) + if ozone_version and str(ozone_version).lower() == "local": + summary_rows.append(("Local Ozone path", str(local_path or ""))) + if not _confirm_summary(summary_rows, yes_mode=yes): + 
logger.info("Aborted by user.") + return 1 + + # Prepare dynamic inventory and extra-vars + inventory_text = build_inventory(hosts, ssh_user=ssh_user, keyfile=keyfile, password=password, + cluster_mode=cluster_mode) + # Decide cleanup behavior up-front (so we can pass it into the unified play) + do_cleanup = False + if args.clean: + do_cleanup = True + else: + answer = prompt(f"Cleanup existing install at {install_base} (if present)? (y/N)", default="n", yes_mode=yes) + if str(answer).strip().lower().startswith("y"): + do_cleanup = True + + extra_vars = { + "cluster_mode": cluster_mode, + "install_base": install_base, + "data_base": data_base, + "jdk_major": jdk_major, + "service_user": service_user, + "service_group": service_group, + "dl_url": dl_url, + "ozone_version": ozone_version, + "start_after_install": bool(start_after_install), + "use_sudo": bool(use_sudo), + "do_cleanup": bool(do_cleanup), + "JAVA_MARKER": DEFAULTS["JAVA_MARKER"], + "ENV_MARKER": DEFAULTS["ENV_MARKER"], + "controller_logs_dir": str(LOGS_DIR), + } + if ozone_version and ozone_version.lower() == "local": + extra_vars.update({ + "local_shared_path": local_shared_path or "", + "local_ozone_dirname": local_oz_dir or "", + }) + + ask_pass = (auth_method == "password" and not password) # whether to forward -k; we embed password if provided + + with tempfile.TemporaryDirectory() as tdir: + inv_path = Path(tdir) / "hosts.ini" + ev_path = Path(tdir) / "vars.json" + inv_path.write_text(inventory_text or "", encoding="utf-8") + ev_path.write_text(json.dumps(extra_vars, indent=2), encoding="utf-8") + # Persist last run configs (and use them for execution) + try: + os.makedirs(LOGS_DIR, exist_ok=True) + # Save inventory/vars for direct reuse + persisted_inv = LOGS_DIR / "last_inventory.ini" + persisted_ev = LOGS_DIR / "last_vars.json" + persisted_inv.write_text(inventory_text or "", encoding="utf-8") + persisted_ev.write_text(json.dumps(extra_vars, indent=2), encoding="utf-8") + # Point playbook 
execution to persisted files (consistent first run and reruns) + inv_path = persisted_inv + ev_path = persisted_ev + # Save effective simple config for future resume + LAST_RUN_FILE.write_text(json.dumps({ + "hosts_raw": hosts_raw, + "cluster_mode": cluster_mode, + "ozone_version": ozone_version, + "jdk_major": jdk_major, + "install_base": install_base, + "data_base": data_base, + "auth_method": auth_method, + "ssh_user": ssh_user, + "password": password if auth_method == "password" else None, + "keyfile": str(keyfile) if keyfile else None, + "service_user": service_user, + "service_group": service_group, + "dl_url": dl_url, + "start_after_install": bool(start_after_install), + "use_sudo": bool(use_sudo), + "local_shared_path": local_shared_path or "", + "local_ozone_dirname": local_oz_dir or "", + }, indent=2), encoding="utf-8") + except Exception: + # Fall back to temp files if persisting fails + pass + # Roles order removed (no resume via tags) + + # Install + (optional) start (single merged playbook) + playbook = PLAYBOOKS_DIR / "cluster.yml" + start_at = None + use_tags = None + if args.resume: + if LAST_FAILED_FILE.exists(): + try: + # use first line (task name) + contents = LAST_FAILED_FILE.read_text(encoding="utf-8").splitlines() + start_at = contents[0].strip() if contents else None + # derive role tag if present + role_line = next((l for l in contents if l.startswith("# role:")), None) + if role_line: + role_name = role_line.split(":", 1)[1].strip() + if role_name: + use_tags = [role_name] + except Exception: + start_at = None + rc = run_playbook(playbook, inv_path, ev_path, ask_pass=ask_pass, become=True, start_at_task=start_at, tags=use_tags) + if rc != 0: + return rc + + # Successful completion: remove last_* persisted files so a fresh run starts clean + try: + for f in LOGS_DIR.glob("last_*"): + try: + f.unlink() + except FileNotFoundError: + pass + except Exception: + # Best-effort cleanup; ignore failures + pass + except Exception: + pass + + try: + 
example_host = hosts[0]["host"] if hosts else "HOSTNAME" + logger.info(f"To view process logs: ssh to the node and read {install_base}/current/logs/ozone-{service_user}-<process>-<host>.log " + f"(e.g., {install_base}/current/logs/ozone-{service_user}-recon-{example_host}.log)") + except Exception: + pass + logger.info("All done.") + return 0 + +if __name__ == "__main__": + sys.exit(main(sys.argv[1:])) + + diff --git a/playbooks/cluster.yml b/playbooks/cluster.yml new file mode 100644 index 0000000..fc46321 --- /dev/null +++ b/playbooks/cluster.yml @@ -0,0 +1,59 @@ +--- +- name: "Ozone Cluster Deployment" + hosts: all + gather_facts: false + vars: + # Expect cluster_mode to be passed in (non-ha | ha). Fallback to non-ha. + cluster_mode: "{{ cluster_mode | default('non-ha') }}" + ha_enabled: "{{ cluster_mode == 'ha' }}" + pre_tasks: + - name: "Pre-install: Ensure python3 present" + raw: | + if command -v apt-get >/dev/null 2>&1; then sudo -n apt-get update -y && sudo -n apt-get install -y python3 || true; + elif command -v dnf >/dev/null 2>&1; then sudo -n dnf install -y python3 || true; + elif command -v yum >/dev/null 2>&1; then sudo -n yum install -y python3 || true; + elif command -v zypper >/dev/null 2>&1; then sudo -n zypper --non-interactive in -y python3 || true; + fi + args: + executable: /bin/bash + changed_when: false + failed_when: false + + - name: "Pre-install: Gather facts" + setup: + + - name: "Pre-install: Ensure Ansible remote tmp exists" + file: + path: "{{ (ansible_env.TMPDIR | default('/tmp')) ~ '/.ansible-' ~ ansible_user_id }}" + state: directory + mode: "0700" + owner: "{{ ansible_user_id }}" + + roles: + - role: cleanup + tags: ["cleanup"] + when: (do_cleanup | default(false)) + - role: ozone_user + tags: ["ozone_user"] + - role: ssh_bootstrap + tags: ["ssh_bootstrap"] + - role: java + tags: ["java"] + - role: ozone_layout + tags: ["ozone_layout"] + - role: ozone_fetch + tags: ["ozone_fetch"] + - role: ozone_config + tags: ["ozone_config"] + 
- role: ozone_service + tags: ["ozone_service"] + when: start_after_install | bool + +- name: "Ozone Smoke Test" + hosts: "{{ groups['om'] | list | first }}" + gather_facts: false + roles: + - role: ozone_ui + tags: ["ozone_ui"] + - role: ozone_smoke + tags: ["ozone_smoke"] diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..541effe --- /dev/null +++ b/requirements.txt @@ -0,0 +1,2 @@ +ansible-core==2.17.* +click==8.* diff --git a/roles/cleanup/tasks/main.yml b/roles/cleanup/tasks/main.yml new file mode 100644 index 0000000..7187018 --- /dev/null +++ b/roles/cleanup/tasks/main.yml @@ -0,0 +1,52 @@ +--- +- name: "Check install_base presence" + stat: + path: "{{ install_base }}" + register: _st_install_base + become: true + +- name: "Set presence flag" + set_fact: + install_present: "{{ _st_install_base.stat.exists | default(false) }}" + +- name: "Skip cleanup when install_base is absent on this host" + debug: + msg: "install_base '{{ install_base }}' not present; skipping cleanup on this host" + when: not install_present + changed_when: false + +- name: "Perform cleanup when install_base exists" + when: install_present + block: + - name: "Set ozone bin path" + set_fact: + ozone_bin: "{{ install_base }}/current/bin/ozone" + + - name: "Kill OMs/SCMs/Datanodes/Recon (if running)" + shell: | + pkill -KILL -f "{{ item }}" + become: true + failed_when: false + changed_when: false + loop: + - "org.apache.hadoop.ozone.om.OzoneManagerStarter" + - "org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter" + - "org.apache.hadoop.ozone.HddsDatanodeService" + - "org.apache.hadoop.ozone.recon.ReconServer" + - "org.apache.hadoop.ozone.s3.Gateway" + loop_control: + label: "{{ item }}" + + - name: "Remove install base" + file: + path: "{{ install_base }}" + state: absent + become: true + + - name: "Remove data base" + file: + path: "{{ data_base }}" + state: absent + become: true + + diff --git a/roles/java/defaults/main.yml 
b/roles/java/defaults/main.yml new file mode 100644 index 0000000..ad5afac --- /dev/null +++ b/roles/java/defaults/main.yml @@ -0,0 +1,12 @@ +--- +jdk_major: 17 + +# Candidate JAVA_HOME directories to probe; first existing will be used +java_home_candidates: + - "/usr/lib/jvm/java-{{ jdk_major }}-openjdk" + - "/usr/lib/jvm/jre-{{ jdk_major }}-openjdk" + - "/usr/lib/jvm/jdk-{{ jdk_major }}" + - "/usr/lib/jvm/java-{{ jdk_major }}-openjdk-amd64" + - "/usr/lib64/jvm/java-{{ jdk_major }}-openjdk" + + diff --git a/roles/java/tasks/main.yml b/roles/java/tasks/main.yml new file mode 100644 index 0000000..bbe1d62 --- /dev/null +++ b/roles/java/tasks/main.yml @@ -0,0 +1,77 @@ +--- +- name: "Print OS family" + debug: + var: ansible_os_family + +- name: "Install OpenJDK on RedHat/Rocky/Suse family" + package: + name: + - "java-{{ jdk_major }}-openjdk" + - "java-{{ jdk_major }}-openjdk-devel" + state: present + when: ansible_os_family == "RedHat" or ansible_os_family == "Rocky" or ansible_os_family == "Suse" + become: true + +- name: "Install OpenJDK on Debian/Ubuntu family" + package: + name: + - "openjdk-{{ jdk_major }}-jdk" + when: ansible_os_family == "Debian" or ansible_os_family == "Ubuntu" + become: true + +- name: "Detect JAVA_HOME candidate" + stat: + path: "{{ item }}" + loop: "{{ java_home_candidates }}" + register: java_candidates + become: false + +- name: "Set ozone_java_home from first existing candidate" + set_fact: + ozone_java_home: "{{ (java_candidates.results | selectattr('stat.exists', 'defined') | selectattr('stat.exists') | map(attribute='item') | list | first) | default('') }}" + +- name: "Compute runtime environment for Ozone commands" + set_fact: + ozone_runtime_env: + JAVA_HOME: "{{ ozone_java_home }}" + PATH: "{{ (ansible_env.PATH | default('/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin')) }}:{{ install_base }}/current/bin{{ (':' + ozone_java_home + '/bin') if (ozone_java_home | length > 0) else '' }}" + OZONE_CONF_DIR: "{{ 
install_base }}/current/etc/hadoop" + HADOOP_CONF_DIR: "{{ install_base }}/current/etc/hadoop" + +- name: "Persist ozone_runtime_env for resume (controller)" + delegate_to: localhost + run_once: true + become: false + vars: + last_vars_path: "{{ playbook_dir }}/../logs/last_vars.json" + block: + - name: "last_vars.json | Read" + slurp: + src: "{{ last_vars_path }}" + register: last_vars_slurp + + - name: "last_vars.json | Merge ozone_runtime_env" + vars: + last_vars_json: "{{ (last_vars_slurp.content | b64decode | from_json) if (last_vars_slurp is defined and last_vars_slurp.content is defined) else {} }}" + merged_all: "{{ last_vars_json | combine({'ozone_runtime_env': ozone_runtime_env}, recursive=True) }}" + copy: + dest: "{{ last_vars_path }}" + content: "{{ merged_all | to_nice_json }}" + mode: "0644" + +- name: "Export JAVA_HOME and update PATH in profile.d/ozone.sh" + blockinfile: + path: "/etc/profile.d/ozone.sh" + create: true + owner: root + group: root + mode: "0644" + marker: "# {mark} {{ JAVA_MARKER }}" + block: | + {% if ozone_java_home | length > 0 %} + export JAVA_HOME="{{ ozone_java_home }}" + export PATH="$PATH:{{ ozone_java_home }}/bin" + {% endif %} + become: true + + diff --git a/roles/ozone_config/defaults/main.yml b/roles/ozone_config/defaults/main.yml new file mode 100644 index 0000000..a4757ca --- /dev/null +++ b/roles/ozone_config/defaults/main.yml @@ -0,0 +1,6 @@ +--- +install_base: "/opt/ozone" +data_base: "/data/ozone" +CONFIG_DIR: "" # if provided, can be used to feed additional properties via vars + + diff --git a/roles/ozone_config/tasks/main.yml b/roles/ozone_config/tasks/main.yml new file mode 100644 index 0000000..c4cd024 --- /dev/null +++ b/roles/ozone_config/tasks/main.yml @@ -0,0 +1,57 @@ +--- +- name: "Create etc dir" + file: + path: "{{ install_base }}/current/etc/hadoop" + state: directory + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0755" + become: true + +- name: "Render ozone-hosts.yaml" + 
template: + src: "ozone-hosts.yaml.j2" + dest: "{{ install_base }}/current/etc/hadoop/ozone-hosts.yaml" + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0644" + become: true + +- name: "Render ozone-site.xml" + template: + src: "ozone-site.xml.j2" + dest: "{{ install_base }}/current/etc/hadoop/ozone-site.xml" + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0644" + become: true + +- name: "Render core-site.xml" + template: + src: "core-site.xml.j2" + dest: "{{ install_base }}/current/etc/hadoop/core-site.xml" + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0644" + become: true + +- name: "Render ozone-env.sh" + template: + src: "ozone-env.sh.j2" + dest: "{{ install_base }}/current/etc/hadoop/ozone-env.sh" + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0755" + become: true + +- name: "Render workers file for datanodes" + template: + src: "workers.j2" + dest: "{{ install_base }}/current/etc/hadoop/workers" + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0644" + when: groups.get('datanodes', []) | length > 0 + become: true + + diff --git a/roles/ozone_config/templates/core-site.xml.j2 b/roles/ozone_config/templates/core-site.xml.j2 new file mode 100644 index 0000000..0f116a4 --- /dev/null +++ b/roles/ozone_config/templates/core-site.xml.j2 @@ -0,0 +1,15 @@ +<configuration> +{% set om_hosts = (groups.get('om', []) | list) %} +{% if (ha_enabled | default(false)) and (om_hosts|length > 1) %} + <property> + <name>fs.defaultFS</name> + <value>ofs://omservice</value> + </property> +{% else %} + <property> + <name>fs.defaultFS</name> + <value>ofs://{{ om_hosts[0] }}:9862</value> + </property> +{% endif %} +</configuration> + diff --git a/roles/ozone_config/templates/ozone-env.sh.j2 b/roles/ozone_config/templates/ozone-env.sh.j2 new file mode 100644 index 0000000..94b2f69 --- /dev/null +++ b/roles/ozone_config/templates/ozone-env.sh.j2 @@ -0,0 +1,53 @@ +#!/usr/bin/env 
bash +# Managed by Ansible + +export OZONE_OS_TYPE=${OZONE_OS_TYPE:-$(uname -s)} + +{% if ozone_java_home | default('') | length > 0 %} +export JAVA_HOME="{{ ozone_java_home }}" +export PATH="$PATH:{{ ozone_java_home }}/bin" +{% endif %} +export OZONE_HOME="{{ install_base }}/current" +export PATH="$PATH:{{ install_base }}/current/bin" +export OZONE_CONF_DIR="{{ install_base }}/current/etc/hadoop" +export HADOOP_CONF_DIR="{{ install_base }}/current/etc/hadoop" + +# Relaxed module access for Java 17/21 (needed by Ozone and dependencies) +export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS} --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED" + +{% if ozone_opts | default('-XX:ParallelGCThreads=8') | length > 0 %} +# Extra JVM options for all Ozone components +export OZONE_OPTS="{{ ozone_opts | default('-XX:ParallelGCThreads=8') }}" +{% endif %} + +export OZONE_OM_USER="{{ service_user }}" + +# export OZONE_HEAPSIZE_MAX= +# export OZONE_HEAPSIZE_MIN= +# export OZONE_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug" +# export OZONE_CLIENT_OPTS="" +# export OZONE_CLASSPATH="/some/cool/path/on/your/machine" +# export OZONE_USER_CLASSPATH_FIRST="yes" +# export OZONE_USE_CLIENT_CLASSLOADER=true +# export OZONE_SSH_OPTS="-o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=10s" +# export OZONE_SSH_PARALLEL=10 +# export OZONE_WORKERS="${OZONE_CONF_DIR}/workers" +# export OZONE_LOG_DIR=${OZONE_HOME}/logs +# export OZONE_IDENT_STRING=$USER +# export OZONE_STOP_TIMEOUT=5 +# export OZONE_PID_DIR=/tmp +# export OZONE_ROOT_LOGGER=INFO,console +# export OZONE_DAEMON_ROOT_LOGGER=INFO,RFA +# export OZONE_SECURITY_LOGGER=INFO,NullAppender +# export OZONE_NICENESS=0 +# export OZONE_POLICYFILE="hadoop-policy.xml" +# export 
OZONE_GC_SETTINGS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps" +# export JSVC_HOME=/usr/bin +# export OZONE_SECURE_PID_DIR=${OZONE_PID_DIR} +# export OZONE_SECURE_LOG=${OZONE_LOG_DIR} +# export OZONE_SECURE_IDENT_PRESERVE="true" +# export OZONE_OM_OPTS="" +# export OZONE_DATANODE_OPTS="" +# export OZONE_SCM_OPTS="" +# export OZONE_ENABLE_BUILD_PATHS="true" + diff --git a/roles/ozone_config/templates/ozone-hosts.yaml.j2 b/roles/ozone_config/templates/ozone-hosts.yaml.j2 new file mode 100644 index 0000000..84b221e --- /dev/null +++ b/roles/ozone_config/templates/ozone-hosts.yaml.j2 @@ -0,0 +1,30 @@ +om: +{% if (ha_enabled | default(false)) %} +{% for h in (groups.get('om', []) | default([])) %} + - {{ h | regex_replace('^.*@','') | regex_replace(':.*$','') }} +{% endfor %} +{% else %} +{% if (groups.get('om', []) | default([])) | length > 0 %} + - {{ (groups.get('om', [])[0]) | regex_replace('^.*@','') | regex_replace(':.*$','') }} +{% endif %} +{% endif %} +scm: +{% if (ha_enabled | default(false)) %} +{% for h in (groups.get('scm', []) | default([])) %} + - {{ h | regex_replace('^.*@','') | regex_replace(':.*$','') }} +{% endfor %} +{% else %} +{% if (groups.get('scm', []) | default([])) | length > 0 %} + - {{ (groups.get('scm', [])[0]) | regex_replace('^.*@','') | regex_replace(':.*$','') }} +{% endif %} +{% endif %} +datanodes: +{% for h in (groups.get('datanodes', []) | default([])) %} + - {{ h | regex_replace('^.*@','') | regex_replace(':.*$','') }} +{% endfor %} +recon: +{% if (groups.get('recon', []) | default([])) | length > 0 %} + - {{ (groups.get('recon', [])[0]) | regex_replace('^.*@','') | regex_replace(':.*$','') }} +{% endif %} + + diff --git a/roles/ozone_config/templates/ozone-site.xml.j2 b/roles/ozone_config/templates/ozone-site.xml.j2 new file mode 100644 index 0000000..9001a5b --- /dev/null +++ b/roles/ozone_config/templates/ozone-site.xml.j2 @@ -0,0 +1,128 @@ +<configuration> + <!-- Minimal Ozone site config; 
extend via group_vars if needed --> +{% set _om_all = groups.get('om', [])| list %} +{% set _scm_all = groups.get('scm', []) | list %} +{% set _all_dn_count = groups.get('datanodes', []) | list | length %} +{% set recon_hosts = groups.get('recon', []) | list %} +{% set s3g_hosts = groups.get('s3g', []) | list %} +{% set om_hosts = (_om_all[:1] if not (ha_enabled | default(false)) else _om_all) %} +{% set scm_hosts = (_scm_all[:1] if not (ha_enabled | default(false)) else _scm_all) %} + +{% if scm_hosts|length > 0 %} + <property> + <name>ozone.scm.names</name> + <value>{{ scm_hosts | join(',') }}</value> + </property> + <property> + <name>ozone.scm.client.address</name> + <value>{{ scm_hosts | join(':9860,') }}:9860</value> + </property> + <property> + <name>ozone.scm.datanode.address</name> + <value>{{ scm_hosts | join(':9861,') }}:9861</value> + </property> +{% endif %} +{% if scm_hosts|length > 1 %} + <property> + <name>ozone.scm.primordial.node.id</name> + <value>{{ scm_hosts[0] }}</value> + </property> + <property> + <name>ozone.scm.service.ids</name> + <value>scmservice</value> + </property> + <property> + <name>ozone.scm.nodes.scmservice</name> + <value>{% for i in range(scm_hosts|length) %}{{ 'scm' ~ (i+1) }}{% if not loop.last %},{% endif %}{% endfor %}</value> + </property> +{% for h in scm_hosts %} + <property> + <name>ozone.scm.address.scmservice.scm{{ loop.index }}</name> + <value>{{ h }}</value> + </property> +{% endfor %} +{% endif %} +{% if om_hosts|length == 1 %} + <property> + <name>ozone.om.address</name> + <value>{{ om_hosts[0] }}:9862</value> + </property> +{% elif om_hosts|length > 1 %} + <property> + <name>ozone.om.service.ids</name> + <value>omservice</value> + </property> + <property> + <name>ozone.om.nodes.omservice</name> + <value>{% for i in range(om_hosts|length) %}{{ 'om' ~ (i+1) }}{% if not loop.last %},{% endif %}{% endfor %}</value> + </property> +{% for h in om_hosts %} + <property> + <name>ozone.om.address.omservice.om{{ loop.index 
}}</name> + <value>{{ h }}:9862</value> + </property> +{% endfor %} +{% endif %} +{% if recon_hosts|length > 0 %} + <property> + <name>ozone.recon.http-address</name> + <value>{{ recon_hosts[0] }}:9888</value> + </property> + <property> + <name>ozone.recon.address</name> + <value>{{ recon_hosts[0] }}:9891</value> + </property> +{% endif %} +{% if s3g_hosts|length > 0 %} + <property> + <name>ozone.s3g.http-address</name> + <value>{{ s3g_hosts[0] }}:9878</value> + </property> + <property> + <name>ozone.s3g.webadmin.http-address</name> + <value>{{ s3g_hosts[0] }}:19878</value> + </property> +{% endif %} + <property> + <name>ozone.metadata.dirs</name> + <value>{{ data_base }}/meta</value> + </property> + <property> + <name>hdds.datanode.dir</name> + <value>{{ data_base }}/dn</value> + </property> + <property> + <name>dfs.container.ratis.datanode.storage.dir</name> + <value>{{ data_base }}/meta/dn</value> + </property> + <property> + <name>ozone.om.db.dirs</name> + <value>{{ data_base }}/data/om</value> + </property> + <property> + <name>ozone.om.ratis.snapshot.dir</name> + <value>{{ data_base }}/meta/om</value> + </property> + <property> + <name>ozone.scm.db.dirs</name> + <value>{{ data_base }}/data/scm</value> + </property> + <property> + <name>ozone.scm.datanode.id.dir</name> + <value>{{ data_base }}/meta/scm</value> + </property> + <property> + <name>ozone.scm.skip.bootstrap.validation</name> + <value>true</value> + </property> + <property> + <name>ozone.replication</name> +{% if _all_dn_count < 3 %} + <value>ONE</value> +{% else %} + <value>THREE</value> +{% endif %} + </property> +</configuration> + + diff --git a/roles/ozone_config/templates/workers.j2 b/roles/ozone_config/templates/workers.j2 new file mode 100644 index 0000000..482ffd0 --- /dev/null +++ b/roles/ozone_config/templates/workers.j2 @@ -0,0 +1,3 @@ +{% for h in (groups.get('datanodes', []) | list) %} +{{ h }} +{% endfor %} diff --git a/roles/ozone_fetch/defaults/main.yml 
b/roles/ozone_fetch/defaults/main.yml new file mode 100644 index 0000000..8c96ade --- /dev/null +++ b/roles/ozone_fetch/defaults/main.yml @@ -0,0 +1,9 @@ +--- +ozone_version: "2.0.0" # "local" also supported +dl_url: "https://dlcdn.apache.org/ozone" + +# Local snapshot settings (controller side) +local_shared_path: "" +local_ozone_dirname: "" + + diff --git a/roles/ozone_fetch/tasks/main.yml b/roles/ozone_fetch/tasks/main.yml new file mode 100644 index 0000000..67d4a77 --- /dev/null +++ b/roles/ozone_fetch/tasks/main.yml @@ -0,0 +1,113 @@ +--- +- name: "Ensure install_base exists" + file: + path: "{{ install_base }}" + state: directory + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0755" + become: true + +- name: "Normalize source mode" + set_fact: + _src_mode: "{{ ozone_version | lower }}" + +- name: "Upstream | Download tarball" + get_url: + url: "{{ dl_url | trim('/') }}/{{ ozone_version }}/ozone-{{ ozone_version }}.tar.gz" + dest: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz" + mode: "0644" + owner: "{{ service_user }}" + group: "{{ service_group }}" + timeout: 60 + register: download_result + retries: 5 + delay: 10 + until: download_result is succeeded + when: _src_mode != 'local' + become: true + +- name: "Upstream | Ensure tarball ownership" + file: + path: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz" + owner: "{{ service_user }}" + group: "{{ service_group }}" + state: file + when: _src_mode != 'local' + become: true + +- name: "Upstream | Unarchive to install_base" + unarchive: + src: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz" + dest: "{{ install_base }}" + remote_src: true + owner: "{{ service_user }}" + group: "{{ service_group }}" + when: _src_mode != 'local' + become: true + +- name: "Upstream | Link current" + file: + src: "{{ install_base }}/ozone-{{ ozone_version }}" + dest: "{{ install_base }}/current" + state: link + owner: "{{ service_user }}" + group: "{{ service_group }}" + when: _src_mode != 
'local' + become: true + +- name: "Upstream | Remove downloaded tarball after extraction" + file: + path: "{{ install_base }}/ozone-{{ ozone_version }}.tar.gz" + state: absent + when: _src_mode != 'local' + become: true + +- name: "Local | Create tarball on controller" + delegate_to: localhost + run_once: true + become: false + vars: + ansible_become: false + command: + argv: + - tar + - -czf + - "/tmp/{{ local_ozone_dirname }}.tar.gz" + - "{{ local_ozone_dirname }}" + args: + chdir: "{{ local_shared_path }}" + creates: "/tmp/{{ local_ozone_dirname }}.tar.gz" + when: _src_mode == 'local' + +- name: "Local | Unarchive local tarball to install_base" + unarchive: + src: "/tmp/{{ local_ozone_dirname }}.tar.gz" # on controller + dest: "{{ install_base }}" # on remote + remote_src: false # transfer then extract + owner: "{{ service_user }}" + group: "{{ service_group }}" + keep_newer: true + when: _src_mode == 'local' + become: true + +- name: "Local | Link current" + file: + src: "{{ install_base }}/{{ local_ozone_dirname }}" + dest: "{{ install_base }}/current" + state: link + owner: "{{ service_user }}" + group: "{{ service_group }}" + when: _src_mode == 'local' + become: true + +- name: "Local | Remove controller tarball after extraction" + delegate_to: localhost + run_once: true + become: false + vars: + ansible_become: false + file: + path: "/tmp/{{ local_ozone_dirname }}.tar.gz" + state: absent + when: _src_mode == 'local' diff --git a/roles/ozone_layout/defaults/main.yml b/roles/ozone_layout/defaults/main.yml new file mode 100644 index 0000000..c8c5df7 --- /dev/null +++ b/roles/ozone_layout/defaults/main.yml @@ -0,0 +1,3 @@ +--- +install_base: "/opt/ozone" +data_base: "/data/ozone" diff --git a/roles/ozone_layout/tasks/main.yml b/roles/ozone_layout/tasks/main.yml new file mode 100644 index 0000000..0e13135 --- /dev/null +++ b/roles/ozone_layout/tasks/main.yml @@ -0,0 +1,29 @@ +--- +- name: "Create install and data directories" + file: + path: "{{ item }}" + 
state: directory + owner: "{{ service_user }}" + group: "{{ service_group }}" + mode: "0755" + loop: + - "{{ install_base }}" + - "{{ data_base }}" + - "{{ data_base }}/dn" + - "{{ data_base }}/meta" + become: true + +- name: "Ensure OZONE_HOME and PATH are in profile.d/ozone.sh" + blockinfile: + path: "/etc/profile.d/ozone.sh" + create: true + owner: root + group: root + mode: "0644" + marker: "# {mark} {{ ENV_MARKER }}" + block: | + export OZONE_HOME="{{ install_base }}/current" + export PATH="$PATH:{{ install_base }}/current/bin" + become: true + + diff --git a/roles/ozone_service/tasks/main.yml b/roles/ozone_service/tasks/main.yml new file mode 100644 index 0000000..3edf1e9 --- /dev/null +++ b/roles/ozone_service/tasks/main.yml @@ -0,0 +1,100 @@ +--- + +# Common service command context for HA and Non-HA +- name: "Ozone Service: Start SCM/OM" + become: true + become_user: "{{ service_user }}" + become_flags: "-i" + environment: "{{ ozone_runtime_env }}" + block: + - name: "Initialize/Start first SCM/OM" + block: + - name: "Initialize first SCM" + command: "ozone scm --init" + args: + creates: "{{ data_base }}/meta/scm" + when: (groups['scm'] | length > 0) and (inventory_hostname == groups['scm'][0]) + register: scm_init_first + failed_when: scm_init_first.rc != 0 + + - name: "Start first SCM" + command: "ozone --daemon start scm" + when: (groups['scm'] | length > 0) and (inventory_hostname == groups['scm'][0]) + register: scm_start_first + failed_when: scm_start_first.rc != 0 + + - name: "Initialize first OM" + command: "ozone om --init" + args: + creates: "{{ data_base }}/meta/om" + when: (groups['om'] | length > 0) and (inventory_hostname == groups['om'][0]) + register: om_init_first + failed_when: om_init_first.rc != 0 + + - name: "Start first OM" + command: "ozone --daemon start om" + when: (groups['om'] | length > 0) and (inventory_hostname == groups['om'][0]) + register: om_start_first + failed_when: om_start_first.rc != 0 + + - name: "Start/Init remaining 
SCM/OM (HA only)" + when: (ha_enabled | default(false)) + block: + - name: "SCM bootstrap on remaining SCMs" + command: "ozone scm --bootstrap" + when: "'scm' in groups and (groups['scm'] | length > 1) and (inventory_hostname in groups['scm'][1:])" + register: scm_bootstrap_rest + failed_when: scm_bootstrap_rest.rc != 0 + + - name: "Start SCM on remaining SCMs" + command: "ozone --daemon start scm" + when: "'scm' in groups and (groups['scm'] | length > 1) and (inventory_hostname in groups['scm'][1:])" + register: scm_start_rest + failed_when: scm_start_rest.rc != 0 + + - name: "OM init on remaining OMs" + command: "ozone om --init" + when: "'om' in groups and (groups['om'] | length > 1) and (inventory_hostname in groups['om'][1:])" + register: om_init_rest + failed_when: om_init_rest.rc != 0 + + - name: "Start OM on remaining OMs" + command: "ozone --daemon start om" + when: "'om' in groups and (groups['om'] | length > 1) and (inventory_hostname in groups['om'][1:])" + register: om_start_rest + failed_when: om_start_rest.rc != 0 + +- name: "Ozone Service: Start Datanodes and Recon" + become: true + become_user: "{{ service_user }}" + become_flags: "-i" + environment: "{{ ozone_runtime_env }}" + block: + - name: "Start Datanodes" + command: "ozone --daemon start datanode" + when: inventory_hostname in (groups.get('datanodes', [])) + async: 300 + poll: 0 + register: dn_job + + - name: "Wait for Datanode start to complete" + when: inventory_hostname in (groups.get('datanodes', [])) + async_status: + jid: "{{ dn_job.ansible_job_id }}" + register: dn_wait + until: dn_wait.finished + failed_when: (dn_wait.rc | default(0)) != 0 + + - name: "Start Recon on first recon host" + command: "ozone --daemon start recon" + when: (groups.get('recon', []) | length > 0) and (inventory_hostname == groups['recon'][0]) + register: recon_start + failed_when: recon_start.rc != 0 + + - name: "Start S3G on first s3g host" + command: "ozone --daemon start s3g" + when: (groups.get('s3g', []) 
| length > 0) and (inventory_hostname == groups['s3g'][0]) + register: s3g_start + failed_when: s3g_start.rc != 0 + + diff --git a/roles/ozone_smoke/tasks/main.yml b/roles/ozone_smoke/tasks/main.yml new file mode 100644 index 0000000..d98a9e7 --- /dev/null +++ b/roles/ozone_smoke/tasks/main.yml @@ -0,0 +1,91 @@ +--- +- name: "Set replication factor" + set_fact: + create_key_cmd: "{{ 'sh key put --type RATIS --replication ONE' if groups.get('datanodes', []) | length < 3 else 'sh key put' }}" + vol: "demovol" + bucket: "demobuck" + s3g_bucket: "demos3g" + key: "demokey" + ozone_bin: "{{ install_base }}/current/bin/ozone" + +- name: "Print ozone command to create key based on Datanode count" + debug: + msg: "{{ create_key_cmd }}" + +- name: "Run basic smoke commands" + shell: | + set -euo pipefail + dd if=/dev/zero of=/tmp/oz_smoke.bin bs=1M count=1 status=none + {{ ozone_bin }} sh vol create {{ vol }} || true + {{ ozone_bin }} sh bucket create {{ vol }}/{{ bucket }} || true + {{ ozone_bin }} {{ create_key_cmd }} {{ vol }}/{{ bucket }}/{{ key }} /tmp/oz_smoke.bin + rm -f /tmp/oz_smoke.bin + args: + executable: /bin/bash + register: smoke_commands_result + failed_when: smoke_commands_result.rc != 0 + run_once: true + become: true + become_user: "{{ service_user }}" + +- name: "Verify key info" + shell: | + set -euo pipefail + {{ ozone_bin }} sh key info {{ vol }}/{{ bucket }}/{{ key }} + args: + executable: /bin/bash + register: key_info + failed_when: key_info.rc != 0 + run_once: true + become: true + become_user: "{{ service_user }}" + +- name: "Show key info" + debug: + msg: + - "Stdout: {{ (key_info.stdout_lines | default([])) | join('\n') }}" + - "Stderr: {{ (key_info.stderr_lines | default([])) | join('\n') }}" + run_once: true + +- name: "Create test bucket on S3G host (if present)" + block: + - name: "Install awscli on S3G host" + package: + name: awscli + state: present + become: true + + - name: "AWS CLI configure dummy credentials for S3G tests" + shell: | + 
set -euo pipefail + aws configure set aws_access_key_id dummy + aws configure set aws_secret_access_key dummy + args: + executable: /bin/bash + + - name: "AWS CLI S3G: create test bucket '{{ s3g_bucket }}'" + shell: | + set -o pipefail + aws s3api create-bucket --bucket {{ s3g_bucket }} --endpoint-url "http://{{ groups['s3g'][0] }}:9878" || true + args: + executable: /bin/bash + register: aws_create_result + changed_when: false + + - name: "AWS CLI S3G: list buckets" + shell: | + set -o pipefail + aws s3api list-buckets --endpoint-url "http://{{ groups['s3g'][0] }}:9878" + args: + executable: /bin/bash + register: aws_list_result + changed_when: false + + - name: "Show AWS CLI S3G check output" + debug: + msg: + - "Create bucket output: {{ (aws_create_result.stdout | default('')) }}" + - "List buckets output: {{ (aws_list_result.stdout | default('')) }}" + when: + - groups.get('s3g', []) | length > 0 + - inventory_hostname == groups['s3g'][0] \ No newline at end of file diff --git a/roles/ozone_ui/tasks/main.yml b/roles/ozone_ui/tasks/main.yml new file mode 100644 index 0000000..d4b3f73 --- /dev/null +++ b/roles/ozone_ui/tasks/main.yml @@ -0,0 +1,32 @@ +## Print and export service UI endpoints +- name: "Compute service UI URLs" + set_fact: + _om_hosts_ui: "{{ groups.get('om', []) | list }}" + _scm_hosts_ui: "{{ groups.get('scm', []) | list }}" + _recon_hosts_ui: "{{ groups.get('recon', []) | list }}" + _s3g_hosts_ui: "{{ groups.get('s3g', []) | list }}" + ui_urls: + om: "{{ _om_hosts_ui | map('regex_replace','^(.*)$','http://\\1:9874') | list }}" + scm: "{{ _scm_hosts_ui | map('regex_replace','^(.*)$','http://\\1:9876') | list }}" + recon: "{{ (_recon_hosts_ui | length > 0) | ternary(['http://' + _recon_hosts_ui[0] + ':9888'], []) }}" + s3g_http: "{{ _s3g_hosts_ui | map('regex_replace','^(.*)$','http://\\1:9878') | list }}" + s3g_admin: "{{ _s3g_hosts_ui | map('regex_replace','^(.*)$','http://\\1:19878') | list }}" + +- name: "Service UI Endpoints" + debug: + msg: 
+ - "OM UI: {{ ui_urls.om }}" + - "SCM UI: {{ ui_urls.scm }}" + - "Recon UI: {{ ui_urls.recon }}" + - "S3G HTTP: {{ ui_urls.s3g_http }}" + - "S3G Admin: {{ ui_urls.s3g_admin }}" + run_once: true + +- name: "Export UI endpoints to controller logs directory" + copy: + content: "{{ ui_urls | to_nice_json }}" + dest: "{{ controller_logs_dir }}/ui_urls.json" + mode: "0644" + delegate_to: localhost + run_once: true + when: controller_logs_dir is defined \ No newline at end of file diff --git a/roles/ozone_user/defaults/main.yml b/roles/ozone_user/defaults/main.yml new file mode 100644 index 0000000..e798044 --- /dev/null +++ b/roles/ozone_user/defaults/main.yml @@ -0,0 +1,6 @@ +--- +service_user: "ozone" +service_group: "ozone" +service_shell: "/bin/bash" + + diff --git a/roles/ozone_user/tasks/main.yml b/roles/ozone_user/tasks/main.yml new file mode 100644 index 0000000..7943d48 --- /dev/null +++ b/roles/ozone_user/tasks/main.yml @@ -0,0 +1,33 @@ +--- +- name: "Ensure service group exists" + group: + name: "{{ service_group }}" + state: present + become: true + +- name: "Ensure service user exists" + user: + name: "{{ service_user }}" + group: "{{ service_group }}" + shell: "{{ service_shell }}" + create_home: true + state: present + become: true + +- name: "Unlock service user account" + command: "passwd -u {{ service_user }}" + register: unlock_out + changed_when: unlock_out.rc == 0 + failed_when: false + become: true + +- name: "Ensure home directory permissions" + file: + path: "{{ (service_user == 'root') | ternary('/root', '/home/' + service_user) }}" + state: directory + owner: "{{ (service_user == 'root') | ternary('root', service_user) }}" + group: "{{ (service_user == 'root') | ternary('root', service_user) }}" + mode: "0755" + become: true + + diff --git a/roles/ssh_bootstrap/defaults/main.yml b/roles/ssh_bootstrap/defaults/main.yml new file mode 100644 index 0000000..671be53 --- /dev/null +++ b/roles/ssh_bootstrap/defaults/main.yml @@ -0,0 +1,14 @@ +--- +# 
Whether to deploy a cluster-wide SSH identity (private key) to the service user +allow_cluster_ssh_key_deploy: false + +# Optional paths on the controller for installer SSH keys +ssh_public_key_path: "" +ssh_private_key_path: "" + +# Target users for authorized_keys installation +authorized_key_users: + - "{{ service_user }}" + - "root" + + diff --git a/roles/ssh_bootstrap/tasks/main.yml b/roles/ssh_bootstrap/tasks/main.yml new file mode 100644 index 0000000..35877d8 --- /dev/null +++ b/roles/ssh_bootstrap/tasks/main.yml @@ -0,0 +1,72 @@ +--- +- name: "Ensure .ssh directory exists for target users" + file: + path: "{{ (item == 'root') | ternary('/root/.ssh', '/home/' + item + '/.ssh') }}" + state: directory + owner: "{{ (item == 'root') | ternary('root', item) }}" + group: "{{ (item == 'root') | ternary('root', item) }}" + mode: "0700" + loop: "{{ authorized_key_users | unique }}" + when: item | length > 0 + become: true + +- name: "Install authorized public key for users (if provided)" + file: + path: "{{ (item == 'root') | ternary('/root/.ssh/authorized_keys', '/home/' + item + '/.ssh/authorized_keys') }}" + state: touch + owner: "{{ (item == 'root') | ternary('root', item) }}" + group: "{{ (item == 'root') | ternary('root', item) }}" + mode: "0600" + loop: "{{ authorized_key_users | unique }}" + when: + - ssh_public_key_path | length > 0 + - item | length > 0 + become: true + +- name: "Append authorized public key for users" + lineinfile: + path: "{{ (item == 'root') | ternary('/root/.ssh/authorized_keys', '/home/' + item + '/.ssh/authorized_keys') }}" + create: yes + line: "{{ lookup('file', ssh_public_key_path) }}" + state: present + insertafter: EOF + loop: "{{ authorized_key_users | unique }}" + when: + - ssh_public_key_path | length > 0 + - item | length > 0 + become: true + +- name: "Deploy cluster SSH private key to service user (opt-in)" + copy: + src: "{{ ssh_private_key_path }}" + dest: "{{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519', 
'/home/' + service_user + '/.ssh/id_ed25519') }}" + owner: "{{ (service_user == 'root') | ternary('root', service_user) }}" + group: "{{ (service_user == 'root') | ternary('root', service_user) }}" + mode: "0600" + when: + - allow_cluster_ssh_key_deploy | bool + - ssh_private_key_path | length > 0 + become: true + +- name: "Ensure public half exists for deployed private key" + shell: "ssh-keygen -y -f {{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519', '/home/' + service_user + '/.ssh/id_ed25519') }} > {{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519.pub', '/home/' + service_user + '/.ssh/id_ed25519.pub') }}" + args: + creates: "{{ (service_user == 'root') | ternary('/root/.ssh/id_ed25519.pub', '/home/' + service_user + '/.ssh/id_ed25519.pub') }}" + when: allow_cluster_ssh_key_deploy | bool + become: true + +- name: "Add passwordless SSH config for users" + copy: + dest: "{{ (item == 'root') | ternary('/root/.ssh/config', '/home/' + item + '/.ssh/config') }}" + owner: "{{ (item == 'root') | ternary('root', item) }}" + group: "{{ (item == 'root') | ternary('root', item) }}" + mode: "0600" + content: | + Host * + StrictHostKeyChecking no + UserKnownHostsFile /dev/null + loop: "{{ authorized_key_users | unique }}" + when: item | length > 0 + become: true + +
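
For readers who want to check the URL construction in roles/ozone_ui/tasks/main.yml outside of Ansible, the following is a minimal Python sketch of the same mapping. It is not part of the commit. The function names (compute_ui_urls, _urls) and the example host names are hypothetical; the ports and the "first Recon host only" rule are taken from the regex_replace and ternary expressions in the role.

```python
import json
from typing import Dict, List

# Ports as they appear in roles/ozone_ui/tasks/main.yml.
OM_PORT = 9874
SCM_PORT = 9876
RECON_PORT = 9888
S3G_HTTP_PORT = 9878
S3G_ADMIN_PORT = 19878


def _urls(hosts: List[str], port: int) -> List[str]:
    """Build one http://host:port URL per host (hypothetical helper)."""
    return [f"http://{host}:{port}" for host in hosts]


def compute_ui_urls(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Mirror the ui_urls fact: missing inventory groups yield empty lists."""
    s3g_hosts = groups.get("s3g", [])
    return {
        "om": _urls(groups.get("om", []), OM_PORT),
        "scm": _urls(groups.get("scm", []), SCM_PORT),
        # The role only reports the first Recon host, if there is one.
        "recon": _urls(groups.get("recon", [])[:1], RECON_PORT),
        "s3g_http": _urls(s3g_hosts, S3G_HTTP_PORT),
        "s3g_admin": _urls(s3g_hosts, S3G_ADMIN_PORT),
    }


if __name__ == "__main__":
    # Example inventory groups; the host names are made up for illustration.
    example_groups = {
        "om": ["om1.example.com"],
        "scm": ["scm1.example.com"],
        "recon": ["recon1.example.com", "recon2.example.com"],
    }
    print(json.dumps(compute_ui_urls(example_groups), indent=4))
```

The printed JSON should be similar in shape to the ui_urls.json file that the role writes to controller_logs_dir, with the s3g entries empty when no s3g group is defined.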
