jason810496 commented on code in PR #67153:
URL: https://github.com/apache/airflow/pull/67153#discussion_r3296393401


##########
go-sdk/adr/0001-bundle-packing-options.md:
##########
@@ -0,0 +1,292 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# 1. Post-build bundle-packing options for the Go SDK
+
+Date: 2026-04-30
+
+## Status
+
+Accepted as the option register. The packer-mechanism decision is
+recorded in [ADR 0002](0002-use-go-tool-directive-for-bundle-packer.md):
+Option H (Go 1.24 `tool` directive) for delivery, paired with Option A
+(standalone `airflow-go-pack` binary) and Option D (standardised
+`--dump-bundle-spec` introspection contract).
+
+The container-format assumption running through this ADR — that the
+output is a ZIP archive — is superseded by
+[ADR 0004](0004-self-contained-executable-bundle.md), which embeds the
+source and manifest in a footer appended to the executable. The
+options below still describe valid *packer mechanisms*; only the
+artefact each one writes has changed from a ZIP to a footer-augmented
+executable.
+
+## Context
+
+The executable provider's bundle spec
+([`task-sdk/docs/bundle-spec.rst`](../../task-sdk/docs/bundle-spec.rst))
+defines a deployment artifact as a ZIP archive containing:
+
+1. `airflow-metadata.yaml` declaring `format_version`, `sdk` (language/version),
+   `source` (archive-relative path to the DAG source file), `executable`
+   (archive-relative path to the compiled binary), and `dags` (a mapping of
+   `dag_id` to `{tasks: [task_id, ...]}`).
+2. The primary DAG source file, included verbatim.
+3. The compiled native executable, which speaks the coordinator protocol
+   (`--comm=<addr>` / `--logs=<addr>`).
+
+Bundle authors today produce the executable with a plain `go build`
+(see [`go-sdk/example/bundle/Justfile`](../example/bundle/Justfile)). There is
+no SDK-provided way to produce the conforming ZIP, so each author would need
+to hand-roll one.
+
+The bundle binary already exposes a `--bundle-metadata` flag (defined in
+[`bundle/bundlev1/bundlev1server/server.go`](../bundle/bundlev1/bundlev1server/server.go))
+that prints the `BundleInfo{Name, Version}` returned by the author's
+`BundleProvider.GetBundleVersion()`. It does **not** currently invoke
+`RegisterDags`, so it does not yet enumerate `dag_id` / `task_id` for the
+manifest. This is relevant context: the binary itself is the authoritative
+source of dag/task identity at runtime, and the SDK can extend the
+introspection path cheaply.
+
+The user's initial framing was `go build -toolexec`. `-toolexec` wraps each
+toolchain invocation (compile, asm, link) and does not have visibility into
+the final `-o` output path or a single "build finished" hook, so it is a poor
+fit for producing the final ZIP. The options below cover the mechanisms that
+do fit, plus the `-toolexec` variant for completeness.
+
+A packing mechanism has two sub-decisions:
+
+- **Where the packing logic runs.** In the bundle binary itself
+  (self-pack), in a separate SDK CLI, or in build tooling outside the SDK
+  (Makefile/Justfile snippet).
+- **How dag/task IDs reach the manifest.** Runtime introspection of the
+  built binary (call into `RegisterDags` against an in-memory
+  registry recorder), static AST scan of the source file, or
+  hand-written manifest.
+
+The options below combine those two sub-decisions in different ways.
+
+## Options
+
+### Option A: Standalone SDK packer CLI (`airflow-go-pack`)
+
+A new binary under `go-sdk/cmd/airflow-go-pack` that takes
+already-built inputs and writes the ZIP:
+
+```
+airflow-go-pack \
+    --source ./example/bundle/main.go \
+    --executable ./bin/example-dag-bundle \
+    --output ./bin/example.zip
+```
+
+Manifest population: the packer execs the supplied executable with
+`--bundle-metadata` and reads the JSON from stdout to fill `sdk.version`,
+and a new `--dump-dags` (or extended `--bundle-metadata`) flag to enumerate
+`dags`. Source language is hard-coded to `go`; SDK version is read from the
+build info embedded in the binary or from a build-time `-ldflags` value.
+
+- **Pros:** simple, single-purpose binary; works against any binary the user
+  built however they like (CGO, cross-compile, custom `-ldflags`); no
+  coupling to `go build` invocation; trivially callable from `just`,
+  `make`, CI, or `go generate`.
+- **Cons:** two-step UX (`go build` then `airflow-go-pack`); user has to
+  install or `go run` the tool; nothing prevents pack/build mismatch
+  (e.g. packing yesterday's binary).
+
+### Option B: All-in-one SDK CLI with a `build` subcommand
+
+A single SDK CLI (`airflow-go`) with subcommands that wrap `go build` and
+then pack:
+
+```
+airflow-go build ./example/bundle --output ./bin/example.zip
+```
+
+Internally: spawn `go build -o <tmp>/bundle <pkg>`, then run the same
+introspection step as Option A, then write the ZIP.
+
+- **Pros:** single command; no chance of pack/build skew; easy to add
+  related subcommands later (`airflow-go new`, `airflow-go run`,
+  `airflow-go validate`); good defaults for `-ldflags` (e.g.
+  `-X main.bundleVersion=...`) without the author having to know them.
+- **Cons:** the SDK now owns a `go build` wrapper and inherits
+  responsibility for forwarding the long tail of `go build` flags
+  (`-tags`, `-trimpath`, `GOOS`/`GOARCH` env, `-ldflags` passthrough,
+  `-buildvcs`, etc.); harder to integrate with non-trivial existing build
+  systems that already drive `go build` themselves.
+
+### Option C: Self-packing binary (`--pack-bundle <out.zip>`)
+
+Extend `bundlev1server.Serve` so that when the binary is invoked with
+`--pack-bundle <out.zip>`, it builds the ZIP itself: it knows its own
+executable path (`os.Executable()`), its embedded source (via `//go:embed`
+of the DAG source file at build time), and its dag/task list (by
+calling `RegisterDags` against an in-memory recorder). After writing
+the archive, it exits.
+
+- **Pros:** zero extra tools; the binary is fully self-describing; pack
+  output is provably consistent with the binary's runtime behaviour.
+- **Cons:** requires the author's `main` package to embed its own source
+  (`//go:embed main.go` or similar), which is awkward when the DAG is
+  spread across multiple files or the source path is non-obvious;
+  bloats every bundle binary with packing code and an embedded copy of
+  the source; mixes build-time concerns into a runtime entrypoint.
+
+### Option D: Two-phase external introspection (introspection binary + packer)
+
+Same shape as Option A or B, but standardise the introspection contract:
+the SDK guarantees that every bundle binary supports
+`--dump-bundle-spec` (or a richer `--bundle-metadata`) which prints a
+JSON blob containing `sdk.language`, `sdk.version`, and the full `dags`
+mapping. The packer's only job is to combine that JSON, the source
+file path the user passes in, and the binary itself into a ZIP.
+
+This is really a refinement of A/B that fixes the introspection contract
+in the SDK protocol, rather than an independent option, but is worth
+calling out because the shape of the introspection flag is itself a
+decision (single flag vs. several; JSON vs. YAML; pretty vs. compact).
+
+- **Pros:** decouples "how do we enumerate dags" from "how do we ZIP";
+  any future packer (third-party CI plugin, IDE, etc.) can rely on the
+  same contract; trivial to unit-test.
+- **Cons:** locks in a wire format the SDK has to keep stable; slightly
+  more code in the bundle binary than today.
+
+### Option E: Static AST scan, no introspection
+
+Parser-only packer: walk the DAG source AST, find `dagbag.AddDag("X")`
+calls and the `.AddTask(fn)` calls chained off them, and synthesise the
+manifest without running the binary.
+
+- **Pros:** no runtime dependency on the binary (works even if it
+  doesn't build for the host platform, e.g. cross-compiled for Linux on

Review Comment:
   Addressed in f297561275: the packer now resolves cross-compilation with a
   two-build introspection step. If `target != host`, or if running the
   executable fails because its format is not host-compatible, we build a
   second binary for the host and use it to retrieve the `airflow-metadata`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
