[PR] Replace the documentation build system with an AsciidoctorJ extension [tinkerpop]

via GitHub Tue, 09 Jun 2026 10:49:40 -0700


Cole-Greer opened a new pull request, #3455:
URL: https://github.com/apache/tinkerpop/pull/3455

## Summary

This PR replaces TinkerPop's legacy shell/AWK documentation preprocessor +
postprocessor pipeline with a Maven-based AsciidoctorJ extension
(tools/tinkerpop-docs). The new extension walks each AsciiDoc book's AST,
executes [gremlin-groovy] code blocks against a long-lived Gremlin Console
subprocess, and renders the console output as tabbed, syntax-highlighted HTML.
The result is structurally equivalent to the published 3.7.7-SNAPSHOT docs
while being easier to maintain, test, and run.

## Motivation

The old build was a fragile pipeline of bash + awk scripts under
docs/preprocessor/ and docs/postprocessor/ that was hard to test, OS-sensitive
(required GNU coreutils on macOS), silently swallowed Gremlin execution errors,
and depended on a manually configured pseudo-distributed Hadoop cluster. The
replacement is a single Maven module with unit tests, fail-fast error handling,
and a local-filesystem Hadoop configuration that needs no daemons.

## What changed

New AsciidoctorJ extension (tools/tinkerpop-docs)
- GremlinTreeprocessor — AST walk, block execution, per-graph
initialization, sugar-plugin handling, and multi-line statement grouping.
- GremlinConsole — manages the bin/gremlin.sh subprocess, prompt-based
output capture, and error-prompt detection.
- TabbedHtmlBuilder / GremlinPostprocessor — tabbed HTML output, CodeRay
syntax highlighting (via JRuby), callout/conum rendering, and version
substitution.
- ConsoleRestartHandler / PluginDirectoryRestartHandler — per-book plugin
isolation (see below).
- SPI registration + a docs-specific local-filesystem Hadoop config
(hadoop-conf/core-site.xml).
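
To give reviewers a feel for what "prompt-based output capture" means, here is a minimal sketch. It is not the code in this PR: the class name `PromptCapture`, the method `readUntilPrompt`, and the assumption that the console prompt is the literal string `gremlin> ` are all illustrative. The idea is to keep reading the subprocess output until the text ends with the prompt, then treat everything before it as the result of the last statement.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration of prompt-based capture; not the PR's GremlinConsole. */
public class PromptCapture {

    /**
     * Reads characters until the accumulated text ends with the prompt and
     * returns everything before the prompt. If the stream ends first, returns
     * whatever was read.
     */
    public static String readUntilPrompt(Reader in, String prompt) {
        StringBuilder buf = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                buf.append((char) c);
                int start = buf.length() - prompt.length();
                // The prompt can only match at the tail of the buffer.
                if (start >= 0 && buf.indexOf(prompt, start) == start) {
                    return buf.substring(0, start);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // A StringReader stands in for the console subprocess's stdout.
        Reader fake = new StringReader("==>v[1]\n==>v[2]\ngremlin> ");
        System.out.print(readUntilPrompt(fake, "gremlin> "));
    }
}
```

A real implementation also has to deal with timeouts and with output that happens to contain the prompt text, which this sketch ignores.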

Orchestration — bin/process-docs.sh rewritten to validate the console/server
distributions, install plugins, start a Gremlin Server and Gephi mock, and
invoke Maven. Supports --dryRun (render without executing).

Per-book plugin isolation — Neo4j 3.4 (Scala 2.11) and Spark (Scala 2.12)
cannot share the console's flat classpath. A :gremlin-docs-plugins-exclude:
section attribute drives a console restart with the conflicting plugin
directories toggled aside, so both the Neo4j and Spark examples render
correctly in the same run. Plugin dependencies are installed into
ext/<plugin>/plugin/ (not the shared lib/) so they can be isolated, and the
toggle is idempotent and resilient to interrupted builds.
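
As a rough picture of what an idempotent directory toggle can look like, here is a small sketch. It is an assumption-laden illustration rather than the PR's PluginDirectoryRestartHandler: the class name `PluginToggle`, the `.disabled` suffix, and the `neo4j-gremlin` directory name in the usage are hypothetical. Each operation checks the current state first, so repeating it (for example after an interrupted build) is a no-op.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of toggling ext/<plugin>/plugin aside and back. */
public class PluginToggle {

    // Assumed suffix for the "toggled aside" directory; the PR may use another name.
    private static final String ASIDE_SUFFIX = ".disabled";

    /** Moves ext/<plugin>/plugin aside so the console no longer loads it. */
    public static void disable(Path extDir, String plugin) {
        Path base = extDir.resolve(plugin);
        move(base.resolve("plugin"), base.resolve("plugin" + ASIDE_SUFFIX));
    }

    /** Restores ext/<plugin>/plugin from its toggled-aside location. */
    public static void enable(Path extDir, String plugin) {
        Path base = extDir.resolve(plugin);
        move(base.resolve("plugin" + ASIDE_SUFFIX), base.resolve("plugin"));
    }

    private static void move(Path from, Path to) {
        // Nothing to do if the source is absent (already toggled or not
        // installed) or the target already exists.
        if (!Files.exists(from) || Files.exists(to)) {
            return;
        }
        try {
            Files.move(from, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `PluginToggle.disable(ext, "neo4j-gremlin")` twice in a row leaves the directory tree in the same state as calling it once, and `enable` reverses it.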

Docs source updates
- Added :gremlin-docs-plugins-exclude: attributes to the neo4j, hadoop,
spark, and gremlin-variants chapters.
- Scoped the Hadoop hdfs.ls() examples to the copied graph file so rendered
docs avoid listing the build machine's home directory.
- Fixed an undefined-variable typo (marko → vMarko) and converted the
Spark-on-YARN recipe to a static example (executing it requires a live YARN
cluster).
- Rewrote the developer-doc "Documentation Environment" section to describe
the new Maven/AsciidoctorJ build and removed the retired preprocessor
references.

Removed — the entire docs/preprocessor/ and docs/postprocessor/ script trees
(15 files).

## Testing

- 92 unit tests in tools/tinkerpop-docs (console I/O, treeprocessor, tabbed
HTML, postprocessor, dry-run, plugin-directory toggling), plus an integration
fixture exercising gremlin blocks, manual/standalone tabs, existing, errors,
callouts, and version replacement.
- Full bin/process-docs.sh build completes with BUILD SUCCESS while treating
execution errors as fatal.
- Output diffed against the published 3.7.7-SNAPSHOT docs across all 8
books: structural metrics (headings, listing blocks, tab sections, callouts)
match within ~2%; zero stacktrace bloat; all differences attributable to
intended source updates, the file:/// vs hdfs:// environment, or
branch-vs-snapshot content drift.
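
For clarity on what "match within ~2%" means, the check amounts to a relative difference per metric. The sketch below is only an illustration of that arithmetic: the class `MetricDiff` is hypothetical and the counts in `main` are made-up numbers, not measurements from this PR.

```java
/** Hypothetical helper showing the relative-difference check; not part of the PR. */
public class MetricDiff {

    /** Relative difference of actual vs expected, as a fraction of expected. */
    public static double relativeDiff(int expected, int actual) {
        if (expected == 0) {
            return actual == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return Math.abs(actual - expected) / (double) expected;
    }

    /** True when the two counts differ by no more than the given fraction. */
    public static boolean withinTolerance(int expected, int actual, double tolerance) {
        return relativeDiff(expected, actual) <= tolerance;
    }

    public static void main(String[] args) {
        // Made-up example counts: 1000 listing blocks published vs 1015 rebuilt.
        System.out.println(withinTolerance(1000, 1015, 0.02));
    }
}
```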

## Tips for reviewers

I've taken the liberty of redeploying the [3.7.7-SNAPSHOT
docs](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/) from this branch. I
would recommend focusing the review on evaluating the built docs. There are a
few notable differences worth calling out:

- The CSharp tabs now have functioning syntax highlighting ([as seen in the
Basic Gremlin section of the reference
docs](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/reference/#basic-gremlin))
- The [HDFS
examples](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/reference/#_oltp_hadoop_gremlin)
have replaced calls to `hdfs.ls()` with `hdfs.ls('tinkerpop-modern.kryo')`.
This is a minor workaround: the docs build substitutes in the filesystem from
the host machine instead of running a local Hadoop cluster, and the narrower
call avoids dumping the existing contents of the host's home directory. The old
format could be restored by having the docs system internally manage a
MiniDFSCluster. This is a viable fix, but I've left it out of scope for this PR
to limit complexity.
- The [OLAP Spark YARN
recipe](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/recipes/#olap-spark-yarn)
has been converted to a static example; it is no longer executed during the
docs build.

## Future

The goal of this work was to replace the old docs system while keeping the
docs output at 1:1 equivalency. I think this new extension gives us a better
platform on which to build future enhancements to the docs.

- For 3.8 and above, it becomes quite trivial to link the gremlin-lang
translators into all of the `gremlin-groovy` examples, and automatically add
tabs for all language variants (excluding groovy-specific examples)
- There is some complexity in the system to load and unload console plugins
depending on needs for each doc book (needed due to conflicting dependencies
between spark and neo4j). This could be ripped out and simplified in master as
neo4j and sparql plugins are no longer necessary.
- I expect we can extend the new asciidoctor plugin to add new features to
the docs, such as improved docs navigation and an integrated search capability.

