Cole-Greer opened a new pull request, #3455:
URL: https://github.com/apache/tinkerpop/pull/3455

   ## Summary
   
   This PR replaces TinkerPop's legacy shell/AWK documentation preprocessor + 
postprocessor pipeline with a Maven-based AsciidoctorJ extension 
(tools/tinkerpop-docs). The new extension walks each AsciiDoc book's AST, 
executes [gremlin-groovy] code blocks against a long-lived Gremlin Console 
subprocess, and renders
   the console output as tabbed, syntax-highlighted HTML — producing output 
structurally equivalent to the published 3.7.7-SNAPSHOT docs while being easier 
to maintain, test, and run.
   
   ## Motivation
   
   The old build was a fragile pipeline of bash + awk scripts under 
docs/preprocessor/ and docs/postprocessor/ that was hard to test, OS-sensitive 
(required GNU coreutils on macOS), silently swallowed Gremlin execution errors, 
and depended on a manually configured pseudo-distributed Hadoop cluster. The 
replacement
   is a single Maven module with unit tests, fail-fast error handling, and a 
local-filesystem Hadoop configuration that needs no daemons.
   
   ## What changed
   
   New AsciidoctorJ extension (tools/tinkerpop-docs)
   - GremlinTreeprocessor — AST walk, block execution, per-graph 
initialization, sugar-plugin handling, and multi-line statement grouping.
   - GremlinConsole — manages the bin/gremlin.sh subprocess, prompt-based 
output capture, and error-prompt detection.
   - TabbedHtmlBuilder / GremlinPostprocessor — tabbed HTML output, CodeRay 
syntax highlighting (via JRuby), callout/conum rendering, and version 
substitution.
   - ConsoleRestartHandler / PluginDirectoryRestartHandler — per-book plugin 
isolation (see below).
   - SPI registration + a docs-specific local-filesystem Hadoop config 
(hadoop-conf/core-site.xml).
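   Prompt-based output capture can be sketched roughly as follows for reviewers unfamiliar with the approach. This is a minimal, stdlib-only illustration, not the code in GremlinConsole. It assumes the console prints `gremlin> ` as its prompt. It reads the subprocess's stdout one character at a time and returns everything emitted before the next prompt as the output of the statement just sent. The `PromptCapture` class is hypothetical, and error-prompt detection and timeouts are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class PromptCapture {
    // Assumed prompt string; the real extension may configure its own.
    static final String PROMPT = "gremlin> ";

    /** Reads until `prompt` appears and returns the text before it; throws if the stream ends first. */
    static String readUntilPrompt(Reader in, String prompt) throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            // Only the tail can complete a new match, so check just that position.
            int start = buf.length() - prompt.length();
            if (start >= 0 && buf.indexOf(prompt, start) == start) {
                return buf.substring(0, start);
            }
        }
        throw new IOException("console exited before the next prompt; partial output: " + buf);
    }

    public static void main(String[] args) throws IOException {
        // Simulated console stdout: results of two statements, each followed by a prompt.
        Reader out = new StringReader("==>v[1]\n==>v[2]\n" + PROMPT + "==>6\n" + PROMPT);
        System.out.print(readUntilPrompt(out, PROMPT)); // output of the first statement
        System.out.print(readUntilPrompt(out, PROMPT)); // output of the second statement
    }
}
```

The sketch has one known limitation. Output that itself contains the prompt string would be cut short at that point.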
   
   Orchestration — bin/process-docs.sh rewritten to validate the console/server 
distributions, install plugins, start a Gremlin Server and Gephi mock, and 
invoke Maven. Supports --dryRun (render without executing).
   
   Per-book plugin isolation — Neo4j 3.4 (Scala 2.11) and Spark (Scala 2.12) 
cannot share the console's flat classpath. A :gremlin-docs-plugins-exclude: 
section attribute drives a console restart with the conflicting plugin 
directories toggled aside, so both the Neo4j and Spark examples render 
correctly in the 
   same run. Plugin dependencies are installed into ext/<plugin>/plugin/ (not 
the shared lib/) so they can be isolated, and the toggle is idempotent and 
resilient to interrupted builds.
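   The idempotent toggle described above might look something like the sketch below. The PR does not say where disabled directories are moved. The sibling `ext/<plugin>.disabled` name and the `PluginToggle` class are therefore hypothetical. Each operation checks the current state before renaming, so repeating it is a no-op. A `restoreAll` pass at startup is one way to undo anything a killed build left moved aside.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class PluginToggle {
    // Hypothetical naming convention for a plugin directory that has been moved aside.
    static final String SUFFIX = ".disabled";

    /** Moves ext/<plugin> aside; a no-op if it is already aside. */
    static void disable(Path ext, String plugin) throws IOException {
        Path active = ext.resolve(plugin);
        Path aside = ext.resolve(plugin + SUFFIX);
        if (Files.isDirectory(active) && Files.notExists(aside)) {
            Files.move(active, aside, StandardCopyOption.ATOMIC_MOVE);
        }
    }

    /** Moves ext/<plugin> back into place; a no-op if it is already active. */
    static void enable(Path ext, String plugin) throws IOException {
        Path active = ext.resolve(plugin);
        Path aside = ext.resolve(plugin + SUFFIX);
        if (Files.isDirectory(aside) && Files.notExists(active)) {
            Files.move(aside, active, StandardCopyOption.ATOMIC_MOVE);
        }
    }

    /** Re-enables every plugin left aside, e.g. by a build that was interrupted mid-book. */
    static void restoreAll(Path ext) throws IOException {
        List<String> aside = new ArrayList<>();
        // Collect names first so entries are not renamed while the stream is open.
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(ext, "*" + SUFFIX)) {
            for (Path p : ds) {
                String name = p.getFileName().toString();
                aside.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        for (String plugin : aside) {
            enable(ext, plugin);
        }
    }

    public static void main(String[] args) throws IOException {
        Path ext = Files.createTempDirectory("ext");
        Files.createDirectories(ext.resolve("neo4j-gremlin/plugin"));
        disable(ext, "neo4j-gremlin");
        disable(ext, "neo4j-gremlin"); // repeated call is a no-op
        System.out.println(Files.exists(ext.resolve("neo4j-gremlin.disabled"))); // true
        restoreAll(ext); // e.g. run before starting the console on a fresh build
        System.out.println(Files.exists(ext.resolve("neo4j-gremlin"))); // true
    }
}
```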
   
   Docs source updates
   - Added :gremlin-docs-plugins-exclude: attributes to the neo4j, hadoop, 
spark, and gremlin-variants chapters.
   - Scoped the Hadoop hdfs.ls() examples to the copied graph file so rendered 
docs avoid listing the build machine's home directory.
   - Fixed an undefined-variable typo (marko → vMarko) and converted the 
Spark-on-YARN recipe to a static example (it requires a live YARN 
cluster).
   - Rewrote the developer-doc "Documentation Environment" section to describe 
the new Maven/AsciidoctorJ build and removed the retired preprocessor 
references.
   
   Removed — the entire docs/preprocessor/ and docs/postprocessor/ script trees 
(15 files).
   
   ## Testing
   
   - 92 unit tests in tools/tinkerpop-docs (console I/O, treeprocessor, tabbed 
HTML, postprocessor, dry-run, plugin-directory toggling), plus an integration 
fixture exercising gremlin blocks, manual/standalone tabs, existing, errors, 
callouts, and version replacement.
   - A full bin/process-docs.sh build completes with BUILD SUCCESS, with Gremlin 
execution errors treated as fatal.
   - Output diffed against the published 3.7.7-SNAPSHOT docs across all 8 
books: structural metrics (headings, listing blocks, tab sections, callouts) 
match within ~2%; zero stacktrace bloat; all differences attributable to 
intended source updates, the file:/// vs hdfs:// environment, or 
branch-vs-snapshot content
   drift.
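   A structural comparison like the one above could be approximated by counting markup markers in the old and new HTML for each book. The sketch below assumes Asciidoctor's default HTML5 class names (`listingblock`, `conum`). The `tabs` class for tab sections and the `StructuralDiff` helper are hypothetical. The 2% threshold comes from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StructuralDiff {
    // Marker patterns assuming Asciidoctor's default HTML5 output.
    // "tabs" is a hypothetical class name for tab sections; adjust to the real markup.
    static final Map<String, Pattern> METRICS = new LinkedHashMap<>();
    static {
        METRICS.put("headings", Pattern.compile("<h[1-6][\\s>]"));
        METRICS.put("listing blocks", Pattern.compile("class=\"listingblock"));
        METRICS.put("tab sections", Pattern.compile("class=\"tabs"));
        METRICS.put("callouts", Pattern.compile("class=\"conum\""));
    }

    /** Counts non-overlapping matches of `p` in `html`. */
    static int count(Pattern p, String html) {
        Matcher m = p.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Relative difference |a - b| / max(a, b), or 0 when both counts are zero. */
    static double relDiff(int a, int b) {
        int max = Math.max(a, b);
        return max == 0 ? 0.0 : Math.abs(a - b) / (double) max;
    }

    /** Prints every metric and returns true only if all differ by at most `tolerance`. */
    static boolean withinTolerance(String oldHtml, String newHtml, double tolerance) {
        boolean ok = true;
        for (Map.Entry<String, Pattern> e : METRICS.entrySet()) {
            int a = count(e.getValue(), oldHtml);
            int b = count(e.getValue(), newHtml);
            double d = relDiff(a, b);
            System.out.printf("%-15s old=%d new=%d diff=%.1f%%%n", e.getKey(), a, b, d * 100);
            if (d > tolerance) {
                ok = false;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        String oldHtml = "<h2>Intro</h2><div class=\"listingblock\"></div>";
        String newHtml = "<h2>Intro</h2><div class=\"listingblock\"></div>";
        System.out.println(withinTolerance(oldHtml, newHtml, 0.02)); // true
    }
}
```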
   
   ## Tips for reviewers
   
   I've taken the liberty of redeploying the [3.7.7-SNAPSHOT 
docs](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/) from this branch. I 
would recommend focusing the review on evaluating the built docs. There are a 
few notable differences worth calling out:
   
   - The CSharp tabs now have functioning syntax highlighting ([as seen in the 
Basic Gremlin section of the reference 
docs](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/reference/#basic-gremlin))
   - The [HDFS 
examples](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/reference/#_oltp_hadoop_gremlin)
 now replace calls to `hdfs.ls()` with `hdfs.ls('tinkerpop-modern.kryo')`. 
This is a minor workaround. The docs build uses the host machine's local 
filesystem instead of running a local Hadoop cluster, so an unscoped `hdfs.ls()` 
would dump the existing contents of the host's home directory. The old format 
could be restored by having the docs system internally manage a MiniDFSCluster. 
This is a viable fix, but I've left it out of scope for this PR to limit 
complexity.
   - The [OLAP Spark YARN 
recipe](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/recipes/#olap-spark-yarn)
 has been converted to a static example; it is no longer executed during the 
docs build.
   
   ## Future
   
   The goal of this work was to replace the old docs system with 1:1 
equivalency in docs output. I think this new extension gives us a better 
platform for building future enhancements to the docs.
   
   - For 3.8 and above, it becomes quite trivial to link the gremlin-lang 
translators into all of the `gremlin-groovy` examples, and automatically add 
tabs for all language variants (excluding groovy-specific examples)
   - There is some complexity in the system to load and unload console plugins 
depending on the needs of each book (needed due to conflicting dependencies 
between Spark and Neo4j). This could be ripped out and simplified in master, as 
the neo4j and sparql plugins are no longer necessary there.
   - I expect we can extend the new asciidoctor plugin to add new features to 
the docs, such as improved docs navigation and an integrated search capability.
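   Generating variant tabs could amount to mapping one Gremlin example through a translator per language, as the first bullet above suggests. This is a hypothetical sketch. The translator is passed in as a plain `BiFunction` because the exact gremlin-lang translator API is not described here. The `VariantTabs` class and the HTML class names are placeholders, not what TabbedHtmlBuilder emits.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class VariantTabs {
    /**
     * Builds one tab per language from a single Gremlin example. `translate`
     * stands in for a gremlin-lang translator (hypothetical wiring).
     */
    static String build(String gremlin, List<String> languages,
                        BiFunction<String, String, String> translate) {
        StringBuilder html = new StringBuilder("<div class=\"tabs\">\n");
        for (String lang : languages) {
            html.append("  <section class=\"tab\" data-lang=\"").append(lang).append("\"><pre>")
                .append(escape(translate.apply(gremlin, lang)))
                .append("</pre></section>\n");
        }
        return html.append("</div>\n").toString();
    }

    /** Minimal HTML escaping for code text ('&' first so entities are not double-escaped). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Canned translations so the sketch runs without any TinkerPop dependency.
        Map<String, String> canned = Map.of(
            "groovy", "g.V().has('name','marko').out('knows')",
            "python", "g.V().has('name','marko').out('knows')",
            "dotnet", "g.V().Has(\"name\",\"marko\").Out(\"knows\")");
        System.out.print(build("g.V().has('name','marko').out('knows')",
                List.of("groovy", "python", "dotnet"), (q, lang) -> canned.get(lang)));
    }
}
```

Groovy-specific examples could be built with `List.of("groovy")` so that no translated tabs are generated for them.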
   

