This is an automated email from the ASF dual-hosted git repository.

cloud-fan pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 81718b1ba842 [SPARK-56832][INFRA] Surface fatal javadoc errors in unidoc log summary and CI annotations
81718b1ba842 is described below

commit 81718b1ba842a6452f6cfd88c5da3fb417b9d4fb
Author: Wenchen Fan <[email protected]>
AuthorDate: Wed May 13 15:17:54 2026 +0800

    [SPARK-56832][INFRA] Surface fatal javadoc errors in unidoc log summary and CI annotations
    
    ### What changes were proposed in this pull request?
    
    After the noise filters from #55605, the Documentation generation CI log is around 4K lines on a failure run. The two-line per-file `error: reference not found` diagnostics are still buried in the middle of the log, and the GitHub Actions check panel for a failed doc-gen job only surfaces `Process completed with exit code 1`. Reviewers end up scrolling the raw log to find what actually broke.
    
    This PR is purely additive in `docs/_plugins/build_api_docs.rb` -- no existing log lines are dropped. After the unidoc pipe closes:
    
    1. A trailing `Fatal javadoc errors (N):` block is printed, listing each captured diagnostic with file, line, and message.
    2. One `::error file=<path>,line=<line>,title=javadoc::<msg>` GitHub Actions workflow command is emitted per diagnostic, so they appear as inline annotations on the PR check panel instead of as a single opaque `exit code 1`.
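    
    A minimal sketch of step 2, for illustration only: the helper name `error_annotation` and the sample path are hypothetical and not part of the patch. It applies the `%`, CR, and LF escaping that workflow-command message data requires:
    
    ```ruby
    # Hypothetical helper (not in the patch): formats one diagnostic as a
    # GitHub Actions `::error` workflow command. `%`, CR and LF in the
    # message data are percent-escaped so the command stays on one line.
    def error_annotation(path, line, msg, context = [])
      full = ([msg] + context).join("\n")
      enc = full.gsub(/[%\r\n]/, '%' => '%25', "\r" => '%0D', "\n" => '%0A')
      "::error file=#{path},line=#{line},title=javadoc::#{enc}"
    end
    
    # Made-up sample diagnostic with a snippet line and a caret line.
    puts error_annotation('sql/Foo.java', 42, 'reference not found',
                          ['   * {@link NoSuchClass}', '           ^'])
    ```
    
    Joining the message and its context with newlines before escaping lets the snippet and caret render as separate lines under the message in the annotation.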
    
    Diagnostics are captured strictly within the Standard Doclet phase bracketed by `Building tree for all the packages and classes...` and `Building index for all classes...`, which is where doclint emits the build-failing diagnostics that count toward javadoc's exit code. Source-loading `error:` chatter outside that window is excluded -- it is non-fatal, and leaving it out keeps the captured set consistent with what javadoc's own `N errors` summary line counts.
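    
    The phase bracketing can be sketched as a small state machine. This is a simplified illustration rather than the patch itself: the constant names, the `capture_fatal` helper, and the sample log lines are all made up, and context-line capture is omitted:
    
    ```ruby
    # Illustrative phase tracker: only `error:` lines seen between the two
    # Standard Doclet marker lines are collected as fatal diagnostics.
    START_MARKER = /Building tree for all the packages and classes/
    END_MARKER   = /Building index for all classes/
    DIAG         = /\A\[warn\]\s+(?<path>\S+):(?<lineno>\d+):\s+error:\s+(?<msg>.+?)\s*\z/
    
    def capture_fatal(lines)
      in_phase = false
      fatal = []
      lines.each do |line|
        if line =~ START_MARKER
          in_phase = true
        elsif in_phase && line =~ END_MARKER
          in_phase = false
        elsif in_phase && (m = line.match(DIAG))
          fatal << { path: m[:path], line: m[:lineno].to_i, msg: m[:msg] }
        end
      end
      fatal
    end
    
    # Made-up log lines; the real unidoc output has different prefixes and paths.
    sample = [
      "[warn] /src/A.java:3: error: cannot find symbol",    # before the phase: ignored
      "[info] Building tree for all the packages and classes...",
      "[warn] /src/B.java:10: error: reference not found",  # inside the phase: captured
      "[info] Building index for all classes...",
      "[warn] /src/C.java:5: error: reference not found"    # after the phase: ignored
    ]
    p capture_fatal(sample)
    ```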
    
    As a self-check, the captured count is compared against javadoc's own `N errors` summary line. If they diverge -- e.g. because a future JDK changes the Standard Doclet phase wording -- a `::warning::` workflow command is emitted so the drift is surfaced without silently masking real failures.
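    
    The self-check might look roughly like the following. The `drift_warning` helper and its message text are hypothetical; only the shape of the summary-line pattern follows the patch:
    
    ```ruby
    # Hypothetical self-check: parse javadoc's `N errors` summary line and
    # return a `::warning::` workflow command when it disagrees with the
    # number of diagnostics that were captured.
    SUMMARY = /\A\[warn\]\s+(?<count>[\d,]+)\s+errors?\s*\z/
    
    def drift_warning(summary_line, captured_count)
      m = summary_line.match(SUMMARY)
      return nil unless m                       # not a summary line
      reported = m[:count].delete(',').to_i     # tolerate "1,024"
      return nil if reported == captured_count  # counts agree: nothing to report
      "::warning::Javadoc reported #{reported} errors but #{captured_count} were captured."
    end
    
    puts drift_warning("[warn] 3 errors", 2)
    ```
    
    Returning `nil` when the counts agree keeps the happy path silent, so the warning only appears when the phase markers have likely drifted.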
    
    ### Why are the changes needed?
    
    PR #55605 made the doc-gen log small enough to read, but the failure path is still discoverable only via grep. The per-file diagnostics emitted by doclint are the actionable content; promoting them to the PR check panel and a clearly delimited summary block makes a doc-gen failure self-explanatory without leaving the PR.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    End-to-end on this branch with deliberately broken references planted in two code paths (mirroring the test pattern from PR #55605):
    
    - `ColumnarMap.java` (real Java source): `{@link org.apache.spark.deliberately.NoSuchClass}` and `{@link ColumnVector#nonExistentMethod()}`.
    - `Partition.scala` (Scala source via genjavadoc): `[[Partition.index]]` -- the `.`-separator case that javadoc treats as inner-class lookup.
    
    The Documentation generation job will fail with the expected `Fatal javadoc errors` summary block in the log and per-file inline annotations on this PR's check panel. The plant commit will be dropped before this PR is taken out of draft.
    
    The state machine was also exercised locally against a captured log from a prior failing doc-gen run; the captured fatal count matches javadoc's `N errors` summary line.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude (Anthropic)
    
    Closes #55814 from cloud-fan/unidoc-fatal-summary.
    
    Authored-by: Wenchen Fan <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
    (cherry picked from commit 12b2595277e8dcafe6f1151744a24228dc04f701)
    Signed-off-by: Wenchen Fan <[email protected]>
---
 docs/_plugins/build_api_docs.rb | 88 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/docs/_plugins/build_api_docs.rb b/docs/_plugins/build_api_docs.rb
index 1ef80bfaf09a..429cef5aa026 100644
--- a/docs/_plugins/build_api_docs.rb
+++ b/docs/_plugins/build_api_docs.rb
@@ -132,7 +132,7 @@ def build_spark_scala_and_java_docs_if_necessary
   command = "build/sbt -Pkinesis-asl unidoc"
   puts "Running '#{command}'..."
 
-  # Two filter passes on the unidoc output:
+  # Two filter passes on the unidoc output, plus an additive fatal-error summary:
   #
   # 1. Genjavadoc-stub diagnostic blocks (~28 `[error]` lines on stubs under
   #    `target/java/`, plus 3-5 continuation lines each). Inert because
@@ -146,6 +146,18 @@ def build_spark_scala_and_java_docs_if_necessary
   #    per-file `error: reference not found` diagnostics) but carry no signal
   #    of their own. Suppressing them brings the visible log from ~17K to ~5K
   #    lines on a typical run while leaving every diagnostic untouched.
+  #
+  # 3. Fatal-error summary (additive, drops no log lines). The filtered log is
+  #    still ~4K lines and most `error:` text in it is non-fatal source-loading
+  #    chatter, so the build-failing diagnostics are hard to spot. After the
+  #    pipe closes, we print a `Fatal javadoc errors (N): ...` block and emit
+  #    `::error file=,line=::` GitHub Actions annotations so they surface in the
+  #    PR check panel. Captured strictly within the Standard Doclet phase
+  #    bracketed by `Building tree for all the packages and classes...` and
+  #    `Building index for all classes...`, which is where doclint diagnostics
+  #    are emitted -- this matches what javadoc counts toward exit code 1.
+  #    Self-checked against javadoc's own `N errors` summary line; a mismatch
+  #    emits a `::warning::` so future phase-marker drift is visible.
   ansi = /\e\[[0-9;]*[A-Za-z]/
   stub_header = %r{
     \[(?:error|warn)\]\s+
@@ -167,10 +179,51 @@ def build_spark_scala_and_java_docs_if_necessary
      |Generating\s+\S+\.html
     )
   }x
+
+  # Doclint phase tracking for the trailing summary. Standard Doclet bookends the
+  # phase that produces build-failing diagnostics with these marker lines; any
+  # `error:` outside this window is source-loading noise that does not contribute
+  # to javadoc's exit code. The summary below captures only the fatal ones and
+  # re-emits them as GitHub Actions annotations so they surface in the PR check
+  # panel instead of being buried in a 4K-line log.
+  doclint_start   = %r{\bBuilding\s+tree\s+for\s+all\s+the\s+packages\s+and\s+classes\b}
+  doclint_end     = %r{\bBuilding\s+index\s+for\s+all\s+classes\b}
+  doclint_diag    = %r{\A\[warn\]\s+(?<path>\S+):(?<lineno>\d+)(?::\d+)?:\s+error:\s+(?<msg>.+?)\s*\z}
+  doclint_cont    = %r{\A\[warn\]\s(?!\S+:\d+(?::\d+)?:\s+error:)(?<content>.*?)\s*\z}
+  doclint_summary = %r{\A\[warn\]\s+(?<count>[\d,]+)\s+errors?\s*\z}
+
   in_stub = false
+  in_doclint = false
+  fatal_diagnostics = []
+  pending_context_lines = 0  # snippet + caret lines that follow each diag header
+  reported_error_count = nil
+
   IO.popen("#{command} 2>&1", 'r') do |pipe|
     pipe.each_line do |line|
       plain = line.gsub(ansi, '')
+
+      if plain =~ doclint_start
+        in_doclint = true
+      elsif in_doclint && plain =~ doclint_end
+        in_doclint = false
+        pending_context_lines = 0
+      end
+
+      if in_doclint && (m = plain.match(doclint_diag))
+        fatal_diagnostics << {
+          path: m[:path], line: m[:lineno], msg: m[:msg], context: []
+        }
+        pending_context_lines = 2
+      elsif in_doclint && pending_context_lines > 0 &&
+            (m = plain.match(doclint_cont)) && !fatal_diagnostics.empty?
+        fatal_diagnostics.last[:context] << m[:content]
+        pending_context_lines -= 1
+      end
+
+      if reported_error_count.nil? && (m = plain.match(doclint_summary))
+        reported_error_count = m[:count].delete(',').to_i
+      end
+
       if plain =~ verbose_line
         in_stub = false
         # suppress -verbose progress line
@@ -185,6 +238,39 @@ def build_spark_scala_and_java_docs_if_necessary
       end
     end
   end
+
+  unless fatal_diagnostics.empty?
+    bar = "=" * 72
+    puts ""
+    puts bar
+    puts "Fatal javadoc errors (#{fatal_diagnostics.size}):"
+    puts bar
+    fatal_diagnostics.each_with_index do |d, i|
+      puts "  #{i + 1}. #{d[:path]}:#{d[:line]}: #{d[:msg]}"
+      d[:context].each { |c| puts "       #{c}" }
+    end
+    puts bar
+    puts ""
+
+    # GitHub Actions inline annotations. `%`, `\r`, `\n` require URL-style
+    # escaping per the workflow command spec; newlines render as multiple
+    # lines inside the annotation, so the source snippet and caret display
+    # under the error message in the PR check panel.
+    project_root = SPARK_PROJECT_ROOT + '/'
+    fatal_diagnostics.each do |d|
+      rel = d[:path].start_with?(project_root) ? d[:path][project_root.length..] : d[:path]
+      full = ([d[:msg]] + d[:context]).join("\n")
+      enc = full.gsub(/[%\r\n]/, '%' => '%25', "\r" => '%0D', "\n" => '%0A')
+      puts "::error file=#{rel},line=#{d[:line]},title=javadoc::#{enc}"
+    end
+  end
+
+  if reported_error_count && reported_error_count != fatal_diagnostics.size
+    puts "::warning::Javadoc reported #{reported_error_count} errors but " \
+         "build_api_docs.rb captured #{fatal_diagnostics.size}. The doclint " \
+         "phase markers may have shifted; please update build_api_docs.rb."
+  end
+
   raise("Unidoc generation failed") unless $?.success?
 end
 

