Recent AI reviews have lots of extraneous warnings. Reviews were misclassified in two ways: findings from a stale patch revision were reported as current, and classify_review() pattern-matched prose so a "### Errors" header above a discarded item counted as an error.
Require the model to return a single JSON object where each error and warning quotes one line verbatim from the patch. Drop findings whose evidence or file is not in the diff into a separate unverified section. Derive the exit code from verified counts only. Render text/markdown/html locally from the findings. Stamp output with git patch-id --stable and a final "X-AI-Review:" verdict line for CI to parse. Retry once on invalid JSON, then report a contract violation at warning level. Add --replay for offline testing and tests/run-tests.sh. Signed-off-by: Stephen Hemminger <[email protected]> --- AGENTS.md | 52 ++- devtools/ai/review-patch.py | 661 ++++++++++++++++++++++++++---------- 2 files changed, 521 insertions(+), 192 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index af9a7e0772..47a4636307 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1836,20 +1836,58 @@ devtools/get-maintainer.sh <patch-file> --- -# Response Format +# Output Contract -When you identify an issue: -1. **State the problem** (1 sentence) -2. **Why it matters** (1 sentence, only if not obvious) -3. **Suggested fix** (code snippet or specific action) +**Deliberation is private.** Work through candidate findings, trace error +paths, and apply the final checks below BEFORE writing any output. The +output must contain ONLY findings that survived. Never output: +- candidate findings followed by "Discard this item" or "(Correction: ...)" +- reasoning such as "Wait, reviewing the code more carefully..." +- section headers for severities that have no findings +- prose verdicts ("No issues found" prose, "all patches are correct") -Example: This could panic if the string is NULL. +The output is a single JSON object - no markdown fences, no text before or +after it: + +``` +{ + "summary": "one line", + "errors": [ + {"issue": "what is wrong", "location": "file:line", + "evidence": "one line copied exactly from the patch", + "suggestion": "specific fix"} + ], + "warnings": [same fields as errors], + "info": [{"issue": "...", "location": "file:line", "suggestion": "..."}] +} +``` + +**Evidence is mandatory for errors and warnings.** Copy one line verbatim +from the patch (the added `+` line for new code). For missing-cleanup or +missing-check findings, quote the line that allocates, opens, or calls. +The harness verifies evidence against the patch under review: findings +whose evidence does not appear in the patch are discarded as unverifiable +(this catches reviews of a stale patch revision and hallucinated line +references), so quote precisely. + +The `issue` field should state the problem in one sentence, add one +sentence on why it matters only if not obvious, and `suggestion` should be +a specific fix. + +If there are no findings, the entire output is: + +``` +{"summary": "No issues found.", "errors": [], "warnings": [], "info": []} +``` --- ## FINAL CHECK BEFORE SUBMITTING REVIEW -Before outputting your review, do two separate passes: +Before outputting your review, do two separate passes. +Both passes are private deliberation: +the JSON output contains only the findings that survive them, +with no trace of items that were considered and removed. ### Pass 1: Verify correctness bugs are included diff --git a/devtools/ai/review-patch.py b/devtools/ai/review-patch.py index 52601ac156..6623b3d8c2 100755 --- a/devtools/ai/review-patch.py +++ b/devtools/ai/review-patch.py @@ -15,8 +15,10 @@ import subprocess import sys import tempfile +from dataclasses import dataclass, field from datetime import date from email.message import EmailMessage +from html import escape from pathlib import Path from typing import Any, Iterator @@ -66,37 +68,31 @@ - Copyright years should reflect when the code was originally written - Be conservative: reject changes that aren't clearly bug fixes""" -FORMAT_INSTRUCTIONS = { - "text": """Provide your review in plain text format.""", - "markdown": """Provide your review in Markdown format with: -- Headers (##) for each severity level (Errors, Warnings, Info) -- Bullet points for individual issues -- Code blocks (```) for code references -- Bold (**) for emphasis on key points""", - "html": """Provide your review in HTML format with: -- <h2> tags for each severity level (Errors, Warnings, Info) -- <ul>/<li> for individual issues -- <pre><code> for code references -- <strong> for emphasis on key points -- Use appropriate semantic HTML tags -- Do NOT include <html>, <head>, or <body> tags - just the content""", - "json": """Provide your review in JSON format with this structure: +REVIEW_JSON_INSTRUCTION = """\ +Respond with a single JSON object and nothing else - no markdown fences, \ +no prose before or after, no section headers: { - "summary": "Brief one-line summary of the review", + "summary": "one line", "errors": [ - {"issue": "description", "location": "file:line", "suggestion": "fix"} + {"issue": "what is wrong", "location": "file:line", + "evidence": "one line copied exactly from the patch", + "suggestion": "specific fix"} ], - "warnings": [ - {"issue": "description", "location": "file:line", "suggestion": "fix"} - ], - "info": [ - {"issue": "description", "location": "file:line", "suggestion": "fix"} - ], - "passed_checks": ["list of checks that passed"], - "overall_status": "PASS|WARN|FAIL" -} -Output ONLY valid JSON, no markdown code fences or other text.""", + "warnings": [same fields as errors], + "info": [{"issue": "...", "location": "file:line", "suggestion": "..."}] } +Rules: +- Deliberation is private. Output ONLY findings that survive your final + checks; never output candidate findings, "Discard this item", or + corrections. +- Empty severities are empty arrays, never prose. +- "evidence" must be copied verbatim from the patch (the added '+' line for + new code). For missing-cleanup or missing-check findings, quote the line + that allocates or calls. Findings whose evidence does not appear in the + patch are discarded by the harness as unverifiable. +- If there are no findings: + {"summary": "No issues found.", "errors": [], "warnings": [], "info": []} +""" USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ against the AGENTS.md guidelines. Focus on: @@ -119,57 +115,213 @@ EXIT_WARNINGS = 2 EXIT_ERRORS = 3 +# Appended to the prompt when the first response was not valid JSON. +RETRY_NOTE = ( + "\n\nREMINDER: the previous response was not a single valid JSON " + "object. Respond again with ONLY the JSON object in the required schema." +) -def classify_review(review_text: str, output_format: str) -> int: - """Classify review result and return appropriate exit code. - Returns: - 0 - clean (no errors or warnings) - 2 - warnings found (no errors) - 3 - errors found - """ - has_errors = False - has_warnings = False +@dataclass +class ReviewSection: + """One reviewed unit (whole patch file, one patch, or one chunk).""" + + label: str + data: dict[str, Any] | None = None + unverified: list[dict[str, Any]] = field(default_factory=list) + raw: str | None = None + + @property + def contract_violation(self) -> bool: + """True when the model response could not be parsed as JSON.""" + return self.data is None - if output_format == "json": + +def extract_json(text: str) -> dict[str, Any] | None: + """Parse a JSON object from model output, tolerating code fences.""" + candidate = text.strip() + if candidate.startswith("```"): + candidate = re.sub(r"^```[a-zA-Z]*\n", "", candidate) + candidate = re.sub(r"\n```\s*$", "", candidate) + try: + data = json.loads(candidate) + if isinstance(data, dict): + return data + except json.JSONDecodeError: + pass + # Fall back to the outermost braces + first = candidate.find("{") + last = candidate.rfind("}") + if first != -1 and last > first: try: - data = json.loads(review_text) - if data.get("errors"): - has_errors = True - if data.get("warnings"): - has_warnings = True - status = data.get("overall_status", "").upper() - if status == "FAIL": - has_errors = True - elif status == "WARN": - has_warnings = True - except (json.JSONDecodeError, AttributeError): - pass # Fall through to text scanning - - if not has_errors and not has_warnings: - # Scan review text for severity indicators. - # Match section headers and inline markers across text/markdown/html. - for line in review_text.splitlines(): - stripped = line.strip().lower() - # Skip quoted patch context lines - if stripped.startswith(">") or stripped.startswith("diff --git"): + data = json.loads(candidate[first : last + 1]) + if isinstance(data, dict): + return data + except json.JSONDecodeError: + pass + return None + + +def get_patch_ids(patch_content: str) -> list[str]: + """Return stable patch ids for the content under review. + + Uses `git patch-id --stable`, which is invariant to format-patch + regeneration (timestamps, commit hashes, version prefixes) but changes + whenever the diff content changes. This pins a review to the exact + revision it reviewed. Returns an empty list if git is unavailable. + """ + try: + result = subprocess.run( + ["git", "patch-id", "--stable"], + input=patch_content, + capture_output=True, + text=True, + check=True, + ) + except (subprocess.CalledProcessError, FileNotFoundError): + return [] + ids = [] + for line in result.stdout.splitlines(): + fields = line.split() + if fields: + ids.append(fields[0]) + return ids + + +def _normalize_line(line: str) -> str: + """Collapse whitespace; drop a leading diff marker if present.""" + if line[:1] in "+- ": + line = line[1:] + return " ".join(line.split()) + + +def build_patch_line_index(patch_content: str) -> set[str]: + """Set of normalized lines appearing in the patch.""" + index = set() + for line in patch_content.splitlines(): + norm = _normalize_line(line) + if norm: + index.add(norm) + return index + + +def finding_is_verified( + finding: dict[str, Any], line_index: set[str], patch_content: str +) -> bool: + """Check a finding's evidence and location against the patch content. + + Guards against stale reviews (findings quoted from a different revision + of the patch) and hallucinated line references: every evidence line must + appear in the patch, and the file named in location must be part of it. + """ + evidence = str(finding.get("evidence", "")).strip() + if not evidence: + return False + for line in evidence.splitlines(): + norm = _normalize_line(line) + if norm and norm not in line_index: + return False + location = str(finding.get("location", "")) + fname = location.rsplit(":", 1)[0].strip() + if fname: + # Lenient: the model may shorten the path, so match the basename + base = fname.rsplit("/", 1)[-1] + if base and base not in patch_content: + return False + return True + + +def verify_findings( + data: dict[str, Any], patch_content: str +) -> tuple[dict[str, Any], list[dict[str, Any]]]: + """Split errors/warnings into verified and unverified findings. + + Info findings pass through unchecked; they are often general + suggestions with no specific line to quote. + """ + line_index = build_patch_line_index(patch_content) + unverified = [] + verified = dict(data) + for severity in ("errors", "warnings"): + kept = [] + for finding in data.get(severity) or []: + if not isinstance(finding, dict): continue - if re.match(r"^(#{1,3}\s+)?(\*{0,2})error", stripped) or re.match( - r"^<h[1-3]>\s*error", stripped - ): - has_errors = True - elif re.match(r"^(#{1,3}\s+)?(\*{0,2})warning", stripped) or re.match( - r"^<h[1-3]>\s*warning", stripped - ): - has_warnings = True + if finding_is_verified(finding, line_index, patch_content): + kept.append(finding) + else: + dropped = dict(finding) + dropped["severity"] = severity[:-1] + unverified.append(dropped) + verified[severity] = kept + verified["info"] = [f for f in (data.get("info") or []) if isinstance(f, dict)] + return verified, unverified + + +def process_review_text( + review_text: str, patch_content: str, label: str +) -> ReviewSection: + """Parse and verify one model response into a ReviewSection.""" + data = extract_json(review_text) + if data is None: + return ReviewSection(label=label, raw=review_text) + verified, unverified = verify_findings(data, patch_content) + return ReviewSection(label=label, data=verified, unverified=unverified) + + +def section_counts(sections: list[ReviewSection]) -> dict[str, int]: + """Aggregate finding counts across sections.""" + counts = { + "errors": 0, + "warnings": 0, + "info": 0, + "unverified": 0, + "violations": 0, + } + for sec in sections: + if sec.contract_violation: + counts["violations"] += 1 + continue + counts["errors"] += len(sec.data.get("errors", [])) + counts["warnings"] += len(sec.data.get("warnings", [])) + counts["info"] += len(sec.data.get("info", [])) + counts["unverified"] += len(sec.unverified) + return counts + + +def classify_review(counts: dict[str, int]) -> int: + """Exit code from verified finding counts. + + Status is derived ONLY from the parsed, verified findings - never from + pattern matching on prose (a "### Errors" header above "None found" + must not trip the classifier). - if has_errors: + Returns: + 0 - clean (no errors or warnings) + 2 - warnings, unverified findings, or output-contract violations + 3 - errors found + """ + if counts["errors"]: return EXIT_ERRORS - if has_warnings: + if counts["warnings"] or counts["unverified"] or counts["violations"]: return EXIT_WARNINGS return EXIT_CLEAN +def summary_trailer(counts: dict[str, int], patch_ids: list[str]) -> str: + """One-line machine-readable verdict for CI to parse. + + CI labelers should parse this line (or the JSON output), not the prose. + """ + ids = ",".join(pid[:16] for pid in patch_ids) or "unknown" + return ( + f"X-AI-Review: errors={counts['errors']} " + f"warnings={counts['warnings']} info={counts['info']} " + f"unverified={counts['unverified']} " + f"contract-violations={counts['violations']} patch-id={ids}" + ) + + def is_lts_release(release: str | None) -> bool: """Check if a release is an LTS release. @@ -413,34 +565,82 @@ def get_summary_prompt() -> str: """ -def format_combined_reviews( - reviews: list[tuple[str, str]], output_format: str, patch_name: str -) -> str: - """Combine multiple chunk/patch reviews into a single output.""" - if output_format == "json": - combined = { - "patch_file": patch_name, - "sections": [ - {"label": label, "review": review} for label, review in reviews - ], - } - return json.dumps(combined, indent=2) - elif output_format == "html": - sections = [] - for label, review in reviews: - sections.append(f"<h2>{label}</h2>\n{review}") - return "\n<hr>\n".join(sections) - elif output_format == "markdown": - sections = [] - for label, review in reviews: - sections.append(f"## {label}\n\n{review}") - return "\n\n---\n\n".join(sections) - else: # text - sections = [] - for label, review in reviews: - sections.append(f"=== {label} ===\n\n{review}") - separator = "\n\n" + "=" * 60 + "\n\n" - return separator.join(sections) +def _finding_lines(num: int, finding: dict[str, Any]) -> list[str]: + """Render one finding as indented text lines.""" + lines = [f" {num}. {finding.get('location') or 'unspecified location'}"] + issue = str(finding.get("issue", "")).strip() + if issue: + lines.append(f" {issue}") + for ev_line in str(finding.get("evidence", "")).strip().splitlines(): + lines.append(f" > {ev_line}") + suggestion = str(finding.get("suggestion", "")).strip() + if suggestion: + lines.append(f" Fix: {suggestion}") + return lines + + +def render_section_text(sec: ReviewSection) -> str: + """Render one ReviewSection as plain text.""" + if sec.contract_violation: + return ( + "Output contract violation: the model response was not a valid " + "JSON review.\nRaw response follows (unparsed, counted as a " + f"warning):\n\n{sec.raw}" + ) + lines: list[str] = [] + summary = str(sec.data.get("summary", "")).strip() + if summary: + lines.append(summary) + titles = (("errors", "Errors"), ("warnings", "Warnings"), ("info", "Info")) + for key, title in titles: + findings = sec.data.get(key, []) + if not findings: + continue + lines.append("") + lines.append(f"{title}:") + for i, finding in enumerate(findings, 1): + lines.extend(_finding_lines(i, finding)) + if sec.unverified: + lines.append("") + lines.append( + "Unverified findings (evidence not found in this patch " + "revision; possibly from a stale review - not counted):" + ) + for i, finding in enumerate(sec.unverified, 1): + lines.extend(_finding_lines(i, finding)) + if len(lines) <= 1 and not sec.unverified: + return summary or "No issues found." + return "\n".join(lines) + + +def render_sections(sections: list[ReviewSection], output_format: str) -> str: + """Render all sections in the requested format. + + Markdown and HTML are simple wrappings of the text rendering; findings + content comes from the verified structure in all formats. + """ + parts = [] + for sec in sections: + body = render_section_text(sec) + if output_format == "markdown": + if len(sections) > 1: + parts.append(f"## {sec.label}\n\n```\n{body}\n```") + else: + parts.append(f"```\n{body}\n```") + elif output_format == "html": + title = f"<h2>{escape(sec.label)}</h2>\n" if len(sections) > 1 else "" + parts.append(f"{title}<pre>{escape(body)}</pre>") + else: # text + if len(sections) > 1: + parts.append(f"=== {sec.label} ===\n\n{body}") + else: + parts.append(body) + if output_format == "markdown": + return "\n\n---\n\n".join(parts) + if output_format == "html": + return "\n<hr>\n".join(parts) + separator = "\n\n" + "=" * 60 + "\n\n" + return separator.join(parts) def build_system_prompt(review_date: str, release: str | None) -> str: @@ -466,10 +666,10 @@ def build_anthropic_request( agents_content: str, patch_content: str, patch_name: str, - output_format: str = "text", + retry_note: str = "", ) -> dict[str, Any]: """Build request payload for Anthropic API.""" - format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + format_instruction = REVIEW_JSON_INSTRUCTION + retry_note user_prompt = USER_PROMPT.format( patch_name=patch_name, format_instruction=format_instruction ) @@ -500,10 +700,10 @@ def build_openai_request( agents_content: str, patch_content: str, patch_name: str, - output_format: str = "text", + retry_note: str = "", ) -> dict[str, Any]: """Build request payload for OpenAI-compatible APIs.""" - format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + format_instruction = REVIEW_JSON_INSTRUCTION + retry_note user_prompt = USER_PROMPT.format( patch_name=patch_name, format_instruction=format_instruction ) @@ -527,10 +727,10 @@ def build_google_request( agents_content: str, patch_content: str, patch_name: str, - output_format: str = "text", + retry_note: str = "", ) -> dict[str, Any]: """Build request payload for Google Gemini API.""" - format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + format_instruction = REVIEW_JSON_INSTRUCTION + retry_note user_prompt = USER_PROMPT.format( patch_name=patch_name, format_instruction=format_instruction ) @@ -560,7 +760,7 @@ def call_api( agents_content: str, patch_content: str, patch_name: str, - output_format: str = "text", + retry_note: str = "", verbose: bool = False, timeout: int = 300, ) -> tuple[str, TokenUsage]: @@ -573,7 +773,7 @@ def call_api( agents_content, patch_content, patch_name, - output_format, + retry_note, ) elif provider == "google": request_data = build_google_request( @@ -582,7 +782,7 @@ def call_api( agents_content, patch_content, patch_name, - output_format, + retry_note, ) else: # openai, xai request_data = build_openai_request( @@ -592,7 +792,7 @@ def call_api( agents_content, patch_content, patch_name, - output_format, + retry_note, ) return send_request( provider, @@ -604,6 +804,59 @@ def call_api( ) +def reviewed_section( + provider: str, + api_key: str, + model: str, + max_tokens: int, + system_prompt: str, + agents_content: str, + content: str, + name: str, + label: str, + verbose: bool, + timeout: int, + usage: TokenUsage, +) -> ReviewSection: + """Run one review call, retrying once if the response is not valid JSON.""" + review_text, call_usage = call_api( + provider, + api_key, + model, + max_tokens, + system_prompt, + agents_content, + content, + name, + "", + verbose, + timeout, + ) + usage.add(call_usage) + section = process_review_text(review_text, content, label) + if section.contract_violation: + print( + f"{label}: response was not valid JSON, retrying once...", + file=sys.stderr, + ) + review_text, call_usage = call_api( + provider, + api_key, + model, + max_tokens, + system_prompt, + agents_content, + content, + name, + RETRY_NOTE, + verbose, + timeout, + ) + usage.add(call_usage) + section = process_review_text(review_text, content, label) + return section + + def get_last_message_id(patch_content: str) -> str | None: """Extract Message-ID from the last patch in an mbox.""" msg_ids = re.findall( @@ -772,10 +1025,23 @@ def main() -> None: Use --show-tokens (or -v/--verbose) to print a token usage summary on stderr after the run. Off by default. +Review Integrity: + The model must return structured JSON findings; text/markdown/html are + rendered locally from the verified findings. Each error/warning must + quote an exact line from the patch as evidence; findings whose evidence + is not present in the patch (e.g. review of a stale revision) are + reported as unverified and do not affect the error count. Every output + carries the `git patch-id --stable` of the reviewed content and a final + "X-AI-Review: ..." verdict line for CI to parse. + +Testing: + %(prog)s --replay saved-response.txt patch.patch # no API call + Exit Codes: 0 - Clean review (no errors or warnings) 1 - Operational failure (missing API key, file not found, etc.) - 2 - Review found warnings (no errors) + 2 - Review found warnings, unverified findings, or the model violated + the output contract (no errors) 3 - Review found errors """, ) @@ -840,6 +1106,12 @@ def main() -> None: metavar="SECONDS", help="API request timeout in seconds (default: 300)", ) + parser.add_argument( + "--replay", + metavar="FILE", + help="Testing aid: process a saved model response instead of " + "calling the API (reviews the patch file as a single unit)", + ) # Date and release options parser.add_argument( @@ -930,9 +1202,9 @@ def main() -> None: config = PROVIDERS[args.provider] model = args.model or config["default_model"] - # Get API key - api_key = os.environ.get(config["env_var"]) - if not api_key: + # Get API key (not needed when replaying a saved response) + api_key = os.environ.get(config["env_var"], "") + if not api_key and not args.replay: error(f"{config['env_var']} environment variable not set") # Validate files @@ -1000,9 +1272,11 @@ def main() -> None: file=sys.stderr, ) + # Reviewed sections accumulate here in all modes + sections: list[ReviewSection] = [] + # Handle --split-patches mode - review_text = "" - if args.split_patches: + if args.split_patches and not args.replay: patches = split_mbox_patches(patch_content) total_patches = len(patches) @@ -1034,37 +1308,32 @@ def main() -> None: ) # Review each patch separately - all_reviews = [] for i, patch in enumerate(patches, patch_start or 1): patch_label = f"Patch {i}/{total_patches}" print(f"\nReviewing {patch_label}...", file=sys.stderr) - review_text, call_usage = call_api( - args.provider, - api_key, - model, - args.tokens, - system_prompt, - agents_content, - patch, - f"{patch_name} ({patch_label})", - args.output_format, - args.verbose, - args.timeout, + sections.append( + reviewed_section( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch, + f"{patch_name} ({patch_label})", + patch_label, + args.verbose, + args.timeout, + total_usage, + ) ) - total_usage.add(call_usage) - all_reviews.append((patch_label, review_text)) - - # Combine reviews - review_text = format_combined_reviews( - all_reviews, args.output_format, patch_name - ) # Skip the normal API call estimated_tokens = 0 # Bypass size check since we've already processed - # Check if content is too large - is_large = estimated_tokens > max_input_tokens + # Check if content is too large (replay skips API size limits) + is_large = estimated_tokens > max_input_tokens and not args.replay if is_large: print( @@ -1102,33 +1371,28 @@ def main() -> None: patch_content, _ = truncate_content(patch_content, max_input_tokens) elif args.large_file == "chunk": print("Processing in chunks...", file=sys.stderr) - all_reviews = [] for chunk, chunk_num, total_chunks in chunk_content( patch_content, max_input_tokens ): chunk_label = f"Chunk {chunk_num}/{total_chunks}" print(f"Reviewing {chunk_label}...", file=sys.stderr) - review_text, call_usage = call_api( - args.provider, - api_key, - model, - args.tokens, - system_prompt, - agents_content, - chunk, - f"{patch_name} ({chunk_label})", - args.output_format, - args.verbose, - args.timeout, + sections.append( + reviewed_section( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + chunk, + f"{patch_name} ({chunk_label})", + chunk_label, + args.verbose, + args.timeout, + total_usage, + ) ) - total_usage.add(call_usage) - all_reviews.append((chunk_label, review_text)) - - # Combine chunk reviews - review_text = format_combined_reviews( - all_reviews, args.output_format, patch_name - ) # Skip the normal single API call below estimated_tokens = 0 @@ -1160,37 +1424,45 @@ def main() -> None: print(f"From: {from_addr}", file=sys.stderr) print("===============", file=sys.stderr) - # Call API (unless already processed via chunks/split) - if estimated_tokens > 0: # Not already processed - review_text, call_usage = call_api( - args.provider, - api_key, - model, - args.tokens, - system_prompt, - agents_content, - patch_content, - patch_name, - args.output_format, - args.verbose, - args.timeout, + # Review the whole file (unless already processed via chunks/split) + if args.replay: + replay_path = Path(args.replay) + if not replay_path.exists(): + error(f"Replay file not found: {args.replay}") + review_text = replay_path.read_text(encoding="utf-8", errors="replace") + sections = [process_review_text(review_text, patch_content, patch_name)] + elif not sections: + sections.append( + reviewed_section( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + patch_name, + args.verbose, + args.timeout, + total_usage, + ) ) - total_usage.add(call_usage) - if not review_text: + if not sections: error(f"No response received from {args.provider}") + # Verify identity of the reviewed content and derive the verdict + patch_ids = get_patch_ids(patch_content) + counts = section_counts(sections) + trailer = summary_trailer(counts, patch_ids) + review_body = render_sections(sections, args.output_format) + # Format output based on requested format provider_name = config["name"] + patch_id_label = ",".join(pid[:16] for pid in patch_ids) or "unknown" if args.output_format == "json": - # For JSON, try to parse and add metadata - try: - review_data = json.loads(review_text) - except json.JSONDecodeError: - # If AI didn't return valid JSON, wrap the text - review_data = {"raw_review": review_text} - usage_data = { "api_calls": total_usage.api_calls, "input_tokens": total_usage.input_tokens, @@ -1205,19 +1477,29 @@ def main() -> None: output_data = { "metadata": { "patch_file": patch_name, + "patch_ids": patch_ids, "provider": args.provider, "provider_name": provider_name, "model": model, "review_date": review_date, "target_release": args.release, "is_lts": is_lts_release(args.release) if args.release else False, + "counts": counts, "token_usage": usage_data, }, - "review": review_data, + "sections": [ + { + "label": sec.label, + "contract_violation": sec.contract_violation, + "review": sec.data, + "unverified_findings": sec.unverified, + **({"raw_response": sec.raw} if sec.raw else {}), + } + for sec in sections + ], } output_text = json.dumps(output_data, indent=2) elif args.output_format == "html": - # Wrap HTML content with header release_info = "" if args.release: lts_badge = " (LTS)" if is_lts_release(args.release) else "" @@ -1226,8 +1508,9 @@ def main() -> None: <!-- Reviewed using {provider_name} ({model}) on {review_date} --> <div class="patch-review"> <h1>Patch Review: {patch_name}</h1> -<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}</p> -{review_text} +<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}<br>Patch-Id: {patch_id_label}</p> +{review_body} +<p class="review-verdict">{escape(trailer)}</p> </div> """ elif args.output_format == "markdown": @@ -1238,8 +1521,11 @@ def main() -> None: output_text = f"""# Patch Review: {patch_name} *Reviewed by {provider_name} ({model}) on {review_date}* +*Patch-Id: {patch_id_label}* {release_info} -{review_text} +{review_body} + +{trailer} """ else: # text release_info = "" @@ -1248,8 +1534,10 @@ def main() -> None: release_info = f"Target release: {args.release}{lts_badge}\n" output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n" output_text += f"Review date: {review_date}\n" + output_text += f"Patch-Id: {patch_id_label}\n" output_text += release_info - output_text += "\n" + review_text + output_text += "\n" + review_body + output_text += "\n\n" + trailer + "\n" # Write output if args.output: @@ -1293,12 +1581,15 @@ def main() -> None: email_body = f"""AI-generated review of {patch_name} Reviewed using {provider_name} ({model}) on {review_date} +Patch-Id: {patch_id_label} {release_info} This is an automated review. Please verify all suggestions. --- -{review_text} +{render_sections(sections, "text")} + +{trailer} """ if args.verbose: @@ -1322,8 +1613,8 @@ def main() -> None: print("", file=sys.stderr) print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) - # Exit with code based on review severity - sys.exit(classify_review(review_text, args.output_format)) + # Exit with code based on verified review severity + sys.exit(classify_review(counts)) if __name__ == "__main__": -- 2.53.0

