Ma77Ball commented on code in PR #5359:
URL: https://github.com/apache/texera/pull/5359#discussion_r3346755461


##########
.github/workflows/comment-commands.yml:
##########
@@ -111,10 +113,16 @@ jobs:
           || startsWith(github.event.comment.body, '/unrequest-review'))
     runs-on: ubuntu-latest
     steps:
+      - uses: actions/checkout@v5

Review Comment:
   `actions/checkout@v5` on an `issue_comment` event checks out the repo's 
**default branch**, not this PR's base ref. With `fetch-depth: 0` that's fine 
when the PR targets `main` (base.sha is an ancestor), but for a PR targeting a 
release/divergent branch, `pull.base.sha` may not be reachable from the 
checked-out tree — `git blame <base.sha>` would then fail for *every* file and 
the command silently produces no reviewers. Consider fetching/checking out 
`pull.base.ref` (or `base.sha`) explicitly.



##########
.github/workflows/comment-commands.yml:
##########
@@ -137,6 +145,97 @@ jobs:
               return;
             }
 
+            // Find a reviewer for the PR by running `git blame` on each
+            // changed file, picking the most recent commit author for
+            // that file, and turning that commit into a GitHub login.
+            // Returns an array of unique logins.
+            async function getReviewersFromBlame() {
+              const { data: pull } = await github.rest.pulls.get({
+                owner, repo, pull_number,
+              });
+
+              // List files changed in the pull request (paginated).
+              const files = await github.paginate(github.rest.pulls.listFiles, 
{
+                owner, repo, pull_number, per_page: 100,
+              });
+
+              // Parse the `git blame -p` output to find, for each file,
+              // the commit (SHA) that was the most recent author-time.
+              function latestBlameCommit(blameOutput) {
+                let latest = null; // { sha, authorTime }
+                let current = null;
+
+                function finalizeCurrent() {
+                  if (!current || current.authorTime == null) return;
+                  if (!latest || current.authorTime > latest.authorTime) {
+                    latest = current;
+                  }
+                }
+
+                for (const line of blameOutput.split(/\r?\n/)) {
+                  // A header line marks a new blamed commit block.
+                  const header = 
line.match(/^([0-9a-f^]+)\s+\d+\s+\d+\s+\d+$/);
+                  if (header) {
+                    finalizeCurrent();
+                    current = { sha: header[1].replace(/^\^/, ''), authorTime: 
null };
+                    continue;
+                  }
+
+                  // `author-time` lines give a UNIX timestamp we can use.
+                  const authorTime = line.match(/^author-time\s+(\d+)$/);
+                  if (authorTime && current) {
+                    current.authorTime = Number(authorTime[1]);
+                  }
+                }
+
+                finalizeCurrent();
+                return latest;
+              }
+
+              const reviewers = new Set();
+              for (const { filename } of files) {
+                let blameOutput;
+                try {
+                  // Run blame at the PR base commit so we attribute
+                  // existing lines correctly (not the PR tip).
+                  blameOutput = execFileSync('git', ['blame', '-p', 
pull.base.sha, '--', filename], { encoding: 'utf8' });
+                } catch (e) {
+                  core.warning(`git blame on ${filename} at ${pull.base.sha} 
failed: ${e.message}`);
+                  continue;
+                }
+
+                const latest = latestBlameCommit(blameOutput);
+                if (!latest) {
+                  core.warning(`Could not determine a blamed commit for 
${filename}; skipping.`);
+                  continue;
+                }
+
+                let commit;
+                try {
+                  ({ data: commit } = await github.rest.repos.getCommit({ 
owner, repo, ref: latest.sha }));
+                } catch (e) {
+                  core.warning(`Commit lookup for ${latest.sha} from 
${filename} failed: ${e.message}`);
+                  continue;
+                }
+
+                // Prefer the GitHub-linked author/committer; skip if
+                // there is no linked login (e.g., authored by email only).
+                const login = commit.author?.login ?? commit.committer?.login;
+                if (!login) {
+                  core.warning(`Commit ${latest.sha} from ${filename} has no 
GitHub user; skipping.`);
+                  continue;
+                }
+
+                // Don't request the PR author or bot accounts.
+                if (login.toLowerCase() === author.toLowerCase()) continue;
+                if (commit.author?.type === 'Bot' || commit.committer?.type 
=== 'Bot') continue;
+
+                reviewers.add(login);
+              }
+
+              return [...reviewers].sort();

Review Comment:
   This returns the latest author of *every* changed file, so a wide PR can 
request many reviewers. Two issues: (1) `requestReviewers` rejects >15 
reviewers with a 422, and (2) it's noisy. Limit to like 2 reviewers.



##########
.github/workflows/comment-commands.yml:
##########
@@ -111,10 +113,16 @@ jobs:
           || startsWith(github.event.comment.body, '/unrequest-review'))
     runs-on: ubuntu-latest
     steps:
+      - uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
       - uses: actions/github-script@v8
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
+            // `git blame` needs to runlocally in the runner. Use

Review Comment:
   ```suggestion
               // `git blame` needs to run locally in the runner. Use
   ```
   Nit: typo — "runlocally" should be "run locally".
   



##########
.github/workflows/comment-commands.yml:
##########
@@ -153,8 +252,18 @@ jobs:
                 reviewers.push(h);
             }
             if (!reviewers.length && !team_reviewers.length) {
-              core.warning(`No valid @mentions in '${action}'; skipping.`);
-              return;
+              if (action !== 'request-review') {
+                core.warning(`No valid @mentions in '${action}'; skipping.`);
+                return;
+              }
+
+              reviewers.push(...await getReviewersFromBlame());

Review Comment:
   **Main concern:** `requestReviewers` only accepts **collaborators**; any 
other login returns `422. Reviews may only be requested from collaborators."` 
In a public repo, `git blame` frequently surfaces past contributors who aren't 
current collaborators. Since the REST call is atomic, a single failed login 
fails the *entire* request, and zero reviewers get added. The path mentioned 
above already guards against the analogous "one bad name" case; blame logins 
enter here, unvalidated. Consider validating each login (e.g., 
`repos.checkCollaborator`) before adding it.



##########
.github/workflows/comment-commands.yml:
##########
@@ -137,6 +145,97 @@ jobs:
               return;
             }
 
+            // Find a reviewer for the PR by running `git blame` on each
+            // changed file, picking the most recent commit author for
+            // that file, and turning that commit into a GitHub login.
+            // Returns an array of unique logins.
+            async function getReviewersFromBlame() {
+              const { data: pull } = await github.rest.pulls.get({
+                owner, repo, pull_number,
+              });
+
+              // List files changed in the pull request (paginated).
+              const files = await github.paginate(github.rest.pulls.listFiles, 
{
+                owner, repo, pull_number, per_page: 100,
+              });
+
+              // Parse the `git blame -p` output to find, for each file,
+              // the commit (SHA) that was the most recent author-time.
+              function latestBlameCommit(blameOutput) {
+                let latest = null; // { sha, authorTime }
+                let current = null;
+
+                function finalizeCurrent() {
+                  if (!current || current.authorTime == null) return;
+                  if (!latest || current.authorTime > latest.authorTime) {
+                    latest = current;
+                  }
+                }
+
+                for (const line of blameOutput.split(/\r?\n/)) {
+                  // A header line marks a new blamed commit block.
+                  const header = 
line.match(/^([0-9a-f^]+)\s+\d+\s+\d+\s+\d+$/);
+                  if (header) {
+                    finalizeCurrent();
+                    current = { sha: header[1].replace(/^\^/, ''), authorTime: 
null };
+                    continue;
+                  }
+
+                  // `author-time` lines give a UNIX timestamp we can use.
+                  const authorTime = line.match(/^author-time\s+(\d+)$/);
+                  if (authorTime && current) {
+                    current.authorTime = Number(authorTime[1]);
+                  }
+                }
+
+                finalizeCurrent();
+                return latest;
+              }
+
+              const reviewers = new Set();
+              for (const { filename } of files) {
+                let blameOutput;
+                try {
+                  // Run blame at the PR base commit so we attribute
+                  // existing lines correctly (not the PR tip).
+                  blameOutput = execFileSync('git', ['blame', '-p', 
pull.base.sha, '--', filename], { encoding: 'utf8' });
+                } catch (e) {
+                  core.warning(`git blame on ${filename} at ${pull.base.sha} 
failed: ${e.message}`);
+                  continue;
+                }
+
+                const latest = latestBlameCommit(blameOutput);
+                if (!latest) {
+                  core.warning(`Could not determine a blamed commit for 
${filename}; skipping.`);
+                  continue;
+                }
+
+                let commit;
+                try {
+                  ({ data: commit } = await github.rest.repos.getCommit({ 
owner, repo, ref: latest.sha }));
+                } catch (e) {
+                  core.warning(`Commit lookup for ${latest.sha} from 
${filename} failed: ${e.message}`);
+                  continue;
+                }
+
+                // Prefer the GitHub-linked author/committer; skip if
+                // there is no linked login (e.g., authored by email only).
+                const login = commit.author?.login ?? commit.committer?.login;
+                if (!login) {
+                  core.warning(`Commit ${latest.sha} from ${filename} has no 
GitHub user; skipping.`);
+                  continue;
+                }
+
+                // Don't request the PR author or bot accounts.
+                if (login.toLowerCase() === author.toLowerCase()) continue;
+                if (commit.author?.type === 'Bot' || commit.committer?.type 
=== 'Bot') continue;

Review Comment:
   `login` is taken from `commit.author?.login ?? commit.committer?.login`, but 
this rejects the commit if *either* author or committer is a Bot. A 
human-authored commit applied by a bot committer (e.g. a patch-applying bot) 
would be wrongly excluded. Consider gating on the bot-ness of whichever 
identity actually supplied `login`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to