ramackri opened a new pull request, #1030: URL: https://github.com/apache/ranger/pull/1030
Fixes [RANGER-5654](https://issues.apache.org/jira/browse/RANGER-5654): Solr audit dispatcher stops indexing audits into Kerberos-protected Solr after a TGT refresh/relogin when `useTicketCache=true` (the shipped default).

### What changes were proposed in this pull request?

**Problem**

The Solr audit dispatcher consumes audits from Kafka but eventually stops writing to Solr when Kerberos is enabled. Dispatcher logs show repeated failures such as `Failure in sending audits into Solr` and `No key to store`. Kafka consumer offsets continue to advance while Solr document counts remain flat.

Root cause:

1. Shipped Solr dispatcher config sets `xasecure.audit.jaas.Client.option.useTicketCache=true` together with keytab-based login.
2. `AbstractKerberosUser.checkTGTAndRelogin()` performs `logout(); login()` when the TGT nears expiry.
3. With `useTicketCache=true`, the relogin path can fail because the ticket cache has no key material to store after logout, leaving the dispatcher in a broken auth state until restart.

**Solution**

| Area | File | Change |
|------|------|--------|
| Relogin recovery | `agents-audit/core/.../AbstractKerberosUser.java` | On relogin `LoginException`, recreate `Subject` and `LoginContext`, then retry `login()` instead of leaving the user logged out |
| Shipped config | `audit-server/audit-dispatcher/dispatcher-solr/.../ranger-audit-dispatcher-solr-site.xml` | Set `useTicketCache=false` so keytab login is used consistently (avoids ticket-cache relogin failure) |
| Docker config | `dev-support/ranger-docker/scripts/audit-dispatcher/ranger-audit-dispatcher-solr-site.xml` | Same `useTicketCache=false` default for Tier 3 audit stack |

This complements [RANGER-5643](https://issues.apache.org/jira/browse/RANGER-5643) (JAAS `_HOST` expansion and Solr URL rewrite for SPNEGO). RANGER-5643 fixed initial SPNEGO/JAAS principal alignment; this patch fixes the **long-running** dispatcher failure after TGT relogin.

### How was this patch tested?

#### Code review / static verification

- Confirmed `checkTGTAndRelogin()` is invoked from `KerberosAction` before Solr operations, so relogin failures directly block indexing.
- Verified shipped and Docker Solr dispatcher site XML both defaulted to `useTicketCache=true` on master.

#### Manual testing (Docker Tier 3 audit stack with Kerberos)

Environment: `dev-support/ranger-docker` Tier 3 compose (`docker-compose.ranger-audit-tier3.yml`): Ranger Admin, KDC, Postgres, Solr, ZooKeeper, Kafka, audit ingestor, Solr audit dispatcher, and HDFS plugin with Kerberos enabled.

**Reproduce failure (master behavior, before patch):**

1. Start Tier 3 stack and wait for audit health (`./scripts/audit/wait-for-audit-health.sh --tier 3`).
2. Trigger HDFS audits (e.g. `hdfs dfs -ls /` as a test user).
3. Confirm audits reach Kafka (ingestor offset / topic growth).
4. After TGT refresh window or forced relogin cycle, observe Solr dispatcher logs:
   - `Failure in sending audits into Solr`
   - `No key to store`
5. Solr query count for test user (`reqUser:testuser1`) stops increasing; Kafka offset continues to grow.

**Verify fix (with this patch applied):**

1. Rebuild audit-dispatcher tarball with patched `agents-audit/core` and redeploy Solr dispatcher container with updated site XML (`useTicketCache=false`).
2. Restart Solr dispatcher and confirm clean JAAS login in logs (`Successful login for rangerauditserver/...`).
3. Trigger additional HDFS audits.
4. Confirm Solr document count increases (e.g. `reqUser:testuser1` count incremented).
5. Confirm Ranger Admin audit UI / Solr `totalCount` reflects new audits.
6. Full HDFS → ingestor → Kafka → Solr dispatcher → Solr pipeline **PASS** (12/12 checks in dynamic-partition E2E harness; Solr indexing hop green after dispatcher restart with patched config).

**Observed after fix:**

- No `No key to store` errors during normal operation or after relogin.
- Solr dispatcher resumes indexing without manual keytab re-kinit inside the container.
- End-to-end audit delivery to Solr stable under Kerberos.

### Related

- Jira: [RANGER-5654](https://issues.apache.org/jira/browse/RANGER-5654)
- Related Kerberos SPNEGO fix: [RANGER-5643](https://issues.apache.org/jira/browse/RANGER-5643)

Made with [Cursor](https://cursor.com)
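
### Illustrative sketch of the relogin recovery

For reviewers who want the shape of the relogin recovery without opening the diff, here is a minimal sketch of the idea described in the Solution table: on a failed `logout(); login()` cycle, discard the old `Subject` and login state, build fresh ones, and retry `login()` once. This is not the code in `AbstractKerberosUser.java`. The class `KerberosReloginSketch` and the interfaces `Login` and `LoginFactory` are hypothetical stand-ins for the JAAS `LoginContext`, introduced only so the sketch runs without a KDC.

```java
import javax.security.auth.Subject;
import javax.security.auth.login.LoginException;

// Hypothetical sketch; names here are assumptions, not the patched Ranger classes.
final class KerberosReloginSketch {

    /** Stand-in for javax.security.auth.login.LoginContext (assumption for testability). */
    interface Login {
        void login() throws LoginException;
        void logout() throws LoginException;
    }

    /** Builds a fresh login bound to the given Subject (assumption). */
    interface LoginFactory {
        Login create(Subject subject) throws LoginException;
    }

    private final LoginFactory factory;
    private Subject subject;
    private Login login;

    KerberosReloginSketch(LoginFactory factory) throws LoginException {
        this.factory = factory;
        this.subject = new Subject();
        this.login = factory.create(subject);
        this.login.login();
    }

    /** Performs logout(); login() and, if that fails, recreates the login state and retries once. */
    synchronized void relogin() throws LoginException {
        try {
            login.logout();
            login.login();
        } catch (LoginException e) {
            // Recovery path: do not stay logged out. Start from a clean Subject
            // and a new login object, then retry. A second failure propagates.
            subject = new Subject();
            login = factory.create(subject);
            login.login();
        }
    }

    synchronized Subject getSubject() {
        return subject;
    }
}
```

In this sketch a second consecutive failure is still thrown to the caller, so a persistent KDC or keytab problem remains visible in the logs rather than being retried indefinitely.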
