ramackri opened a new pull request, #1035:
URL: https://github.com/apache/ranger/pull/1035

   ## What changes were proposed in this pull request?
   
   Fixes [RANGER-5656](https://issues.apache.org/jira/browse/RANGER-5656): 
TagSync **Atlas REST** tag source fails when 
`TAG_SOURCE_ATLASREST_ENABLED=true` after 
[RANGER-4076](https://issues.apache.org/jira/browse/RANGER-4076) migrated 
TagSync to **Jersey 2.x**, while `AtlasRESTTagSource` still used 
**`AtlasClientV2`** (Jersey 1.x from `atlas-client-v2`).
   
   ### Problem
   
   TagSync reads classifications from Atlas via `AtlasClientV2` (Jersey 1) and pushes tags to Ranger Admin via `TagAdminRESTSink` (Jersey 2). With both JAX-RS stacks loaded in the same JVM, the Atlas REST source fails with:
   
   ```
   AbstractMethodError: javax.ws.rs.core.UriBuilder.uri(...)
     at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(...)
   ```
   
   The Atlas REST sync thread exits on the first poll. The TagSync process may still appear to be running, but **no automatic Atlas → Ranger tag push** occurs.
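
When diagnosing this kind of classpath conflict, it can help to print which JAR a JAX-RS class was actually loaded from. The following is a minimal diagnostic sketch, not part of this PR; the class name `JaxRsOriginCheck` is hypothetical, and it would need to run with the TagSync classpath to be meaningful.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of this PR).
final class JaxRsOriginCheck {

    /** Returns the JAR or directory a class was loaded from, or a marker for bootstrap classes. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "<bootstrap or unknown>"
                : source.getLocation().toString();
    }

    /** Resolves the class by name so the sketch compiles without JAX-RS on the classpath. */
    static String originOf(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> type = Class.forName(className, false, JaxRsOriginCheck.class.getClassLoader());
            return originOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e.getClass().getSimpleName() + ">";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above; pass other class names as arguments.
        String[] names = args.length > 0 ? args : new String[] {"javax.ws.rs.core.UriBuilder"};
        for (String name : names) {
            System.out.println(name + " -> " + originOf(name));
        }
    }
}
```

If the reported location is a Jersey 1 era JAX-RS API JAR rather than the one Jersey 2 expects, that would be consistent with the `AbstractMethodError` shown above.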
   
   | Component | Before RANGER-4076 | After RANGER-4076 (broken) | This PR |
   |-----------|--------------------|----------------------------|---------|
   | TagSync sink | Jersey 1 | Jersey 2 | Jersey 2 |
   | Atlas REST source | `AtlasClientV2` (Jersey 1) | `AtlasClientV2` (Jersey 1) | `HttpURLConnection` |
   
   ### Solution
   
   1. **`AtlasRESTHttpClient`** (new) — minimal Atlas REST client using 
`HttpURLConnection` + Atlas `AtlasType` JSON helpers (no Jersey, no 
`AtlasClientV2`).
      - `POST api/atlas/v2/search/basic` — classified entity search
      - `GET api/atlas/v2/types/typedefs/` — typedef load
      - Supports Basic auth and Kerberos (`UserGroupInformation.doAs`)
   
   2. **`AtlasRESTTagSource`** — call `AtlasRESTHttpClient` instead of 
`AtlasClientV2`; remove `getAtlasClient()`.
   
   3. **Packaging** — remove `atlas-client-v1` / `atlas-client-v2` from 
`tagsync/pom.xml` and from `distro/src/main/assembly/tagsync.xml` so Jersey 1 
client JARs are not shipped in the TagSync tarball.
   
   **Not changed:** `TagSynchronizer`, `TagAdminRESTSink`, Atlas Kafka tag 
source, Hive plugin.
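
The approach behind item 1 can be sketched as a plain `HttpURLConnection` POST with a Basic `Authorization` header. This is an illustrative sketch only, not the code in `AtlasRESTHttpClient`: the class name `AtlasRestSketch`, its method names, and the timeout values are assumptions. Only the endpoint path comes from this description. Kerberos (`UserGroupInformation.doAs`) and JSON parsing via `AtlasType` are omitted because they need Hadoop and Atlas libraries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; names and timeouts are assumptions, not the PR's implementation.
final class AtlasRestSketch {
    // Endpoint path as listed in the PR description.
    static final String SEARCH_PATH = "api/atlas/v2/search/basic";

    /** Builds an HTTP Basic Authorization header value. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Joins a base URL and a relative path with exactly one slash between them. */
    static String resolve(String baseUrl, String path) {
        return (baseUrl.endsWith("/") ? baseUrl : baseUrl + "/") + path;
    }

    /** POSTs a JSON body and returns the response body; throws on non-2xx status. */
    static String postJson(String url, String authHeader, String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setConnectTimeout(10_000); // assumed value
            conn.setReadTimeout(30_000);    // assumed value
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Accept", "application/json");
            conn.setRequestProperty("Authorization", authHeader);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            if (status < 200 || status >= 300) {
                throw new IOException("Atlas returned HTTP " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would pass `resolve(atlasBaseUrl, AtlasRestSketch.SEARCH_PATH)` together with a serialized search request as `jsonBody`; the request shape is not shown in this description, so it is left to the caller here. Because nothing in the sketch touches `javax.ws.rs`, it cannot collide with the Jersey 2 classes used by the sink.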
   
   ### Files changed
   
   | File | Change |
   |------|--------|
   | `tagsync/.../AtlasRESTHttpClient.java` | **Added** |
   | `tagsync/.../AtlasRESTTagSource.java` | **Modified**: use HTTP client |
   | `tagsync/pom.xml` | **Modified**: drop `atlas-client-v1/v2` dependencies |
   | `distro/src/main/assembly/tagsync.xml` | **Modified**: drop `atlas-client-*` from `lib/` |
   
   ---
   
   ## How was this patch tested?
   
   ### Build and static checks
   
   ```bash
   mvn package -pl tagsync -am -DskipTests
   mvn checkstyle:check -pl tagsync -DskipTests
   ```
   
   | Check | Result |
   |-------|--------|
   | `tagsync` module compile/package | Pass |
   | Checkstyle on new `AtlasRESTHttpClient.java` | Pass (0 violations) |
   
   ### Manual testing (Atlas + Ranger + Hive integration)
   
   Manual validation used a Docker environment with Ranger Admin, TagSync, Apache Atlas, and Hive (Kerberos-enabled), with `TAG_SOURCE_ATLASREST_ENABLED=true` and Atlas REST credentials configured in the TagSync install properties.
   
   #### 1. TagSync starts without Jersey conflict
   
   - Rebuilt TagSync from this branch and deployed the new tarball.
   - Started TagSync with Atlas REST source enabled.
   - Confirmed `TagSynchronizer` process is running and **no 
`AbstractMethodError`** / `UriBuilderImpl` errors appear in TagSync logs on 
startup or first poll.
   
   #### 2. Atlas classification → Ranger tag mapping (automatic sync)
   
   - Created a Hive table with columns suitable for governance testing.
   - Applied a **PII** classification to a Hive column entity in Atlas via the Atlas REST API.
   - Waited for one TagSync poll interval (~60 seconds).
   - Verified in the Ranger Admin **Tag** menu that tag definition **PII** and a resource mapping exist on the Hive service for the classified column (`createdBy: rangertagsync`, GUID matches Atlas).
   - Verified via Ranger REST: `GET /service/tags/resources?serviceName=<hive-service>` and `GET /service/tags/tagresourcemaps?serviceName=<hive-service>`.
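
The two Ranger REST checks above can be scripted. The following minimal sketch only builds the request URLs; the class name `TagMappingCheckUrls`, the base URL, and the service name in `main` are hypothetical, while the endpoint paths are the ones listed above. Authentication and the actual GET are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building the verification URLs (not part of this PR).
final class TagMappingCheckUrls {
    // Endpoint paths as listed in the verification step above.
    static final String RESOURCES = "/service/tags/resources";
    static final String RESOURCE_MAPS = "/service/tags/tagresourcemaps";

    /** Builds "<adminBase><endpoint>?serviceName=<encoded name>". */
    static String url(String adminBase, String endpoint, String serviceName) {
        String base = adminBase.endsWith("/")
                ? adminBase.substring(0, adminBase.length() - 1)
                : adminBase;
        // URLEncoder.encode(String, Charset) requires Java 10+
        return base + endpoint + "?serviceName=" + URLEncoder.encode(serviceName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String adminBase = args.length > 0 ? args[0] : "http://localhost:6080"; // assumed base URL
        String service = args.length > 1 ? args[1] : "dev_hive";               // assumed service name
        System.out.println(url(adminBase, RESOURCES, service));
        System.out.println(url(adminBase, RESOURCE_MAPS, service));
    }
}
```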
   
   #### 3. End-to-end tag policy enforcement in Hive
   
   With tag mappings present, Ranger policies were configured on **`dev_hive`** (RBAC allow) and **`dev_tag`** (tag deny for one user; tag data mask `hive:MASK` for another on tag **PII**). Sample row data was loaded into the Hive table.
   
   | User | Query | Expected | Result |
   |------|-------|----------|--------|
   | User A | `SELECT` non-PII column | Allowed | Pass |
   | User A | `SELECT` PII-tagged column | Denied (`HiveAccessControlException`) | Pass |
   | User B | `SELECT` PII-tagged column | Masked values (e.g. `nnn-nn-nnnn`) | Pass |
   | Hive admin | `SELECT` PII-tagged column | Raw values | Pass |
   
   #### 4. Regression checks
   
   - TagSync still pushes tags to Ranger Admin via `TagAdminRESTSink` (Jersey 2, unchanged).
   - No change to file-based or Kafka-based tag sources in this PR.
   
   ---
   
   ## Related issues
   
   - **RANGER-4076** — Jersey 1 → 2 migration (regression source)
   - **RANGER-1897** — original `AtlasClientV2` adoption in `AtlasRESTTagSource`
   
   Made with [Cursor](https://cursor.com)

