Ramachandran Krishnan created ATLAS-5337:
--------------------------------------------

             Summary: Fix Trino Extractor Jersey client failures for AtlasEntityWithExtInfo (POST/GET)
                 Key: ATLAS-5337
                 URL: https://issues.apache.org/jira/browse/ATLAS-5337
             Project: Atlas
          Issue Type: Task
            Reporter: Ramachandran Krishnan
            Assignee: Ramachandran Krishnan
             Fix For: 3.0.0


h4. Summary

The Trino Extractor standalone tarball fails to import {{trino_*}} metadata 
into Atlas because of Jersey serialization errors on entity POST/GET and a 
conflicting {{jersey-client}} version in {{lib/}}. This bug has existed since 
*ATLAS-5021* (PR #428, Sep 2025) and is *not* caused by the Kafka 3.9.1 upgrade.

h4. Background

ATLAS-5021 introduced the Trino → Atlas metadata pull path:

* {{addons/models/6000-Trino/}} — {{trino_instance}}, {{trino_catalog}}, 
{{trino_schema}}, {{trino_table}}, {{trino_column}}
* {{addons/trino-extractor/}} — JDBC pull via {{AtlasClientHelper}} → 
{{AtlasClientV2}}
* Distro tarball {{apache-atlas-*-trino-extractor.tar.gz}} with 
{{run-trino-extractor.sh}}

The feature shipped with three problems that left the standalone tarball path broken from day one:

# *Jersey version conflict*: {{addons/trino-extractor/pom.xml}} pinned {{jersey-client}} *1.9* while the parent POM and {{atlas-client-v2}} use *1.19*, so both jars land in the distro {{lib/}}.
# *Fragile entity serialization*: {{AtlasClientV2.createEntity()}} passed a Java object to Jersey. This works on the full server and curated bridge classpaths through POJO mapping, but fails in the minimal extractor tarball ({{MessageBodyWriter}} not found for {{AtlasEntity$AtlasEntityWithExtInfo}}).
# *No live integration test*: {{TrinoExtractorIT.java}} is a placeholder with no {{@Test}} methods, so the tarball path is never exercised in CI.

The bug stayed hidden because most deployments use the Hive hook or a manual REST import. It surfaced in the Trino/Ranger docker E2E lab (Jul 2026) when {{run-trino-extractor.sh}} was run from the standalone tarball.

*Not a Kafka 3.9.1 regression:* Dependabot commit {{6709f6459}} only changed {{<kafka.version>}} in the root {{pom.xml}}, and the Trino extractor has no Kafka client dependency.

h4. Symptoms

*Failure 1 — POST {{createEntity}} (MessageBodyWriter):*
{code}
com.sun.jersey.api.client.ClientHandlerException:
  A message body writer for Java type, class
    org.apache.atlas.model.instance.AtlasEntity$AtlasEntityWithExtInfo,
  and MIME media type application/json; charset=UTF-8, was not found
{code}

*Failure 2 — GET {{getEntityByAttribute}} (MessageBodyReader):*
{code}
ClientHandlerException:
  A message body reader for Java class
    org.apache.atlas.model.instance.AtlasEntity$AtlasEntityWithExtInfo,
  and MIME media type application/json; charset=utf-8 was not found
{code}

*Failure 3 — Non-interactive auth (separate):* {{AuthenticationUtil}} only reads credentials from {{System.console()}}, which returns {{null}} when no interactive terminal is attached, so CI/E2E scripts get {{401 Unauthorized}}.

Workaround before the fix: a manual Atlas REST import (curl) or {{TRINO_METADATA_MODE=rest}} in the E2E script.

h4. Root Cause

{code}
apache-atlas-*-trino-extractor.tar.gz lib/
  jersey-client-1.9.jar   ← pinned in trino-extractor/pom.xml
  jersey-client-1.19.jar  ← transitive from atlas-client-v2
  → mixed JSON providers → POST writer / GET reader failures
{code}

Entity APIs relied on Jersey POJO mapping, while the type-def APIs already serialized explicitly with {{AtlasType.toJson()}}. That asymmetry is why type-def calls could succeed while entity POSTs failed.

h4. Proposed Fix

|| # || File || Change ||
| 1 | {{addons/trino-extractor/pom.xml}} | Remove the explicit {{jersey-client}} 1.9 pin; inherit {{jersey.version}} 1.19 from the parent POM |
| 2 | {{client/client-v2/.../AtlasClientV2.java}} | Entity mutation APIs ({{createEntity}}, {{createEntities}}, {{updateEntity}}, {{updateEntities}}, {{updateEntityByAttribute}}) send JSON produced by {{AtlasType.toJson()}} |
| 3 | {{client/common/.../AtlasBaseClient.java}} | For {{org.apache.atlas.model.*}} response types, read the body as a String and parse it with {{AtlasJson.fromJson()}} |
| 4 | {{intg/.../AuthenticationUtil.java}} | Read {{ATLAS_USERNAME}} / {{ATLAS_PASSWORD}} environment variables before falling back to the console prompt |
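Fix #4 can be sketched as an environment-first lookup with a console fallback. {{CredentialResolver}}, {{Credentials}} and the prompt strings are hypothetical and do not reflect the actual {{AuthenticationUtil}} signatures; the sketch assumes the environment map and {{Console}} are passed in so the non-interactive path can be exercised without a terminal.

{code:java}
import java.io.Console;
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of fix #4: environment variables first, console prompt second.
 * CredentialResolver and Credentials are hypothetical names, not the
 * actual AuthenticationUtil API.
 */
final class CredentialResolver {
    static final String USER_VAR = "ATLAS_USERNAME";
    static final String PASS_VAR = "ATLAS_PASSWORD";

    static final class Credentials {
        final String username;
        final char[] password;

        Credentials(String username, char[] password) {
            this.username = username;
            this.password = password;
        }
    }

    static Optional<Credentials> resolve(Map<String, String> env, Console console) {
        String user = env.get(USER_VAR);
        String pass = env.get(PASS_VAR);
        if (!isBlank(user) && !isBlank(pass)) {
            return Optional.of(new Credentials(user, pass.toCharArray()));
        }
        if (console == null) {
            // System.console() returns null without an interactive terminal
            // (CI, docker E2E), so there is no way to prompt.
            return Optional.empty();
        }
        String promptedUser = console.readLine("Atlas username: ");
        char[] promptedPass = console.readPassword("Atlas password: ");
        if (isBlank(promptedUser) || promptedPass == null) {
            return Optional.empty(); // EOF or empty input
        }
        return Optional.of(new Credentials(promptedUser, promptedPass));
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        Optional<Credentials> creds = resolve(System.getenv(), System.console());
        System.out.println(creds.isPresent()
                ? "resolved credentials for " + creds.get().username
                : "no credentials: set " + USER_VAR + " and " + PASS_VAR);
    }
}
{code}

Returning an empty result when neither source is usable lets a caller stop with an explicit message before any request is sent, instead of surfacing a {{401 Unauthorized}} later; this is a design choice assumed for the sketch.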

*Note on fix #2:* Existing bridges and webapp ITs used the object-passing pattern successfully because Jersey POJO mapping worked on their clean classpaths. Pre-serializing with {{AtlasType.toJson()}} produces the same wire JSON and makes the entity APIs consistent with the type-def APIs, so working callers see no behaviour change.
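The response-side rule in fix #3 amounts to a package-prefix dispatch on the requested type. The sketch below uses a hypothetical {{ResponseReadStrategy}} helper that only decides the path; it assumes the real client would then read the body with Jersey's {{ClientResponse.getEntity(String.class)}} and pass it to {{AtlasJson.fromJson()}}, neither of which is shown.

{code:java}
/**
 * Sketch of the fix #3 dispatch rule. ResponseReadStrategy is a hypothetical
 * helper, not part of AtlasBaseClient.
 */
final class ResponseReadStrategy {
    static final String ATLAS_MODEL_PACKAGE = "org.apache.atlas.model.";

    enum Strategy {
        /** Read the body as a String, then parse it with Atlas's own JSON mapper. */
        READ_STRING_THEN_PARSE,
        /** Leave the type to whatever MessageBodyReader Jersey finds. */
        DELEGATE_TO_JERSEY
    }

    static Strategy forType(Class<?> responseType) {
        return forTypeName(responseType.getName());
    }

    static Strategy forTypeName(String binaryName) {
        // Class.getName() uses '$' for nested classes, e.g.
        // org.apache.atlas.model.instance.AtlasEntity$AtlasEntityWithExtInfo,
        // so a package-prefix test also covers nested model types.
        return binaryName.startsWith(ATLAS_MODEL_PACKAGE)
                ? Strategy.READ_STRING_THEN_PARSE
                : Strategy.DELEGATE_TO_JERSEY;
    }

    public static void main(String[] args) {
        // prints READ_STRING_THEN_PARSE
        System.out.println(forTypeName(
                "org.apache.atlas.model.instance.AtlasEntity$AtlasEntityWithExtInfo"));
        // prints DELEGATE_TO_JERSEY
        System.out.println(forType(String.class));
    }
}
{code}

The trailing dot in the prefix matters: without it, a hypothetical sibling package such as {{org.apache.atlas.modelx}} would also match.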

h4. Acceptance Criteria

* Trino extractor tarball contains only {{jersey-client-1.19.jar}} (no 1.9).
* {{run-trino-extractor.sh}} imports {{trino_*}} entities without Jersey 
{{MessageBodyWriter}} / {{MessageBodyReader}} errors.
* {{getEntityByAttribute}} / {{getEntityByGuid}} work from standalone extractor 
{{lib/}} layout.
* Non-interactive runs succeed with {{ATLAS_USERNAME}} / {{ATLAS_PASSWORD}} env 
vars.
* Existing bridge / webapp client behaviour unchanged (same JSON on the wire).
* Full Trino → Atlas → TagSync → Ranger tag-auth E2E passes with extractor (not 
REST fallback).
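The first criterion can be checked mechanically against an unpacked tarball. {{LibConflictCheck}} is a hypothetical helper, not part of the Atlas build; it assumes jars follow the {{<artifact>-<version>.jar}} naming seen in the Root Cause listing and treats the first {{-}} segment that starts with a digit as the version, a heuristic that misreads artifact ids which themselves contain such a segment.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical check for the first acceptance criterion: flag any artifact
 * that appears in lib/ with more than one version.
 */
final class LibConflictCheck {
    // Heuristic: "<artifact>-<version>.jar", where the version is the first
    // "-"-separated segment starting with a digit (jersey-client-1.19.jar).
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    static Map<String, Set<String>> findConflicts(List<String> fileNames) {
        Map<String, Set<String>> versions = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                versions.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        versions.values().removeIf(v -> v.size() < 2); // keep only duplicated artifacts
        return versions;
    }

    public static void main(String[] args) throws IOException {
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        List<String> names;
        try (Stream<Path> files = Files.list(lib)) {
            names = files.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        }
        Map<String, Set<String>> conflicts = findConflicts(names);
        conflicts.forEach((artifact, vs) -> System.out.println(artifact + " -> " + vs));
        if (!conflicts.isEmpty()) {
            System.exit(1); // non-zero exit so a CI step can fail on conflicts
        }
    }
}
{code}

Pointed at the extractor's {{lib/}} directory before the fix, it would be expected to report {{jersey-client -> [1.19, 1.9]}} and exit with status 1; after the fix it should print nothing and exit with status 0.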




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
