Ramachandran Krishnan created ATLAS-5337:
--------------------------------------------
Summary: Fix Trino Extractor Jersey client failures for
AtlasEntityWithExtInfo (POST/GET)
Key: ATLAS-5337
URL: https://issues.apache.org/jira/browse/ATLAS-5337
Project: Atlas
Issue Type: Task
Reporter: Ramachandran Krishnan
Assignee: Ramachandran Krishnan
Fix For: 3.0.0
h4. Summary
The Trino Extractor standalone tarball fails to import {{trino_*}} metadata
into Atlas because of Jersey serialization errors on entity POST/GET and a
conflicting {{jersey-client}} version in {{lib/}}. This bug has existed since
*ATLAS-5021* (PR #428, Sep 2025) and is *not* caused by the Kafka 3.9.1 upgrade.
h4. Background
ATLAS-5021 introduced the Trino → Atlas metadata pull path:
* {{addons/models/6000-Trino/}} — {{trino_instance}}, {{trino_catalog}},
{{trino_schema}}, {{trino_table}}, {{trino_column}}
* {{addons/trino-extractor/}} — JDBC pull via {{AtlasClientHelper}} →
{{AtlasClientV2}}
* Distro tarball {{apache-atlas-*-trino-extractor.tar.gz}} with
{{run-trino-extractor.sh}}
The feature shipped with three problems that left the standalone tarball path
broken from day one:
# *Jersey version conflict* — {{addons/trino-extractor/pom.xml}} pinned
{{jersey-client}} *1.9* while parent POM / {{atlas-client-v2}} use *1.19* →
both jars land in distro {{lib/}}
# *Fragile entity serialization* — {{AtlasClientV2.createEntity()}} passed a
Java object to Jersey; works on full server / curated bridge classpaths via
POJO mapping, fails in minimal extractor tarball ({{MessageBodyWriter}} not
found for {{AtlasEntity$AtlasEntityWithExtInfo}})
# *No live integration test* — {{TrinoExtractorIT.java}} is a placeholder with
no {{@Test}} methods; tarball path never exercised in CI
The bug stayed hidden because most deployments use the Hive hook or manual REST
import. It surfaced in the Trino/Ranger docker E2E lab (Jul 2026) when
{{run-trino-extractor.sh}} was run from the standalone tarball.
*Not a Kafka 3.9.1 regression:* Dependabot commit {{6709f6459}} only changed
{{<kafka.version>}} in root {{pom.xml}}. Trino extractor has no Kafka client
dependency.
h4. Symptoms
*Failure 1 — POST {{createEntity}} (MessageBodyWriter):*
{code}
com.sun.jersey.api.client.ClientHandlerException:
A message body writer for Java type, class
org.apache.atlas.model.instance.AtlasEntity$AtlasEntityWithExtInfo,
and MIME media type application/json; charset=UTF-8, was not found
{code}
*Failure 2 — GET {{getEntityByAttribute}} (MessageBodyReader):*
{code}
ClientHandlerException:
A message body reader for Java class
org.apache.atlas.model.instance.AtlasEntity$AtlasEntityWithExtInfo,
and MIME media type application/json; charset=utf-8 was not found
{code}
*Failure 3 — Non-interactive auth (separate):* {{AuthenticationUtil}} only
reads from {{System.console()}}, which returns {{null}} when no terminal is
attached → {{401 Unauthorized}} in CI/E2E scripts.
*Workaround before fix:* manual Atlas REST import (curl) or
{{TRINO_METADATA_MODE=rest}} in the E2E script.
h4. Root Cause
{code}
apache-atlas-*-trino-extractor.tar.gz lib/
jersey-client-1.9.jar ← pinned in trino-extractor/pom.xml
jersey-client-1.19.jar ← transitive from atlas-client-v2
→ mixed JSON providers → POST writer / GET reader failures
{code}
Entity APIs relied on Jersey POJO mapping; type-def APIs already used
{{AtlasType.toJson()}} explicitly — that asymmetry is why type defs could work
while entity POSTs failed.
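The duplicate-jar condition in the listing above can be detected mechanically. The following is a minimal sketch, not part of the patch: the class {{JarVersionCheck}} and method {{versionsOf}} are hypothetical names, and it assumes jars follow the usual {{<artifact>-<version>.jar}} naming. It uses only the JDK.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not in the Atlas code base): lists the versions of one
// artifact present in a lib/ directory, assuming <artifact>-<version>.jar names.
final class JarVersionCheck {
    private JarVersionCheck() {}

    static Set<String> versionsOf(Path libDir, String artifact) throws IOException {
        // The version must start with a digit, so "jersey-client" does not
        // accidentally match a jar such as "jersey-client-extras-1.0.jar".
        Pattern pattern = Pattern.compile(Pattern.quote(artifact) + "-(\\d[\\w.\\-]*)\\.jar");
        Set<String> versions = new TreeSet<>();

        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                Matcher m = pattern.matcher(jar.getFileName().toString());
                if (m.matches()) {
                    versions.add(m.group(1));
                }
            }
        }
        return versions;
    }
}
```

Calling {{JarVersionCheck.versionsOf(libDir, "jersey-client")}} on the unpacked extractor {{lib/}} and failing when more than one version comes back would flag the 1.9 / 1.19 mix before the extractor is run.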
h4. Proposed Fix
|| # || File || Change ||
| 1 | {{addons/trino-extractor/pom.xml}} | Remove explicit {{jersey-client}}
1.9 pin; inherit {{jersey.version}} 1.19 from parent |
| 2 | {{client/client-v2/.../AtlasClientV2.java}} | Entity mutation APIs send
JSON via {{AtlasType.toJson()}} — {{createEntity}}, {{createEntities}},
{{updateEntity}}, {{updateEntities}}, {{updateEntityByAttribute}} |
| 3 | {{client/common/.../AtlasBaseClient.java}} | For
{{org.apache.atlas.model.*}} response types, read body as String and parse with
{{AtlasJson.fromJson()}} |
| 4 | {{intg/.../AuthenticationUtil.java}} | Support {{ATLAS_USERNAME}} /
{{ATLAS_PASSWORD}} env vars before console prompt |
*Note on fix #2:* Existing bridges and webapp ITs used the object-passing
pattern successfully where Jersey POJO mapping had a clean classpath.
Pre-serializing with {{AtlasType.toJson()}} produces the same wire JSON and
makes entity APIs consistent with type-def APIs — no behaviour change for
working callers.
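Row 4 of the table (env vars before console prompt) could look roughly like the sketch below. This is an illustration under assumptions, not the actual {{AuthenticationUtil}} change: the class {{CredentialResolver}}, its method names, and the prompt strings are hypothetical; only the {{ATLAS_USERNAME}} / {{ATLAS_PASSWORD}} variable names come from this ticket. The environment is passed in as a map so the lookup order can be tested without a terminal.

```java
import java.io.Console;
import java.util.Map;

// Hypothetical sketch of "env vars first, console prompt second".
final class CredentialResolver {
    static final String USER_VAR = "ATLAS_USERNAME";
    static final String PASS_VAR = "ATLAS_PASSWORD";

    private CredentialResolver() {}

    // Returns {username, password} from the environment, or null when either
    // variable is missing or empty.
    static String[] fromEnv(Map<String, String> env) {
        String user = env.get(USER_VAR);
        String pass = env.get(PASS_VAR);
        if (user == null || user.isEmpty() || pass == null || pass.isEmpty()) {
            return null;
        }
        return new String[] {user, pass};
    }

    // Prefers the environment; falls back to an interactive prompt only when a
    // console exists (System.console() is null in most CI and scripted runs).
    static String[] resolve(Map<String, String> env, Console console) {
        String[] creds = fromEnv(env);
        if (creds != null) {
            return creds;
        }
        if (console == null) {
            throw new IllegalStateException(
                "No console available; set " + USER_VAR + " and " + PASS_VAR);
        }
        String user = console.readLine("Atlas username: ");
        char[] pwd = console.readPassword("Atlas password: ");
        return new String[] {user, pwd == null ? null : new String(pwd)};
    }
}
```

A caller would invoke {{CredentialResolver.resolve(System.getenv(), System.console())}}. Failing with an explicit message when neither source is available is a design choice of this sketch; it makes a missing-credentials run easier to diagnose than a later {{401 Unauthorized}}.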
h4. Acceptance Criteria
* Trino extractor tarball contains only {{jersey-client-1.19.jar}} (no 1.9).
* {{run-trino-extractor.sh}} imports {{trino_*}} entities without Jersey
{{MessageBodyWriter}} / {{MessageBodyReader}} errors.
* {{getEntityByAttribute}} / {{getEntityByGuid}} work from standalone extractor
{{lib/}} layout.
* Non-interactive runs succeed with {{ATLAS_USERNAME}} / {{ATLAS_PASSWORD}} env
vars.
* Existing bridge / webapp client behaviour unchanged (same JSON on the wire).
* Full Trino → Atlas → TagSync → Ranger tag-auth E2E passes with extractor (not
REST fallback).