[ 
https://issues.apache.org/jira/browse/RANGER-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088443#comment-18088443
 ] 

Ramachandran Krishnan commented on RANGER-5637:
-----------------------------------------------

Merged in master: https://github.com/apache/ranger/commit/137a5dd4a8bb67db9c07e2fbd117864d282da414

> Ranger CI: fix plugins-docker-build (download timeouts, Knox/Ozone smoke-test 
> failures)
> ---------------------------------------------------------------------------------------
>
>                 Key: RANGER-5637
>                 URL: https://issues.apache.org/jira/browse/RANGER-5637
>             Project: Ranger
>          Issue Type: Task
>          Components: Ranger
>            Reporter: Ramachandran Krishnan
>            Assignee: Ramachandran Krishnan
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: image-2026-06-10-10-26-27-904.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> h2. 1. ozone-om — Java 17 mismatch (primary)
> Evidence from CI:
> UnsupportedClassVersionError: XmlConfigChanger ... class file version 61.0 
> ... only recognizes up to 55.0
> UnsupportedClassVersionError: RangerOzoneAuthorizer ... class file version 
> 61.0
> STARTUP_MSG: java = 11.0.19
>  
> Ranger 3.0 is built with Java 17. The Ozone stack uses 
> {{{}apache/ozone-runner:20230615-1{}}}, which ships Java 11.
> The setup script hardcodes Java 11 for plugin enable:
> ranger-ozone-setup.sh, lines 29-30:
>  
> {code:java}
> echo "export JAVA_HOME=${JAVA_HOME}" >> conf/ozone-env.sh
> sudo JAVA_HOME=/usr/lib/jvm/jre/ ./enable-ozone-plugin.sh
> {code}
>  
> Even if XmlConfigChanger were fixed, OM would still fail loading 
> {{RangerOzoneAuthorizer}} at runtime on Java 11.
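
The class file versions in the CI errors map to Java releases by subtracting 44 from the major version, so 61 corresponds to Java 17 and 55 to Java 11. A minimal shell sketch of that arithmetic follows; the function name class_major_to_java is made up for illustration and is not part of Ranger or Ozone.

```shell
# Map a class file major version to the Java release that produces it.
# Rule: release = major - 44 (for example 52 -> 8, 55 -> 11, 61 -> 17).
# The function name is hypothetical, used only for this illustration.
class_major_to_java() {
  local major="${1%%.*}"   # drop a minor part such as ".0"
  echo $((major - 44))
}

class_major_to_java 61.0   # prints 17
class_major_to_java 55.0   # prints 11
```
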
>  
> Changes:
>  # {{dev-support/ranger-docker/.env}} — set 
> {{OZONE_RUNNER_VERSION=20241022-jdk17-1}}
>  # {{dev-support/ranger-docker/scripts/ozone/ranger-ozone-setup.sh}} — use 
> {{${JAVA_HOME}}} instead of {{/usr/lib/jvm/jre/}}
> Risk: Ozone 1.4.x officially targets JDK 11 for some CLI paths 
> ([HDDS-12153|https://github.com/apache/ozone-docker/pull/39]), but services 
> run fine on JDK 17, which is what CI needs.
> h2. 2. ozone-om — SCM startup ordering (secondary)
> Evidence: {{{}Connection refused: scm:9863{}}}, then 
> {{ServerNotLeaderException}} during OM {{{}--init{}}}. OM init did succeed 
> once; SCM caught up shortly after.
> In {{{}docker-compose.ranger-ozone.yml{}}}, {{om}} has no dependency on 
> {{scm}} or {{{}datanode{}}}:
> docker-compose.ranger-ozone.yml, lines 37-50:
> {code:java}
> depends_on:
>   ranger:
>     condition: service_started
>   ranger-solr:
>     condition: service_started
> ...
> command: bash -c "/opt/hadoop/ranger-ozone-plugin/ranger-ozone-setup.sh && /opt/hadoop/bin/ozone om"
> {code}
>  
> All three (scm, datanode, om) start in parallel.
> h3. Fix
> {code:java}
> om:
>   depends_on:
>     scm:
>       condition: service_started
>     datanode:
>       condition: service_started
>     ranger:
>       condition: service_started
>     ranger-solr:
>       condition: service_started
> {code}
>  
> Optionally add a short {{wait-for-scm.sh}} (poll {{{}scm:9860{}}}) before 
> {{ozone om}} for extra stability. This is a flake reducer, not the root cause 
> — Java mismatch is what actually killed the container.
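
One possible shape for the optional wait-for-scm.sh is sketched below. It assumes the script runs under bash (it relies on bash's /dev/tcp redirection); the function name wait_for_port, the retry count, and the delay are illustrative choices and are not taken from the Ranger repository.

```shell
# Hypothetical wait-for-scm.sh helper: poll a TCP port until it accepts
# connections, or give up after a fixed number of attempts.
# Assumes bash, because /dev/tcp/<host>/<port> is a bash-only redirection.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # The subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}

# Possible use in the om command, before starting OM:
#   wait_for_port scm 9860 && /opt/hadoop/bin/ozone om
```

This only confirms that the port accepts connections. It does not prove that SCM has finished leader election, so a ServerNotLeaderException can still occur shortly after the port opens.
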
> h2. 3. ranger-knox — Gateway failed to start
> Evidence from CI: Plugin enable completed successfully (audit XML, topology 
> updates, cred.jceks). LDAP started. Then:
> Starting Gateway failed.
> The Knox Gateway process probably exited, no process id found!
> So this is not XmlConfigChanger or plugin-enable failure. The gateway JVM 
> exits immediately; gateway.log is not printed in CI, which makes diagnosis 
> harder.
> h3. Most likely cause: incomplete Knox plugin packaging after RANGER-5632
> [RANGER-5632|https://github.com/apache/ranger/pull/999] removed Solr/HDFS 
> audit destinations from plugin tarballs, leaving auditserver only. Docker 
> install props enable auditserver:
> ranger-knox-plugin-install.properties, lines 35-37:
> XAAUDIT.AUDITSERVER.ENABLE=true
> XAAUDIT.AUDITSERVER.URL=http://ranger-audit-ingestor.rangernw:7081
> {{ranger-audit-dest-auditserver}} depends on Jersey 2 + HK2 (which needs 
> {{{}javax.inject{}}}). Compare assembly whitelists:
> ||Dependency||{{plugin-kafka.xml}} (passes CI)||{{knox-agent.xml}} (fails)||
> |{{jersey-media-json-jackson}}|✅|❌|
> |{{jersey-entity-filtering}}|✅|❌|
> |{{jackson-jaxrs-json-provider}}|✅|❌|
> |{{javax.inject}}|❌|❌|
> |{{httpasyncclient}} / {{httpcore-nio}}|✅|✅|
> Knox’s whitelist is much thinner than Kafka’s. Before RANGER-5632, Knox 
> audits went through Solr ({{{}solrj{}}} is in the Knox whitelist). After 
> switching to auditserver-only, the Jersey client stack may be missing from 
> {{{}lib/ranger-knox-plugin-impl/{}}}, causing gateway classpath failure at 
> startup (same class of bug as PDP’s {{javax.inject.Singleton}} issue).
> h3. Fixes for Knox
> A. Packaging (likely root fix) — align 
> {{distro/src/main/assembly/knox-agent.xml}} with Kafka/Ozone:
>  * Add {{javax.inject:javax.inject}}
>  * Add {{org.glassfish.jersey.core:jersey-client}}
>  * Add {{org.glassfish.jersey.inject:jersey-hk2}}
>  * Add {{org.glassfish.jersey.media:jersey-media-json-jackson}}
>  * Add {{com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider}}
>  * Add HK2 deps if needed (as in {{{}pdp.xml{}}})
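
Before changing the assembly, it may help to confirm which of these jars are actually absent from the packaged plugin. The sketch below lists artifacts with no matching jar in a directory. The function name missing_jars is made up, and the prefix match on artifact-*.jar is an approximation that assumes jars follow the usual artifact-version.jar naming.

```shell
# Print each required artifact that has no matching jar in a lib directory.
# Returns non-zero if at least one artifact is missing.
# Hypothetical helper; matching is by file name prefix only.
missing_jars() {
  local lib_dir="$1"; shift
  local rc=0 artifact
  for artifact in "$@"; do
    # Looks for <artifact>-<version>.jar; an unmatched glob makes ls fail.
    if ! ls "${lib_dir}/${artifact}"-*.jar >/dev/null 2>&1; then
      echo "$artifact"
      rc=1
    fi
  done
  return "$rc"
}

# Possible use against the unpacked Knox plugin tarball:
#   missing_jars lib/ranger-knox-plugin-impl javax.inject jersey-client \
#     jersey-hk2 jersey-media-json-jackson jackson-jaxrs-json-provider
```
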
> B. Diagnostics — improve {{ranger-knox.sh}} so CI captures the real error:
> if [ -z "$KNOX_GATEWAY_PID" ]; then
>   echo "Gateway logs:"
>   tail -n 100 "${KNOX_HOME}/logs/gateway.log" 2>/dev/null || true
>   tail -n 100 "${KNOX_HOME}/logs/gateway-${HOSTNAME}.log" 2>/dev/null || true
> fi
> C. Compose ordering (optional) — Knox {{depends_on}} only {{ranger}} + 
> {{{}ranger-zk{}}}, but sandbox topology references {{{}ranger-hadoop{}}}, 
> {{{}ranger-hive{}}}, {{{}ranger-hbase{}}}. Adding {{ranger-hadoop: 
> service_healthy}} may help stability; it is unlikely to be the immediate 
> gateway crash cause.
> D. Audit ingestor not in {{plugins-docker-build}} — {{ranger-audit-ingestor}} 
> is not started in that CI job. That should not block gateway startup (audits 
> are async), but audits will not flow until ingestor is added to the compose 
> stack or auditserver is disabled for the smoke test only.
> h2. Recommended fix order
> !image-2026-06-10-10-26-27-904.png!
>  # Ozone Java 17 runner: highest confidence, small diff
>  # Knox assembly deps: likely fixes gateway; mirrors PDP/Kafka pattern
>  # Ozone compose ordering: reduces SCM flakes
>  # Knox log dump in CI: confirms root cause if gateway still fails
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
