[
https://issues.apache.org/jira/browse/CASSANDRA-21156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065771#comment-18065771
]
Dipankar Achinta commented on CASSANDRA-21156:
----------------------------------------------
I investigated the reported behavior; it looks like a class
initialization ordering issue.
---
Below are my observations:
* During *_DatabaseDescriptor.loadConfig()_*, using the deprecated key
*_table_count_warn_threshold_* in `cassandra.yaml` triggers a config
converter (*_TABLE_COUNT_THRESHOLD_TO_GUARDRAIL_*).
* That converter calls
*_SchemaConstants.getLocalAndReplicatedSystemTableNames()_*, which reads
*_SystemKeyspace.TABLE_NAMES_*, a `static final ImmutableSet` that is not a
compile-time constant.
* This read triggers the static initializer of *_SystemKeyspace_*
(*_<clinit>_*) before *_DatabaseDescriptor_* has finished loading,
producing a partially initialized state and a downstream
*_NullPointerException_* or *_ExceptionInInitializerError_*.
*According to the
[JLS|https://docs.oracle.com/javase/specs/jls/se7/html/jls-12.html]:*
* _Section 12.4.1_: A class is initialized on first active use. Reading a
`static final` field that is a *_compile-time constant_* (a primitive or
`String` initialized with a constant expression) is *not* an active use and
does *not* trigger `<clinit>`. Reading any other `static final` field (e.g.
an `ImmutableSet`) is an active use and *will* trigger `<clinit>`.
* _Section 12.4.2, Step 3_: If the class is already being initialized
by the same thread (a recursive request), the initialization request returns
immediately and the caller sees the partially initialized class. Static
fields not yet assigned at that point read as their default value (`null`
for references).
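The Section 12.4.1 distinction above can be checked with a small standalone program. This is only a sketch: the class `Holder` and the `LOG` list are made up for illustration and have nothing to do with Cassandra's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ConstantVsNonConstantDemo {
    // Records events in the order they happen.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a class such as SystemKeyspace.
    static class Holder {
        // Constant variable: javac inlines "batches" at each use site.
        static final String NAME = "batches";
        // Not a constant variable: reading it initializes Holder.
        static final Set<String> NAMES = Set.of("local", "peers");
        static { LOG.add("Holder.<clinit>"); }
    }

    public static void main(String[] args) {
        String name = Holder.NAME;      // no initialization of Holder
        LOG.add("read NAME");
        int size = Holder.NAMES.size(); // first real use: runs Holder.<clinit>
        LOG.add("read NAMES");
        System.out.println(LOG);        // [read NAME, Holder.<clinit>, read NAMES]
    }
}
```

The static initializer is logged only after the constant has already been read, which is the behavior the per-table `String` constants rely on.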
*Code Path:*
DatabaseDescriptor.<clinit> *(loadConfig running)*
└─→ Converters.TABLE_COUNT_THRESHOLD_TO_GUARDRAIL
    └─→ SchemaConstants.getLocalAndReplicatedSystemTableNames()
        └─→ SystemKeyspace.TABLE_NAMES *← non-constant field access*
            └─→ SystemKeyspace.<clinit> *← triggered too early*
                └─→ DatabaseDescriptor.getPartitioner()
                    └─→ partitioner == null *← partially initialized*
                        └─→ NPE / ExceptionInInitializerError
*Reproducer:* A sample program that confirms the
partial-initialization problem.
{code:java}
public class EarlyInitDemo {
    static class DatabaseDescriptor {
        static String partitioner = null;
        static {
            // Triggers the chain: SchemaConstants -> SystemKeyspace.<clinit>
            String names = SchemaConstants.getSystemTableNames();
            // Set AFTER YAML convert/load
            partitioner = "Murmur3Partitioner";
        }
        static String getPartitioner() { return partitioner; }
    }

    static class SchemaConstants {
        static String getSystemTableNames() {
            // Non-constant field: reading it triggers SystemKeyspace.<clinit>
            return SystemKeyspace.TABLE_NAMES.toString();
        }
    }

    static class SystemKeyspace {
        static final java.util.Set<String> TABLE_NAMES;
        static {
            // Same thread, recursive request: sees the default value (null)
            String p = DatabaseDescriptor.getPartitioner();
            if (p == null) throw new NullPointerException("DD not initialized yet");
            TABLE_NAMES = java.util.Set.of("local", "peers");
        }
    }

    public static void main(String[] args) {
        try {
            DatabaseDescriptor.getPartitioner();
        } catch (ExceptionInInitializerError e) {
            e.printStackTrace(); // Caused by: NPE in SystemKeyspace.<clinit>
        }
    }
}{code}
To work around this, I tried a quick-and-dirty local patch that avoids
triggering the early class initialization.
* Replaced `.addAll(SystemKeyspace.TABLE_NAMES)` with
individual `.add(SystemKeyspace.BATCHES)` calls, etc.
* String literals and `static final String` fields initialized to
literals are compile-time constants, so reading them never triggers
`<clinit>`.
* *Downside:* verbose; requires manual sync when new system tables are
added.
---
*Alternate Fix*: declare the size as a `static final int`. The
converter only calls `.size()`; it never iterates the set.
A `static final int` initialized with a constant expression (such as a sum
of integer literals) is a compile-time constant, so reading it does not
trigger `<clinit>`.
{code:java}
// SchemaConstants.java
public static final int LOCAL_AND_REPLICATED_SYSTEM_TABLE_COUNT =
25 // SystemKeyspace
+ 11 // SchemaKeyspace
+ 2 // TraceKeyspace
+ 5 // AuthKeyspace
+ 7 // SystemDistributedKeyspace
+ 2; // AccordKeyspace{code}
*Downside:* This also needs to be kept in sync whenever the system table count changes.
{code:java}
// Converters.java
TABLE_COUNT_THRESHOLD_TO_GUARDRAIL(int.class, int.class,
i -> i - SchemaConstants.LOCAL_AND_REPLICATED_SYSTEM_TABLE_COUNT,
o -> o == null ? null : o +
SchemaConstants.LOCAL_AND_REPLICATED_SYSTEM_TABLE_COUNT);{code}
`getLocalAndReplicatedSystemTableNames()` has no other callers apart
from the *_Converters.TABLE_COUNT_THRESHOLD_TO_GUARDRAIL_* enum constant.
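One possible way to soften the keep-in-sync downside of a hand-maintained count is a check, run from a unit test after initialization is safe, that compares the constant with the real set size. The sketch below is self-contained and uses made-up stand-ins: `SYSTEM_TABLE_COUNT`, `systemTableNames()` and `checkInSync()` are hypothetical names, and the values are illustrative only.

```java
import java.util.Set;

public class TableCountSyncCheck {
    // Hypothetical stand-in for the hand-maintained constant (illustrative values).
    static final int SYSTEM_TABLE_COUNT =
            2   // stand-in for one keyspace's tables
          + 1;  // stand-in for another keyspace's tables

    // Hypothetical stand-in for the method that builds the real name set.
    static Set<String> systemTableNames() {
        return Set.of("local", "peers", "batches");
    }

    // Fails loudly if the constant drifts from the actual number of tables.
    static void checkInSync() {
        int actual = systemTableNames().size();
        if (actual != SYSTEM_TABLE_COUNT) {
            throw new AssertionError("SYSTEM_TABLE_COUNT is " + SYSTEM_TABLE_COUNT
                    + " but the name set has " + actual + " entries");
        }
    }

    public static void main(String[] args) {
        checkInSync();
        System.out.println("in sync");
    }
}
```

In a test the set-building method can be called safely because configuration is already loaded by then, so the converter itself never needs to touch the set.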
> Static init race between paxos v2 and table count guardrail causes NPE on
> startup
> ---------------------------------------------------------------------------------
>
> Key: CASSANDRA-21156
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21156
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Startup and Shutdown
> Reporter: Blake Eggleston
> Priority: Normal
> Fix For: 5.0.x
>
>
> Setting the following config value:
> table_count_warn_threshold: 400
> causes this exception on startup:
> {{ERROR [main] 2026-02-02 15:52:22,389 CassandraDaemon.java:887 - Exception encountered during startup
> java.lang.ExceptionInInitializerError: null
> at org.apache.cassandra.db.SystemKeyspace.<clinit>(SystemKeyspace.java:239)
> at org.apache.cassandra.schema.SchemaConstants.getLocalAndReplicatedSystemTableNames(SchemaConstants.java:184)
> at org.apache.cassandra.config.Converters.lambda$static$32(Converters.java:128)
> at org.apache.cassandra.config.Converters.convert(Converters.java:174)
> at org.apache.cassandra.config.Replacement$1.set(Replacement.java:76)
> at org.apache.cassandra.config.YamlConfigurationLoader$PropertiesChecker$1.set(YamlConfigurationLoader.java:376)
> at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:276)
> at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:169)
> at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:320)
> at org.yaml.snakeyaml.constructor.BaseConstructor.constructObjectNoCheck(BaseConstructor.java:264)
> at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:247)
> at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:201)
> at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:185)
> at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:493)
> at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:486)
> at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:310)
> at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:141)
> at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:116)
> at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:399)
> at org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:261)
> at org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:246)
> at org.apache.cassandra.service.CassandraDaemon.applyConfig(CassandraDaemon.java:780)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:723)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:865)
> Caused by: java.lang.NullPointerException: null
> at org.apache.cassandra.db.DataRange.allData(DataRange.java:71)
> at org.apache.cassandra.service.paxos.uncommitted.PaxosUncommittedIndex.<clinit>(PaxosUncommittedIndex.java:90)
> ... 24 common frames omitted}}