[ 
https://issues.apache.org/jira/browse/CASSANDRA-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ConfX updated CASSANDRA-21128:
------------------------------
    Description: 
`CompactionManager.submitMaximal()` does not handle the case where 
`getMaximalTasks()` returns `null`, leading to a `NullPointerException` when 
calling `tasks.isEmpty()`.
 
This would happen during a node restart:
1. An async compaction task may be executing `submitMaximal()`
2. The restart causes conditions where there are uninterruptible compactions 
(e.g., compactions that couldn't be properly cancelled during shutdown)
3. `runWithCompactionsDisabled()` finds these uninterruptible tasks and returns 
`null`
4. `submitMaximal()` doesn't check for `null`, resulting in NPE
 
{code:java}
Timeline (NPE occurs):
─────────────────────────────────────────────────────────────────────────────
[Async Task]     submitMaximal() ──> getMaximalTasks() ──> 
runWithCompactionsDisabled()
                                                                    │
[Main Thread]    upgradesstables 
──────────────────────────────────────────────────────
                                                                    │
[Restart]        ─────────────────────────────────────────── RESTART TRIGGERED
                                                                    │
                                                           Shutdown interrupts
                                                           compactions, but some
                                                           remain 
"uninterruptible"
                                                                    │
                                                           
runWithCompactionsDisabled()
                                                           returns NULL
                                                                    │
                                                           tasks.isEmpty() → 
NPE! {code}
{code:java}
java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot 
invoke "org.apache.cassandra.db.compaction.CompactionTasks.isEmpty()" because 
"tasks" is null
    at 
org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
    at 
org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
    at 
...UpgradeSSTablesTest_RestartInjected.upgradeSSTablesInterruptsOngoingCompaction(line:118)
    ...
Caused by: java.lang.NullPointerException: Cannot invoke 
"org.apache.cassandra.db.compaction.CompactionTasks.isEmpty()" because "tasks" 
is null
    at 
org.apache.cassandra.db.compaction.CompactionManager.submitMaximal(CompactionManager.java:1001)
    at 
...UpgradeSSTablesTest_RestartInjected.lambda$...upgradeSSTablesInterruptsOngoingCompaction$...(line:103)
 {code}
Buggy part:
{code:java}
public List<Future<?>> submitMaximal(final ColumnFamilyStore cfStore, final 
long gcBefore, boolean splitOutput, OperationType operationType)
{
    // here we compute the task off the compaction executor, so having that 
present doesn't
    // confuse runWithCompactionsDisabled -- i.e., we don't want to deadlock 
ourselves, waiting
    // for ourselves to finish/acknowledge cancellation before continuing.
    CompactionTasks tasks = 
cfStore.getCompactionStrategyManager().getMaximalTasks(gcBefore, splitOutput, 
operationType);    
    if (tasks.isEmpty())  // NPE here when tasks is null
        return Collections.emptyList();
    // ...
} {code}
h3. Why `getMaximalTasks()` Can Return Null

The `getMaximalTasks()` method in `CompactionStrategyManager` uses 
`runWithCompactionsDisabled()` internally:
{code:java}
// CompactionStrategyManager.java:1076-1100
public CompactionTasks getMaximalTasks(final long gcBefore, final boolean 
splitOutput, OperationType operationType)
{
    maybeReloadDiskBoundaries();
    return cfs.runWithCompactionsDisabled(() -> {
        // ... create tasks
        return CompactionTasks.create(tasks);
    }, operationType, false, false);
} {code}
The `runWithCompactionsDisabled()` method in `ColumnFamilyStore` can return 
`null` in two scenarios:
1. When there are uninterruptible higher-priority compactions 
(ColumnFamilyStore.java:2850-2858):
{code:java}
if (!uninterruptibleTasks.isEmpty())
{
    logger.info("Unable to cancel in-progress compactions, since they're 
running with higher or same priority: {}...");
    return null;  // Returns null here!
} {code}
2. When compaction cannot be interrupted due to timeout 
(ColumnFamilyStore.java:2868-2873):
{code:java}
if (cfs.getTracker().getCompacting().stream().anyMatch(sstablesPredicate))
{
    logger.warn("Unable to cancel in-progress compactions for {}...");
    return null;  // Returns null here too!
} {code}
h3.  
h3. Proposed Fix

I see there are indeed null checker for other usage. For example, in 
`PendingRepairHolder.getMaximalTasks()`:

 
{code:java}
// PendingRepairHolder.java:130-132
Collection<AbstractCompactionTask> task = manager.getMaximalTasks(gcBefore, 
splitOutput);
if (task != null)  // Proper null check!
    tasks.addAll(task); {code}
So the fix would just simply add a null check before calling isEmpty():
 

 
{code:java}
public List<Future<?>> submitMaximal(final ColumnFamilyStore cfStore, final 
long gcBefore, boolean splitOutput, OperationType operationType)
{
    CompactionTasks tasks = 
cfStore.getCompactionStrategyManager().getMaximalTasks(gcBefore, splitOutput, 
operationType);    
    if (tasks == null || tasks.isEmpty())  // Added null check
        return Collections.emptyList();    // ... rest of the method
} {code}
I'm happy to provide the PR for this issue.
 

 

  was:
`CompactionManager.submitMaximal()` does not handle the case where 
`getMaximalTasks()` returns `null`, leading to a `NullPointerException` when 
calling `tasks.isEmpty()`.
 
This would happen during a node restart:
1. An async compaction task may be executing `submitMaximal()`
2. The restart causes conditions where there are uninterruptible compactions 
(e.g., compactions that couldn't be properly cancelled during shutdown)
3. `runWithCompactionsDisabled()` finds these uninterruptible tasks and returns 
`null`
4. `submitMaximal()` doesn't check for `null`, resulting in NPE
 
{code:java}
Timeline A (NPE occurs):
─────────────────────────────────────────────────────────────────────────────
[Async Task]     submitMaximal() ──> getMaximalTasks() ──> 
runWithCompactionsDisabled()
                                                                    │
[Main Thread]    upgradesstables 
──────────────────────────────────────────────────────
                                                                    │
[Restart]        ─────────────────────────────────────────── RESTART TRIGGERED
                                                                    │
                                                           Shutdown interrupts
                                                           compactions, but some
                                                           remain 
"uninterruptible"
                                                                    │
                                                           
runWithCompactionsDisabled()
                                                           returns NULL
                                                                    │
                                                           tasks.isEmpty() → 
NPE! {code}
{code:java}
java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot 
invoke "org.apache.cassandra.db.compaction.CompactionTasks.isEmpty()" because 
"tasks" is null
    at 
org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
    at 
org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
    at 
...UpgradeSSTablesTest_RestartInjected.upgradeSSTablesInterruptsOngoingCompaction(line:118)
    ...
Caused by: java.lang.NullPointerException: Cannot invoke 
"org.apache.cassandra.db.compaction.CompactionTasks.isEmpty()" because "tasks" 
is null
    at 
org.apache.cassandra.db.compaction.CompactionManager.submitMaximal(CompactionManager.java:1001)
    at 
...UpgradeSSTablesTest_RestartInjected.lambda$...upgradeSSTablesInterruptsOngoingCompaction$...(line:103)
 {code}
Buggy part:
{code:java}
public List<Future<?>> submitMaximal(final ColumnFamilyStore cfStore, final 
long gcBefore, boolean splitOutput, OperationType operationType)
{
    // here we compute the task off the compaction executor, so having that 
present doesn't
    // confuse runWithCompactionsDisabled -- i.e., we don't want to deadlock 
ourselves, waiting
    // for ourselves to finish/acknowledge cancellation before continuing.
    CompactionTasks tasks = 
cfStore.getCompactionStrategyManager().getMaximalTasks(gcBefore, splitOutput, 
operationType);    
    if (tasks.isEmpty())  // NPE here when tasks is null
        return Collections.emptyList();
    // ...
} {code}
h3. Why `getMaximalTasks()` Can Return Null
The `getMaximalTasks()` method in `CompactionStrategyManager` uses 
`runWithCompactionsDisabled()` internally:
{code:java}
// CompactionStrategyManager.java:1076-1100
public CompactionTasks getMaximalTasks(final long gcBefore, final boolean 
splitOutput, OperationType operationType)
{
    maybeReloadDiskBoundaries();
    return cfs.runWithCompactionsDisabled(() -> {
        // ... create tasks
        return CompactionTasks.create(tasks);
    }, operationType, false, false);
} {code}
The `runWithCompactionsDisabled()` method in `ColumnFamilyStore` can return 
`null` in two scenarios:
1. When there are uninterruptible higher-priority compactions 
(ColumnFamilyStore.java:2850-2858):
{code:java}
if (!uninterruptibleTasks.isEmpty())
{
    logger.info("Unable to cancel in-progress compactions, since they're 
running with higher or same priority: {}...");
    return null;  // Returns null here!
} {code}
2. When compaction cannot be interrupted due to timeout 
(ColumnFamilyStore.java:2868-2873):
{code:java}
if (cfs.getTracker().getCompacting().stream().anyMatch(sstablesPredicate))
{
    logger.warn("Unable to cancel in-progress compactions for {}...");
    return null;  // Returns null here too!
} {code}
h3.  
h3. Proposed Fix

I see there are indeed null checker for other usage. For example, in 
`PendingRepairHolder.getMaximalTasks()`:

 
{code:java}
// PendingRepairHolder.java:130-132
Collection<AbstractCompactionTask> task = manager.getMaximalTasks(gcBefore, 
splitOutput);
if (task != null)  // Proper null check!
    tasks.addAll(task); {code}
So the fix would just simply add a null check before calling isEmpty():
 

 
{code:java}
public List<Future<?>> submitMaximal(final ColumnFamilyStore cfStore, final 
long gcBefore, boolean splitOutput, OperationType operationType)
{
    CompactionTasks tasks = 
cfStore.getCompactionStrategyManager().getMaximalTasks(gcBefore, splitOutput, 
operationType);    
    if (tasks == null || tasks.isEmpty())  // Added null check
        return Collections.emptyList();    // ... rest of the method
} {code}
I'm happy to provide the PR for this issue.
 

 


> CompactionManager.submitMaximal crash due to node restart race condition
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21128
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Normal
>
> `CompactionManager.submitMaximal()` does not handle the case where 
> `getMaximalTasks()` returns `null`, leading to a `NullPointerException` when 
> calling `tasks.isEmpty()`.
>  
> This would happen during a node restart:
> 1. An async compaction task may be executing `submitMaximal()`
> 2. The restart causes conditions where there are uninterruptible compactions 
> (e.g., compactions that couldn't be properly cancelled during shutdown)
> 3. `runWithCompactionsDisabled()` finds these uninterruptible tasks and 
> returns `null`
> 4. `submitMaximal()` doesn't check for `null`, resulting in NPE
>  
> {code:java}
> Timeline (NPE occurs):
> ─────────────────────────────────────────────────────────────────────────────
> [Async Task]     submitMaximal() ──> getMaximalTasks() ──> 
> runWithCompactionsDisabled()
>                                                                     │
> [Main Thread]    upgradesstables 
> ──────────────────────────────────────────────────────
>                                                                     │
> [Restart]        ─────────────────────────────────────────── RESTART TRIGGERED
>                                                                     │
>                                                            Shutdown interrupts
>                                                            compactions, but 
> some
>                                                            remain 
> "uninterruptible"
>                                                                     │
>                                                            
> runWithCompactionsDisabled()
>                                                            returns NULL
>                                                                     │
>                                                            tasks.isEmpty() → 
> NPE! {code}
> {code:java}
> java.util.concurrent.ExecutionException: java.lang.NullPointerException: 
> Cannot invoke "org.apache.cassandra.db.compaction.CompactionTasks.isEmpty()" 
> because "tasks" is null
>     at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
>     at 
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
>     at 
> ...UpgradeSSTablesTest_RestartInjected.upgradeSSTablesInterruptsOngoingCompaction(line:118)
>     ...
> Caused by: java.lang.NullPointerException: Cannot invoke 
> "org.apache.cassandra.db.compaction.CompactionTasks.isEmpty()" because 
> "tasks" is null
>     at 
> org.apache.cassandra.db.compaction.CompactionManager.submitMaximal(CompactionManager.java:1001)
>     at 
> ...UpgradeSSTablesTest_RestartInjected.lambda$...upgradeSSTablesInterruptsOngoingCompaction$...(line:103)
>  {code}
> Buggy part:
> {code:java}
> public List<Future<?>> submitMaximal(final ColumnFamilyStore cfStore, final 
> long gcBefore, boolean splitOutput, OperationType operationType)
> {
>     // here we compute the task off the compaction executor, so having that 
> present doesn't
>     // confuse runWithCompactionsDisabled -- i.e., we don't want to deadlock 
> ourselves, waiting
>     // for ourselves to finish/acknowledge cancellation before continuing.
>     CompactionTasks tasks = 
> cfStore.getCompactionStrategyManager().getMaximalTasks(gcBefore, splitOutput, 
> operationType);    
>     if (tasks.isEmpty())  // NPE here when tasks is null
>         return Collections.emptyList();
>     // ...
> } {code}
> h3. Why `getMaximalTasks()` Can Return Null
> The `getMaximalTasks()` method in `CompactionStrategyManager` uses 
> `runWithCompactionsDisabled()` internally:
> {code:java}
> // CompactionStrategyManager.java:1076-1100
> public CompactionTasks getMaximalTasks(final long gcBefore, final boolean 
> splitOutput, OperationType operationType)
> {
>     maybeReloadDiskBoundaries();
>     return cfs.runWithCompactionsDisabled(() -> {
>         // ... create tasks
>         return CompactionTasks.create(tasks);
>     }, operationType, false, false);
> } {code}
> The `runWithCompactionsDisabled()` method in `ColumnFamilyStore` can return 
> `null` in two scenarios:
> 1. When there are uninterruptible higher-priority compactions 
> (ColumnFamilyStore.java:2850-2858):
> {code:java}
> if (!uninterruptibleTasks.isEmpty())
> {
>     logger.info("Unable to cancel in-progress compactions, since they're 
> running with higher or same priority: {}...");
>     return null;  // Returns null here!
> } {code}
> 2. When compaction cannot be interrupted due to timeout 
> (ColumnFamilyStore.java:2868-2873):
> {code:java}
> if (cfs.getTracker().getCompacting().stream().anyMatch(sstablesPredicate))
> {
>     logger.warn("Unable to cancel in-progress compactions for {}...");
>     return null;  // Returns null here too!
> } {code}
> h3.  
> h3. Proposed Fix
> I see there are indeed null checker for other usage. For example, in 
> `PendingRepairHolder.getMaximalTasks()`:
>  
> {code:java}
> // PendingRepairHolder.java:130-132
> Collection<AbstractCompactionTask> task = manager.getMaximalTasks(gcBefore, 
> splitOutput);
> if (task != null)  // Proper null check!
>     tasks.addAll(task); {code}
> So the fix would just simply add a null check before calling isEmpty():
>  
>  
> {code:java}
> public List<Future<?>> submitMaximal(final ColumnFamilyStore cfStore, final 
> long gcBefore, boolean splitOutput, OperationType operationType)
> {
>     CompactionTasks tasks = 
> cfStore.getCompactionStrategyManager().getMaximalTasks(gcBefore, splitOutput, 
> operationType);    
>     if (tasks == null || tasks.isEmpty())  // Added null check
>         return Collections.emptyList();    // ... rest of the method
> } {code}
> I'm happy to provide the PR for this issue.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to