[ https://issues.apache.org/jira/browse/HDFS-17860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ConfX updated HDFS-17860:
-------------------------
Description:
h2. Overview
A NullPointerException occurs in `BlocksMap.numNodes()` when calling
`getBlockLocations()` on a file that was created via the `concat()` operation,
after a NameNode restart. This is a critical production bug that could crash
the NameNode during normal file read operations.
h2. Reproduction
h3. Quick Start with Bash Script
Use the provided `reproduce.sh` script to reproduce this bug automatically.
The script automates the entire process: it clones Hadoop 3.3.5, applies the
test patch, builds the project, and runs the failing test case.
{code:java}
# Make sure you download both reproduce.sh and restart.patch
$ ./reproduce.sh{code}
*What the script does:*
1. Clones the Hadoop repository (release 3.3.5 branch)
2. Applies the test patch (`restart.patch`) that adds the reproduction test
3. Builds the Hadoop HDFS module
4. Runs the test case `TestHDFSConcat#testConcatWithRestart` which demonstrates
the NullPointerException
The bug is confirmed if the test fails with a `NullPointerException` in
`BlocksMap.numNodes()`.
*Manual Reproduction Steps:*
If you prefer to run the test manually:
{code:java}
mvn surefire:test -Dtest=TestHDFSConcat_RestartInjected#testConcat_AfterConcat_NN_Crash{code}
*Test Scenario* (see the test sketch after this list)
1. Create a target file (`/trg`) with 3 blocks (512 bytes each)
2. Create 10 source files, each with 3 blocks
3. Call `dfs.concat(trgPath, files)` to concatenate all source files into target
4. Restart the NameNode
5. Call `nn.getBlockLocations(trg, 0, trgLen)` on the concatenated file
6. NPE occurs at `BlocksMap.numNodes()`
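For reference, a minimal sketch of this scenario as a standalone test, assuming
the standard MiniDFSCluster test utilities; the exact setup in `restart.patch`
may differ, and the class and method names here are illustrative only:
{code:java}
// Hedged sketch of the reproduction scenario, not the exact patched test.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestConcatRestartSketch {
  @Test
  public void testConcatThenRestartThenRead() throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 512);
    conf.setLong(DFSConfigKeys.DFS_NAMENODE_MIN_BLOCK_SIZE_KEY, 0);
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      DistributedFileSystem dfs = cluster.getFileSystem();
      Path trg = new Path("/trg");
      DFSTestUtil.createFile(dfs, trg, 3 * 512, (short) 1, 0L); // 3-block target

      Path[] srcs = new Path[10]; // 10 source files, 3 blocks each
      for (int i = 0; i < srcs.length; i++) {
        srcs[i] = new Path("/src" + i);
        DFSTestUtil.createFile(dfs, srcs[i], 3 * 512, (short) 1, 0L);
      }
      dfs.concat(trg, srcs); // target now has 3 + 30 = 33 blocks

      cluster.restartNameNode();

      long trgLen = dfs.getFileStatus(trg).getLen();
      // This call is where the NPE surfaces in BlocksMap.numNodes():
      cluster.getNameNodeRpc().getBlockLocations(trg.toString(), 0, trgLen);
    } finally {
      cluster.shutdown();
    }
  }
}{code}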
*Stack Trace*
{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.numNodes(BlocksMap.java:172)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1420)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:1382)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:1353)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:1503)
  at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:179)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2124)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:769){code}
h2. Root Cause Analysis
*Code Location*: `BlocksMap.java:172`
{code:java}
int numNodes(Block b) {
  BlockInfo info = blocks.get(b); // LINE 172 - NPE HERE
  return info == null ? 0 : info.numNodes();
}{code}
*Why the NPE Occurs*
The NPE happens because a null `Block` parameter `b` is being passed to
`numNodes()`. When `b` is null, calling `blocks.get(null)` throws a
`NullPointerException` because:
1. The `blocks` map is likely a `LightWeightGSet` or similar hash map
implementation
2. The `get()` method calls `b.hashCode()` for lookup
3. Calling `hashCode()` on null throws NPE
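As a simplified, self-contained illustration (an assumption: the real
`LightWeightGSet` differs in detail, but it likewise hashes the key before any
null handling):
{code:java}
// Schematic GSet-style lookup. The point: the key is dereferenced for
// hashing before the "not found" path can ever return null, so a null
// key throws NPE instead of yielding "0 nodes".
class GSetSketch<K, V> {
  private final Object[] entries = new Object[1024];

  @SuppressWarnings("unchecked")
  V get(K key) {
    int index = key.hashCode() & (entries.length - 1); // NPE here when key == null
    return (V) entries[index]; // simplified: no collision chain
  }
}{code}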
*How Null Blocks Enter the System*
*Debug Investigation*
I added debug logging to `BlockManager.createLocatedBlockList()` to inspect the
blocks array:
{code:java}
LOG.info("RESTART_DEBUG: createLocatedBlockList called with blocks.length=" +
blocks.length);
for (int i = 0; i < blocks.length; i++) {
if (blocks[i] == null) {
LOG.error("RESTART_DEBUG: blocks[" + i + "] is NULL!");
}
}{code}
*Finding:* The blocks array itself does NOT contain null elements after
restart. All 33 blocks in the concatenated file's blocks array are non-null
BlockInfo objects.
*Hypothesis: Stale Block References*
After concat, the file's INode contains blocks from the original target file
PLUS all blocks from the concatenated source files. The test logs show:
- Before restart: Each source file has 3 blocks (10 files × 3 = 30 blocks)
- After concat: Target file should have 3 + 30 = 33 blocks
- After restart: `blocks.length=33` - all blocks present
However, based on the stack trace and NPE location, the issue likely stems from
the following sequence (a diagnostic sketch follows this list):
1. Concat operation moves blocks from source files to target file
2. Source files are deleted after concat
3. Block metadata for source files may be marked for deletion/invalidation
4. After restart, when NameNode reloads FSImage/EditLog:
- The concatenated file's INode correctly references all 33 blocks
- BUT some blocks may have been removed from the `BlocksMap` during fsimage
load
- Or blocks from deleted source files were never added to BlocksMap
5. When `createLocatedBlock()` is called with one of these "ghost" block
references:
- The BlockInfo object exists in the file's blocks array
- But the actual Block/BlockInfo lookup in BlocksMap fails or returns
inconsistent state
- A null block reference propagates to `numNodes()`
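If this hypothesis is right, a diagnostic along these lines should show which
entries of the blocks array have no counterpart in the BlocksMap. This is a
sketch: `file` and `blockManager` are assumed to be in scope, e.g., inside an
instrumented NameNode code path:
{code:java}
// Hedged diagnostic sketch: cross-check the INode's blocks array against
// the global BlocksMap via BlockManager.getStoredBlock().
BlockInfo[] fileBlocks = file.getBlocks();
for (int i = 0; i < fileBlocks.length; i++) {
  BlockInfo stored = blockManager.getStoredBlock(fileBlocks[i]);
  if (stored == null) {
    LOG.error("Ghost block at index " + i + ": " + fileBlocks[i]
        + " is referenced by the INode but missing from BlocksMap");
  } else if (stored != fileBlocks[i]) {
    LOG.warn("Block at index " + i + " resolves to a different BlockInfo "
        + "instance in BlocksMap (possible stale reference)");
  }
}{code}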
h3. Evidence
*Test Log Analysis*
*Before Restart (all blocks non-null):*
{code:java}
2025-12-01 12:44:06,136 [Time-limited test] INFO BlockManager - RESTART_DEBUG: createLocatedBlockList called with blocks.length=3, offset=0, length=1536
blocks[0] = blk_1073741825_1001, numBytes=512
blocks[1] = blk_1073741826_1002, numBytes=512
blocks[2] = blk_1073741827_1003, numBytes=512{code}
*After Restart (file now has 33 blocks from concat, all non-null):*
{code:java}
2025-12-01 12:44:15,034 [IPC Server handler 0] INFO BlockManager - RESTART_DEBUG: createLocatedBlockList called with blocks.length=33, offset=0, length=5120
blocks[0] = blk_1073741825_1001, numBytes=512
blocks[1] = blk_1073741826_1002, numBytes=512
blocks[2] = blk_1073741827_1003, numBytes=512
blocks[3] = blk_1073741855_1031, numBytes=512
... (all 33 blocks non-null)
blocks[32] = blk_1073741830_1006, numBytes=512{code}
*Then the NPE occurs*, suggesting that the null block comes from a code path
not instrumented by my debug logging, or that there is a race condition during
BlocksMap access.
h3. Likely Bug Location
*Suspect Area 1 (most likely): FSNamesystem reads a stale reference*
After the NameNode restarts, the test code may still hold a reference to the
old FSNamesystem, and reading through that stale reference would cause the
problem.
If this is the cause, we should either prevent the stale reference (e.g., by
obtaining a fresh FSNamesystem after restart) or at least surface a clearer
error message instead of a bare NPE.
*Suspect Area 2: Concat Implementation*
The concat operation (in `FSNamesystem` or `FSDirectory`) may not properly
handle block ownership transfer during the transaction that gets persisted to
edit log. On restart:
- FSImage loading might not correctly restore all blocks to BlocksMap
- Blocks from deleted source files might be in a transitional state
*Suspect Area 3: BlockInfo Reference vs BlocksMap Inconsistency*
There may be a race or ordering issue where:
1. INodeFile's blocks array references BlockInfo objects
2. These BlockInfo objects are not yet added to (or have been removed from) the
global BlocksMap
3. When `createLocatedBlock()` tries to look up block locations, it accesses a
BlockInfo that's not in the map
*Code Path to NPE*
{code:java}
getBlockLocations()
  → createLocatedBlocks()
    → createLocatedBlockList()
      → createLocatedBlock(blocks[curBlk], ...)  // blocks[curBlk] might be problematic
        → createLocatedBlock(blk, ...)
          → blocksMap.numNodes(blk)  // NPE if blk is somehow null or invalid{code}
h2. Impact
*Production Impact:*
1. NameNode Crash Risk: NPE in RPC handler can crash the NameNode during client
`getBlockLocations()` calls
2. Data Availability: Files created via `concat()` become unreadable after
NameNode restart
3. Silent Corruption: The concat operation appears to succeed, but the file is
broken after restart
*Affected Operations*
- Any `getBlockLocations()` call on concatenated files after restart
- File reads (since clients call `getBlockLocations()`)
- MapReduce/Spark jobs reading concatenated files
- Backup/replication tools accessing these files
h2. Recommended Fix
*Immediate Mitigation*
Add null-safety check in `BlocksMap.numNodes()`:
{code:java}
int numNodes(Block b) {
  if (b == null) {
    LOG.error("Null block passed to numNodes()!", new Exception("Stack trace"));
    return 0; // or have the caller translate this into an IOException
  }
  BlockInfo info = blocks.get(b);
  return info == null ? 0 : info.numNodes();
}{code}
And in `BlockManager.createLocatedBlock()`:
{code:java}
private LocatedBlock createLocatedBlock(LocatedBlockBuilder locatedBlocks,
    final BlockInfo blk, final long pos, final AccessMode mode)
    throws IOException {
  if (blk == null) {
    LOG.error("Null block in createLocatedBlock at pos=" + pos, new Exception());
    throw new IOException("Null block reference in file's block list");
  }
  // ... rest of method
}{code}
*Root Cause Fix*
Requires deeper investigation:
1. Audit concat implementation to ensure all blocks are properly:
- Added to target file's INode
- Registered in BlocksMap
- Persisted correctly in edit log
- Loaded correctly from FSImage/EditLog on restart
2. Check FSImage/EditLog loading for concat transactions:
- Verify blocks from concatenated files are added to BlocksMap
- Ensure proper ordering of operations during replay
- Check for race conditions in block map population
3. Add consistency checks during NameNode startup (see the sketch after this list):
- Verify all blocks referenced by INodes exist in BlocksMap
- Log warnings for orphaned block references
- Option to auto-repair or fail-safe mode
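For item 3, a startup consistency check could look roughly like this. This is
a sketch under stated assumptions: `inodeFiles` is an assumed iterable over all
loaded INodeFile instances, and the actual hook point (e.g., the end of FSImage
loading) would need design discussion:
{code:java}
// Hedged sketch of a post-load consistency check.
int orphaned = 0;
for (INodeFile f : inodeFiles) { // assumed iterable over loaded INodeFiles
  for (BlockInfo b : f.getBlocks()) {
    if (b == null || blockManager.getStoredBlock(b) == null) {
      orphaned++;
      LOG.warn("Startup check: block " + b + " of file " + f.getFullPathName()
          + " is not registered in BlocksMap");
    }
  }
}
if (orphaned > 0) {
  LOG.error("Startup check found " + orphaned + " orphaned block references");
  // Depending on policy: auto-repair by re-inserting, or hold safe mode.
}{code}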
*Related Code Files to Investigate*
1. `BlocksMap.java` - Block storage map
2. `BlockManager.java` - Block management and location services
3. `FSNamesystem.java` / `FSDirConcatOp.java` - Concat operation implementation
4. `FSImageFormat.java` / `FSEditLog.java` - Persistence and loading
5. `INodeFile.java` - File inode and block array management
I'm more than happy to discuss the potential root cause and fix!
> NPE in BlocksMap After NameNode Concat and Restart
> --------------------------------------------------
>
> Key: HDFS-17860
> URL: https://issues.apache.org/jira/browse/HDFS-17860
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.5
> Reporter: ConfX
> Priority: Critical
> Labels: ai-tooling
> Attachments: reproduce.sh, restart.patch
>