[
https://issues.apache.org/jira/browse/HDDS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tak-Lon (Stephen) Wu updated HDDS-15422:
----------------------------------------
Description:
While testing HBase read only with Ozone by using YCSB workload C (L1
blockcache / no bucketcache /), We are experiencing degraded read throughput
and hanging handler threads on our HBase RegionServers. Despite a healthy
BlockCache hit rate (~87.5%), active read threads are becoming blocked,
creating a bottleneck.
Profiling reveals that handler threads are hanging IdLock.getLockEntry while
waiting for a single thread to fetch an HFile block from Apache Ozone.
and while tuning on TRACE/DEBUG log , we found the input stream are closing
immediately after seeking from end/middle of the file back to 0. the next seek
that tried to seek from 0 to middle hangs and wait till timeout. this
indicates that read amplification issue triggered by Reverse Scans over the
Ozone gRPC client cannot reuse the stream.
{code}
2026-05-29 12:50:06,171 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream119-reader3831: return response positon 179405298, length 4095502 (block
offset 179306496, length 4194304)
2026-05-29 12:50:06,171 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek
179471034 -> 0
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
stream119-reader3831
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream119-reader3831 setCompleted success
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek 0
-> 133955559
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted:
Already completed
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
217274129 -> 0
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 0
-> 102738300
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
preReadSize=0
2026-05-29 12:50:07,008 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832: enqueue response offset
101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream492-reader3832: return response positon 102738300, length 3167876 (block
offset 101711872, length 4194304)
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
102803986 -> 0
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
stream492-reader3832
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream492-reader3832 setCompleted success
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 0
-> 217274129
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted:
Already completed
{code}
please check [this snippet of code
|https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
in StreamBlockInputStream.java.
one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to
match the hbase block, but this size reduction does not help the fact of
Reverse Scan for HFile reading cases.
{code}
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
preReadSize=0
2026-05-29 12:50:07,008 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832: enqueue response offset
101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
{code}
Setup configuration
{code}
ozone.fs.datastream.enabled = true
ozone.client.stream.readblock.enable = true
hdds.container.ratis.datastream.enabled = true
ozone.client.verify.checksum = false (was true and turned to false when
checksum failed)
hdds.grpc.tls.enabled = false (final)
ozone.client.stream.read.timeout = 120s
scm.container.client.idle.threshold = 120s
{code}
was:
While testing HBase read only with Ozone by using YCSB workload C (L1
blockcache / no bucketcache /), We are experiencing degraded read throughput
and hanging handler threads on our HBase RegionServers. Despite a healthy
BlockCache hit rate (~87.5%), active read threads are becoming blocked,
creating a bottleneck.
Profiling reveals that handler threads are hanging IdLock.getLockEntry while
waiting for a single thread to fetch an HFile block from Apache Ozone.
and while tuning on TRACE/DEBUG log , we found the input stream are closing
immediately after seeking from end/middle of the file back to 0. the next seek
that tried to seek from 0 to middle hangs and wait till timeout. this
indicates that read amplification issue triggered by Reverse Scans over the
Ozone gRPC client cannot reuse the stream.
{code}
2026-05-29 12:50:06,171 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream119-reader3831: return response positon 179405298, length 4095502 (block
offset 179306496, length 4194304)
2026-05-29 12:50:06,171 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek
179471034 -> 0
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
stream119-reader3831
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream119-reader3831 setCompleted success
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek 0
-> 133955559
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted:
Already completed
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
217274129 -> 0
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 0
-> 102738300
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
preReadSize=0
2026-05-29 12:50:07,008 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832: enqueue response offset
101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream492-reader3832: return response positon 102738300, length 3167876 (block
offset 101711872, length 4194304)
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
102803986 -> 0
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
stream492-reader3832
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream492-reader3832 setCompleted success
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 0
-> 217274129
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted:
Already completed
{code}
please check [this snippet of code
|https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
in StreamBlockInputStream.java.
one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to
match the hbase block, but this size reduction does not help the fact of
Reverse Scan for HFile reading cases.
Setup configuration
{code}
ozone.fs.datastream.enabled = true
ozone.client.stream.readblock.enable = true
hdds.container.ratis.datastream.enabled = true
ozone.client.verify.checksum = false (was true and turned to false when
checksum failed)
hdds.grpc.tls.enabled = false (final)
ozone.client.stream.read.timeout = 120s
scm.container.client.idle.threshold = 120s
{code}
> Stream read with `seek` close stream during reverse scan may cause
> performance bottleneck
> ------------------------------------------------------------------------------------------
>
> Key: HDDS-15422
> URL: https://issues.apache.org/jira/browse/HDDS-15422
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 2.2.0
> Reporter: Tak-Lon (Stephen) Wu
> Priority: Major
>
> While testing HBase read only with Ozone by using YCSB workload C (L1
> blockcache / no bucketcache /), We are experiencing degraded read throughput
> and hanging handler threads on our HBase RegionServers. Despite a healthy
> BlockCache hit rate (~87.5%), active read threads are becoming blocked,
> creating a bottleneck.
> Profiling reveals that handler threads are hanging IdLock.getLockEntry while
> waiting for a single thread to fetch an HFile block from Apache Ozone.
> and while tuning on TRACE/DEBUG log , we found the input stream are closing
> immediately after seeking from end/middle of the file back to 0. the next
> seek that tried to seek from 0 to middle hangs and wait till timeout. this
> indicates that read amplification issue triggered by Reverse Scans over the
> Ozone gRPC client cannot reuse the stream.
> {code}
> 2026-05-29 12:50:06,171 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream119-reader3831: return response positon 179405298, length 4095502
> (block offset 179306496, length 4194304)
> 2026-05-29 12:50:06,171 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek
> 179471034 -> 0
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
> stream119-reader3831
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream119-reader3831 setCompleted success
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek
> 0 -> 133955559
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted:
> Already completed
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
> 217274129 -> 0
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
> 0 -> 102738300
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream492-reader3832: return response positon 102738300, length 3167876
> (block offset 101711872, length 4194304)
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
> 102803986 -> 0
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
> stream492-reader3832
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream492-reader3832 setCompleted success
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
> 0 -> 217274129
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted:
> Already completed
> {code}
> please check [this snippet of code
> |https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
> in StreamBlockInputStream.java.
> one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to
> match the hbase block, but this size reduction does not help the fact of
> Reverse Scan for HFile reading cases.
> {code}
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> {code}
> Setup configuration
> {code}
> ozone.fs.datastream.enabled = true
> ozone.client.stream.readblock.enable = true
> hdds.container.ratis.datastream.enabled = true
> ozone.client.verify.checksum = false (was true and turned to false when
> checksum failed)
> hdds.grpc.tls.enabled = false (final)
> ozone.client.stream.read.timeout = 120s
> scm.container.client.idle.threshold = 120s
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]