[
https://issues.apache.org/jira/browse/HDDS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tak-Lon (Stephen) Wu updated HDDS-15422:
----------------------------------------
Summary: Stream read with `seek` close stream during reverse scan may cause
performance bottleneck (was: Stream read with `seek` close stream during may
cause performance bottleneck )
> Stream read with `seek` close stream during reverse scan may cause
> performance bottleneck
> ------------------------------------------------------------------------------------------
>
> Key: HDDS-15422
> URL: https://issues.apache.org/jira/browse/HDDS-15422
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 2.2.0
> Reporter: Tak-Lon (Stephen) Wu
> Priority: Major
>
> While testing HBase read only with Ozone by using YCSB workload C (L1
> blockcache / no bucketcache /), We are experiencing degraded read throughput
> and hanging handler threads on our HBase RegionServers. Despite a healthy
> BlockCache hit rate (~87.5%), active read threads are becoming blocked,
> creating a bottleneck.
> Profiling reveals that handler threads are hanging IdLock.getLockEntry while
> waiting for a single thread to fetch an HFile block from Apache Ozone.
> and while tuning on TRACE/DEBUG log , we found the input stream are closing
> immediately after seeking from end/middle of the file back to 0. the next
> seek that tried to seek from 0 to middle hangs and wait till timeout. this
> indicates that read amplification issue triggered by Reverse Scans over the
> Ozone gRPC client cannot reuse the stream.
> {code}
> 2026-05-29 12:50:06,171 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream119-reader3831: return response positon 179405298, length 4095502
> (block offset 179306496, length 4194304)
> 2026-05-29 12:50:06,171 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek
> 179471034 -> 0
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
> stream119-reader3831
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream119-reader3831 setCompleted success
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek
> 0 -> 133955559
> 2026-05-29 12:50:06,172 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted:
> Already completed
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
> 217274129 -> 0
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
> 0 -> 102738300
> 2026-05-29 12:50:06,985 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream492-reader3832: return response positon 102738300, length 3167876
> (block offset 101711872, length 4194304)
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
> 102803986 -> 0
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
> stream492-reader3832
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
> stream492-reader3832 setCompleted success
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
> 0 -> 217274129
> 2026-05-29 12:50:07,009 DEBUG
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
> [grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted:
> Already completed
> {code}
> please check [this snippet of code
> |https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
> in StreamBlockInputStream.java.
> one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to
> match the hbase block, but this size reduction does not help the fact of
> Reverse Scan for HFile reading cases.
> Setup configuration
> {code}
> ozone.fs.datastream.enabled = true
> ozone.client.stream.readblock.enable = true
> hdds.container.ratis.datastream.enabled = true
> ozone.client.verify.checksum = false (was true and turned to false when
> checksum failed)
> hdds.grpc.tls.enabled = false (final)
> ozone.client.stream.read.timeout = 120s
> scm.container.client.idle.threshold = 120s
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]