[ 
https://issues.apache.org/jira/browse/HDFS-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619456#comment-16619456
 ] 

Xiao Chen edited comment on HDFS-13926 at 9/18/18 7:23 PM:
-----------------------------------------------------------

v01 is ready for review. The remaining checkstyle warnings are left as is to 
stay consistent with existing code.

As described, thread-local stats don't work because of EC's multi-threaded 
reads. Below is what the read path looks like for regular and EC files:
{noformat}
DFSInputStream:
          at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:703)
          - locked <0x1986> (a org.apache.hadoop.hdfs.DFSInputStream)
          at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:764)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:821)
          at java.io.DataInputStream.read(DataInputStream.java:149)
{noformat}

{noformat}
EC:
'main' read thread which creates the callables and submits them for async execution.
"Thread-0@996" prio=5 tid=0xd nid=NA runnable
  java.lang.Thread.State: RUNNABLE
          at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:306)
          at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:324)
          at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:324)
          at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:397)
          - locked <0x19ce> (a org.apache.hadoop.hdfs.DFSStripedInputStream)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:821)

on each callable thread.
"StripedRead-0@6606" daemon prio=5 tid=0x363 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
          at org.apache.hadoop.hdfs.ByteBufferStrategy.readFromBlock(ReaderStrategy.java:187)
          at org.apache.hadoop.hdfs.ByteBufferStrategy.readFromBlock(ReaderStrategy.java:181)
          at org.apache.hadoop.hdfs.StripeReader.readToBuffer(StripeReader.java:238)
          at org.apache.hadoop.hdfs.StripeReader.lambda$readCells$0(StripeReader.java:279)
          at org.apache.hadoop.hdfs.StripeReader$$Lambda$35.1430315760.call(Unknown Source:-1)
          at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
          at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
          at java.util.concurrent.Executors$RunnableAdapter.call$$$capture(Executors.java:511)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:-1)
          at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
          at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
{noformat}

Currently the stats aggregation is inside {{ReaderStrategy}}. For EC to work, I 
pushed it down the stack into the streams, which run on the caller's read 
thread. The stats should still be accurate; there is only a slight, benign 
delay (the time it takes to return from {{ReaderStrategy}} to the stream) 
between the actual read and the stats update.
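
To illustrate why the per-thread numbers go missing, here is a minimal standalone sketch. It is not the patch: it uses no Hadoop classes, and every name in it ({{ThreadLocalStatsSketch}}, {{BYTES_READ}}, {{updateInsideCallable}}, {{updateOnCaller}}) is hypothetical. It only assumes that the stats live in a plain {{ThreadLocal}} and that each cell is read by a callable on a pool thread, as in the EC trace above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadLocalStatsSketch {
    // Hypothetical stand-in for per-thread read stats (not a Hadoop class).
    static final ThreadLocal<long[]> BYTES_READ =
        ThreadLocal.withInitial(() -> new long[1]);

    // Counter is bumped inside the callable, so it lands in each worker
    // thread's copy and the caller's copy never changes.
    static long updateInsideCallable(int cells, int bytesPerCell) {
        return run(cells, bytesPerCell, true);
    }

    // Callable only returns the byte count; the caller thread adds it to its
    // own copy after the future completes.
    static long updateOnCaller(int cells, int bytesPerCell) {
        return run(cells, bytesPerCell, false);
    }

    // Returns the caller thread's view of the counter after all reads finish.
    private static long run(int cells, int bytesPerCell, boolean inWorker) {
        BYTES_READ.get()[0] = 0; // reset the caller thread's counter
        ExecutorService pool = Executors.newFixedThreadPool(cells);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < cells; i++) {
                Callable<Integer> task = () -> {
                    if (inWorker) {
                        BYTES_READ.get()[0] += bytesPerCell; // worker's copy
                    }
                    return bytesPerCell; // pretend this many bytes were read
                };
                futures.add(pool.submit(task));
            }
            for (Future<Integer> f : futures) {
                int n = f.get();
                if (!inWorker) {
                    BYTES_READ.get()[0] += n; // caller's copy
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return BYTES_READ.get()[0];
    }

    public static void main(String[] args) {
        System.out.println("updated in workers: " + updateInsideCallable(3, 4));
        System.out.println("updated on caller:  " + updateOnCaller(3, 4));
    }
}
```

In the first variant the caller sees none of the bytes read by the pool threads; in the second it sees all of them, at the cost of the counter being updated only after the callable has returned.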


was (Author: xiaochen):
v01 ready for review. Remaining checkstyle warnings are more being consistent 
with existing code.

(Will update later today with a more detailed explanation of the fix)

> ThreadLocal aggregations for FileSystem.Statistics are incorrect with striped 
> reads
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-13926
>                 URL: https://issues.apache.org/jira/browse/HDFS-13926
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>            Priority: Major
>         Attachments: HDFS-13926.01.patch, HDFS-13926.prelim.patch
>
>
> During some integration testing, [~nsheth] found that per-thread read 
> stats for EC are incorrect. This is because the striped reads are done 
> asynchronously on the worker threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
