[ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272695#comment-15272695 ]
Xiaowei Zhu commented on HDFS-9890:
-----------------------------------
The latest patch, HDFS-9890.HDFS-8707.006.patch, fixes a bug in hdfs.cc where the
file event callback would not be set and passed down to the block reader properly.
It also changes the behavior of RANDOM_ERROR_RATIO in test_libhdfs_mini_stress.c:
1. unset: use default 1000000000
2. set to 0: always error
3. <0: always pass
4. other cases: random() % RANDOM_ERROR_RATIO
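
For illustration, the four cases above could be sketched in C roughly as follows. This is not the code from the patch: the helper names get_error_ratio and should_inject_error are hypothetical, and the sketch assumes that in case 4 an error is injected when random() % RANDOM_ERROR_RATIO equals 0, which the list above does not spell out.

```c
#define _XOPEN_SOURCE 600 /* for random(), a POSIX function */
#include <stdlib.h>

#define DEFAULT_RANDOM_ERROR_RATIO 1000000000L

/* Hypothetical helper: read RANDOM_ERROR_RATIO from the environment.
 * Case 1: if the variable is unset, fall back to the default. */
static long get_error_ratio(void) {
    const char *value = getenv("RANDOM_ERROR_RATIO");
    if (value == NULL) {
        return DEFAULT_RANDOM_ERROR_RATIO;
    }
    return strtol(value, NULL, 10);
}

/* Hypothetical helper: returns 1 if an error should be injected, 0 otherwise. */
static int should_inject_error(long ratio) {
    if (ratio == 0) {
        return 1; /* case 2: always error */
    }
    if (ratio < 0) {
        return 0; /* case 3: always pass */
    }
    /* case 4: assumed to mean "inject an error when the remainder is 0",
     * i.e. roughly one call in `ratio` fails */
    return (random() % ratio) == 0;
}
```

Under that assumption, a ratio of 1 would also fail every call, and the default of 1000000000 would make injected errors very rare.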
> libhdfs++: Add test suite to simulate network issues
> ----------------------------------------------------
>
> Key: HDFS-9890
> URL: https://issues.apache.org/jira/browse/HDFS-9890
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: Xiaowei Zhu
> Attachments: HDFS-9890.HDFS-8707.000.patch,
> HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch,
> HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch,
> HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch
>
>
> I propose adding a test suite to simulate various network issues/failures in
> order to get good test coverage on some of the retry paths that aren't easy
> to hit in mock unit tests.
> At the moment the only things that hit the retry paths are the gmock unit
> tests. The gmock tests are only as good as their mock implementations, which do a
> great job of simulating protocol correctness but not more complex
> interactions. They also can't really simulate the types of lock contention
> and subtle memory stomps that show up while doing hundreds or thousands of
> concurrent reads. We should add a new minidfscluster test that focuses on
> heavy read/seek load and then randomly convert error codes returned by
> network functions into errors.
> List of things to simulate (while heavily loaded), roughly in order of how
> badly I think they need to be tested at the moment:
> -Rpc connection disconnect
> -Rpc connection slowed down enough to cause a timeout and trigger retry
> -DN connection disconnect
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)