[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

Eric Yang (JIRA) Fri, 09 Aug 2019 10:51:17 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904077#comment-16904077
 ]


Eric Yang commented on HDDS-1554:
---------------------------------

[~arp] Thank you for the review.
{quote}ITDiskReadOnly#testReadOnlyDiskStartup - The following block of code can 
probably be removed, since it's really testing that the cluster is read-only in 
safe mode. We have unit tests for that:
{quote}
Correct me if I am wrong, but the tests are not exactly the same. This test 
triggers validation from the Ozone client's point of view. The unit test 
TestVolumeSet#testFailedVolume is written for the server side. The smoke test 
covers the positive case to ensure a volume can be created, but not the case 
where the disk is in read-only mode. I think there is value in testing the 
client-side response to ensure we have better coverage. Thoughts?
{quote}ITDiskReadOnly#testUpload - do we need to wait for safe mode exit after 
restarting the cluster? Also I think this test is essentially the same as the 
previous one.
{quote}
Safe mode validation is skipped here because Ozone exits on a read-only disk. 
The extra wait would only be a formality. In reality, it would be better to 
keep the Ozone daemon running but keep the file system in safe mode or a 
degraded mode that prevents write operations. This would be useful for 
disaster recovery, where a system admin may want to prevent further damage to 
the disk but still intends to recover data from Ozone buckets. This test is 
designed to pass both when Ozone runs in read-only mode and when it follows 
the exit strategy. Both designs are valid. The test is more useful if Ozone 
daemons don't exit on a read-only disk. I intend to add a download test for 
ITDiskReadOnly as well, if read-only mode can be implemented.
{quote}ITDiskCorruption#addCorruption:72 - looks like we have a hard-coded 
path. Should we get from configuration instead?
{quote}
Thank you for the suggestion. In patch 014 I made an adjustment to ensure the 
Maven project build directory can be customized. The test uses 
${buildDirectory}/data/meta to store metadata, and ${buildDirectory} defaults 
to Maven's ${project.build.directory}. The test corrupts the data file there. 
Placing the data file in the Maven build directory is a good way to ensure 
that mvn clean resets the state of the data file cleanly. If the location is 
configured externally, an external mechanism must be developed to reset the 
data file state.
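
As a rough illustration only (not taken from the patch), resolving the metadata 
directory from a configurable build directory could look like the sketch below. 
It assumes the value is passed to the test JVM as a system property named 
buildDirectory and falls back to target, Maven's usual build directory. The 
class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of the HDDS-1554 patch. */
public class MetaDirResolver {

  /**
   * Resolves <buildDirectory>/data/meta.
   * Assumption: the build directory arrives as the system property
   * "buildDirectory"; "target" is used when the property is not set.
   */
  static Path resolveMetaDir() {
    String buildDir = System.getProperty("buildDirectory", "target");
    return Paths.get(buildDir, "data", "meta");
  }

  public static void main(String[] args) {
    System.out.println("Metadata directory: " + resolveMetaDir());
  }
}
```

Keeping the fallback inside the build directory preserves the property that 
mvn clean resets the corrupted files.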
{quote}ITDiskCorruption#testUpload - The corruption implementation is bit of a 
heavy hammer, it is replacing the content of all meta files. Is it possible to 
make it reflect real-world corruption where a part of the file may be 
corrupted. Also we should probably restart the cluster after corrupting RocksDB 
meta files.
{quote}
If Ozone is restarted after metadata corruption, it will fall into the same 
code path, where it is unable to open RocksDB and fails to start. This would 
make the corruption upload test execute the same code path as 
ITDiskReadOnly#testReadOnlyDiskStartup, and the test would have no purpose. 
The test purposefully corrupts metadata files without a restart. This is to 
ensure a safety mechanism will be built to protect metadata integrity. One 
possible design is to have a background thread that checks RocksDB health. In 
the test, we can shorten the check interval to almost immediate, to verify 
that an upload does not succeed when metadata corruption happens, and that 
Ozone prevents further corruption by entering safe mode or degraded mode.
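
A minimal sketch of the background health check idea follows. It is not Ozone 
code: the class MetadataHealthMonitor, the health probe, and checkWritable are 
all hypothetical names, and the probe stands in for whatever RocksDB check 
would actually be used. The interval is a constructor argument so a test can 
shorten it to a few milliseconds.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names exist in Ozone. */
public class MetadataHealthMonitor implements AutoCloseable {
  private final AtomicBoolean degraded = new AtomicBoolean(false);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "metadata-health-check");
        t.setDaemon(true); // do not keep the JVM alive for the checker
        return t;
      });

  /**
   * @param healthProbe returns true while the metadata store looks healthy
   * @param intervalMillis delay between checks; must be greater than zero
   */
  public MetadataHealthMonitor(BooleanSupplier healthProbe, long intervalMillis) {
    scheduler.scheduleWithFixedDelay(() -> {
      boolean healthy;
      try {
        healthy = healthProbe.getAsBoolean();
      } catch (RuntimeException e) {
        healthy = false; // a failing probe is treated as unhealthy
      }
      if (!healthy) {
        degraded.set(true); // stays degraded until an operator intervenes
      }
    }, 0, intervalMillis, TimeUnit.MILLISECONDS);
  }

  public boolean isDegraded() {
    return degraded.get();
  }

  /** Write paths would call this first and fail fast once degraded. */
  public void checkWritable() throws IOException {
    if (degraded.get()) {
      throw new IOException("Metadata store is unhealthy; writes are rejected");
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

With something like this in place, the corruption test could corrupt the 
metadata files, wait for isDegraded() to become true, and then assert that an 
upload fails with an IOException.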
{quote}ITDiskCorruption#testDownload:161 - should we just remove the assertTrue 
since it is no-op?
{quote}
The intent is to ensure an IOException is thrown for the test assertion to 
pass. It is better written, for clarity, as:
{code:java}
Assert.assertTrue("Download File test passed.", e instanceof IOException);
{code}

Patch 014 also includes the improved assertTrue statements.
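
For context, the assertion only has an effect if the test also fails when no 
exception is thrown at all. The dependency-free sketch below shows that 
pattern. It is an illustration, not the code in the patch, and the helper name 
expectIOException is hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical illustration of the expected-exception pattern. */
public class ExpectIOException {

  /** Runs the action and returns the IOException it throws; fails otherwise. */
  static IOException expectIOException(Callable<?> action) {
    try {
      action.call();
    } catch (IOException e) {
      return e; // the expected outcome
    } catch (Exception e) {
      throw new AssertionError(
          "Expected IOException but got " + e.getClass().getName(), e);
    }
    // Reaching this point means the call succeeded, which is a test failure.
    throw new AssertionError("Expected IOException but the call succeeded");
  }

  public static void main(String[] args) {
    // Stand-in for a download from a corrupted volume.
    IOException e = expectIOException(() -> {
      throw new IOException("checksum mismatch");
    });
    System.out.println("Caught as expected: " + e.getMessage());
  }
}
```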

> Create disk tests for fault injection test
> ------------------------------------------
>
>                 Key: HDDS-1554
>                 URL: https://issues.apache.org/jira/browse/HDDS-1554
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: build
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
