[
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904077#comment-16904077
]
Eric Yang commented on HDDS-1554:
---------------------------------
[~arp] Thank you for the review.
{quote}ITDiskReadOnly#testReadOnlyDiskStartup - The following block of code can
probably be removed, since it's really testing that the cluster is read-only in
safe mode. We have unit tests for that:
{quote}
Correct me if I am wrong, but the tests are not exactly the same. This test
triggers validation from the Ozone client's point of view. The unit test
TestVolumeSet#testFailedVolume is written for the server side. The smoke test
covers the positive case to ensure a volume can be created, but not the case
where the disk is in read-only mode. I think there is value in testing the
client-side response to ensure we have better coverage. Thoughts?
{quote}ITDiskReadOnly#testUpload - do we need to wait for safe mode exit after
restarting the cluster? Also I think this test is essentially the same as the
previous one.
{quote}
Safe mode validation is skipped here because Ozone exits on a read-only disk,
so waiting for safe mode exit would only be a formality. In reality, it would
be better to keep the Ozone daemons running, but keep the file system in safe
mode or a degraded mode that prevents write operations. This would be useful
for disaster recovery, where a system admin may want to prevent further damage
to the disk but still intends to recover data from Ozone buckets. This test is
designed to pass both when Ozone runs in read-only mode and when it follows the
exit strategy. Both designs are valid. The test is more useful if Ozone daemons
don't exit on a read-only disk. I intend to add a download test to
ITDiskReadOnly as well, if read-only mode can be implemented.
{quote}ITDiskCorruption#addCorruption:72 - looks like we have a hard-coded
path. Should we get from configuration instead?
{quote}
Thank you for the suggestion. In patch 014, I made an adjustment so that the
Maven project build directory can be customized. The test uses
${buildDirectory}/data/meta to store metadata, which defaults to Maven's
${project.build.directory}. The test corrupts the data file there. Placing the
data file in the Maven build directory is a good way to ensure that mvn clean
resets the state of the data file cleanly. If the location is configured
externally, an external mechanism must be developed to reset the data file
state.
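As a rough illustration of the path resolution described above, here is a minimal sketch. The class and method names are hypothetical and not taken from the patch; how the build directory value reaches the test (for example, through a system property) is also an assumption. Only the data/meta layout and the fallback to Maven's default build directory come from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the patch.
final class MetaDirSketch {
  private MetaDirSketch() {
  }

  // Resolves <buildDirectory>/data/meta. When no build directory is
  // supplied, falls back to "target", Maven's default value for
  // ${project.build.directory}, so that mvn clean removes the data.
  static Path resolveMetaDir(String buildDirectory) {
    String base = (buildDirectory == null || buildDirectory.isEmpty())
        ? "target"
        : buildDirectory;
    return Paths.get(base, "data", "meta");
  }
}
```

A test could call MetaDirSketch.resolveMetaDir(System.getProperty("buildDirectory")), assuming the value is passed to the test JVM under that property name.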
{quote}ITDiskCorruption#testUpload - The corruption implementation is bit of a
heavy hammer, it is replacing the content of all meta files. Is it possible to
make it reflect real-world corruption where a part of the file may be
corrupted. Also we should probably restart the cluster after corrupting RocksDB
meta files.
{quote}
If Ozone is restarted after metadata corruption, it will fall into the same
code path: it is unable to open RocksDB and fails to start. This would make the
corruption upload test execute the same code path as
ITDiskReadOnly#testReadOnlyDiskStartup, and the test would have no purpose. The
test purposefully corrupts metadata files without a restart. This is to ensure
that a safety mechanism gets built to protect metadata integrity. One possible
design is to have a background thread that checks RocksDB health. In the test,
we can shorten the check interval to almost immediate, to verify that an upload
does not succeed when metadata corruption happens, and that Ozone prevents
further corruption by entering safe mode or degraded mode.
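To make the background health check idea concrete, here is a minimal sketch using only the JDK. Nothing in it is existing Ozone code: the class name, the BooleanSupplier probe, and the checkWritable() gate are all hypothetical, and the actual RocksDB probe is deliberately left to the caller.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a periodic metadata health check.
final class MetadataHealthMonitorSketch implements AutoCloseable {
  private final AtomicBoolean degraded = new AtomicBoolean(false);
  private final BooleanSupplier healthCheck;
  private final ScheduledExecutorService scheduler;

  // healthCheck returns true while the metadata store is readable.
  // intervalMillis must be positive; a test can pass a very small value
  // to make the check almost immediate.
  MetadataHealthMonitorSketch(BooleanSupplier healthCheck, long intervalMillis) {
    this.healthCheck = healthCheck;
    this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "metadata-health-check");
      t.setDaemon(true);
      return t;
    });
    scheduler.scheduleWithFixedDelay(
        this::runCheckOnce, 0L, intervalMillis, TimeUnit.MILLISECONDS);
  }

  // Runs one probe. A failed or throwing probe latches degraded mode;
  // this sketch never leaves degraded mode on its own.
  void runCheckOnce() {
    boolean healthy;
    try {
      healthy = healthCheck.getAsBoolean();
    } catch (RuntimeException e) {
      healthy = false;
    }
    if (!healthy) {
      degraded.set(true);
    }
  }

  boolean isDegraded() {
    return degraded.get();
  }

  // Write paths would call this first so that uploads are rejected
  // instead of causing further corruption.
  void checkWritable() throws IOException {
    if (degraded.get()) {
      throw new IOException("Metadata store is unhealthy; writes are rejected.");
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

With something like this in place, the corruption test could corrupt the metadata files, wait for one check interval, and then assert that an upload fails with an IOException while the daemon keeps running.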
{quote}ITDiskCorruption#testDownload:161 - should we just remove the assertTrue
since it is no-op?
{quote}
The intent is to ensure an IOException is thrown for the test assertion to
pass. It is better written as follows for clarity:
{code:java}
Assert.assertTrue("Download File test passed.", e instanceof IOException);
{code}
Patch 014 also includes the improved assertTrue statements.
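For reference, the check can be made stricter so that it also fails when no exception is thrown at all. The following is a plain-Java sketch, not code from the patch; the IOAction interface and expectIOException helper are hypothetical names, and it avoids JUnit only to stay self-contained.

```java
import java.io.IOException;

// Hypothetical helper, not part of the patch.
final class ExpectIOExceptionSketch {
  private ExpectIOExceptionSketch() {
  }

  // An action that may fail with IOException, such as a download call.
  interface IOAction {
    void run() throws IOException;
  }

  // Returns the IOException thrown by the action. Throws AssertionError
  // if the action completes normally, so a silently successful download
  // from corrupted data fails the test.
  static IOException expectIOException(IOAction action) {
    try {
      action.run();
    } catch (IOException e) {
      return e;
    }
    throw new AssertionError(
        "Expected IOException, but the action completed normally.");
  }
}
```

The returned exception can then be inspected further, for example to validate the error message.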
> Create disk tests for fault injection test
> ------------------------------------------
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch,
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch,
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch,
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch,
> HDDS-1554.012.patch, HDDS-1554.013.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The current plan for fault injection disk tests is:
> # Scenario 1 - Read/Write test
> ## Run docker-compose to bring up a cluster
> ## Initialize scm and om
> ## Upload data to Ozone cluster
> ## Verify data is correct
> ## Shutdown cluster
> # Scenario 2 - Read-Only test
> ## Repeat Scenario 1
> ## Mount data disk as read only
> ## Try to write data to Ozone cluster
> ## Validate error message is correct
> ## Shutdown cluster
> # Scenario 3 - Corruption test
> ## Repeat Scenario 2
> ## Shutdown cluster
> ## Modify data disk data
> ## Restart cluster
> ## Validate error message for read from corrupted data
> ## Validate error message for write to corrupted volume