[
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867452#comment-16867452
]
Elek, Marton commented on HDDS-1554:
------------------------------------
Thanks, Eric, for the answer. Can you please describe what the partial
implementation is? I can see a docker-compose file
(hadoop-ozone/fault-injection-test/disk-tests/read-only-test/) but can't see
any volume settings.
hadoop-ozone/fault-injection-test/disk-tests/read-only-test/src/test/resources/compose/docker-compose.yaml
{code:java}
+ scm:
+ image: ${user.name}/ozone:${project.version}
+ ports:
+ - 9860:9860
+ - 9861:9861
+ - 9863:9863
+ - 9876:9876
+ env_file:
+ - ./docker-config
+ environment:
+ ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
+ command: ["/opt/hadoop/bin/ozone","scm"]{code}
I think you used the basic /data directory which is writable:
hadoop-ozone/fault-injection-test/disk-tests/read-only-test/src/test/resources/compose/docker-config
{code:java}
+OZONE-SITE.XML_ozone.metadata.dirs=/data/metadata{code}
But you try to check the safe mode:
hadoop-ozone/fault-injection-test/disk-tests/read-only-test/src/test/java/org/apache/hadoop/ozone/ITDiskReadOnly.java
{code:java}
+ @Test
+ public void testWaitForSafeMode() throws InterruptedException {
+ LOG.info("Wait for cluster to exit safe mode...");
+ int retries = 1;
+ boolean safeMode = true;
+ ScmClient scmClient;
+ while (safeMode && retries <= MAX_RETRIES) {
+ try {
+ LOG.info("Connection attempt {} of 30.", retries);
+ scmClient = new SCMCLI().createScmClient();
+ safeMode = scmClient.inSafeMode();
+ } catch (Exception e) {
+ safeMode = true;
+ }
+ retries++;
+ Thread.sleep(2000);
+ }
+ Assert.assertFalse("Ozone cluster should not exit safe mode.", safeMode);
+ }{code}
This code tries to check the safe mode. Actually, we are not interested in
the safe mode here, as SCM can't be started (or shouldn't be started) with a
read-only directory.
The other problem with this code fragment is that you assume the safe mode is
true in case of any exception. In case of any exception you wait 60 seconds in
the tests without checking what exactly the problem is.
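To illustrate the second point, here is a minimal sketch of a retry loop that keeps the last exception instead of silently mapping it to "still in safe mode". It is not the code from the patch or from the PR. The class SafeModeWait, its Result type and the waitForExit method are hypothetical names, and the probe is a plain Callable so that the sketch runs without an Ozone cluster.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of the patch or of Ozone. */
public class SafeModeWait {

  /** Final safe-mode state, attempts used, and the last failure (null if the last probe succeeded). */
  public static final class Result {
    public final boolean inSafeMode;
    public final int attempts;
    public final Exception lastError;

    Result(boolean inSafeMode, int attempts, Exception lastError) {
      this.inSafeMode = inSafeMode;
      this.attempts = attempts;
      this.lastError = lastError;
    }
  }

  /** Polls the probe until it reports "not in safe mode" or maxRetries is reached. */
  public static Result waitForExit(Callable<Boolean> probe, int maxRetries, long sleepMillis) {
    boolean safeMode = true;
    Exception lastError = null;
    int attempts = 0;
    while (safeMode && attempts < maxRetries) {
      attempts++;
      try {
        safeMode = probe.call();
        lastError = null; // the probe answered, so forget earlier failures
      } catch (Exception e) {
        lastError = e; // keep the cause so the test can report it
        if (e instanceof InterruptedException) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      if (safeMode && attempts < maxRetries) {
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          lastError = e;
          break;
        }
      }
    }
    return new Result(safeMode, attempts, lastError);
  }

  public static void main(String[] args) {
    // Stand-in probe that always fails, e.g. because SCM never started.
    Result r = waitForExit(() -> {
      throw new IllegalStateException("connection refused");
    }, 3, 10L);
    if (r.inSafeMode && r.lastError != null) {
      System.out.println("Gave up after " + r.attempts + " attempts: " + r.lastError);
    }
  }
}
```

In the test quoted above, the probe would presumably be something like () -> new SCMCLI().createScmClient().inSafeMode(), and the assertion message could include lastError, so a failure to connect is distinguishable from a cluster that is really in safe mode.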
{quote}Some test case can be refined when error detection is better implemented
later. Does this work for you?
{quote}
I think it's better to commit working tests one by one. Let's focus on the
corruption-test for now. As you requested, I created a PR to show how it is
possible to test it with the existing tools (in a simpler way).
Feel free to check it:
https://github.com/apache/hadoop/pull/990
> Create disk tests for fault injection test
> ------------------------------------------
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: build
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch,
> HDDS-1554.003.patch, HDDS-1554.004.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The current plan for fault injection disk tests is:
> # Scenario 1 - Read/Write test
> ## Run docker-compose to bring up a cluster
> ## Initialize scm and om
> ## Upload data to Ozone cluster
> ## Verify data is correct
> ## Shutdown cluster
> # Scenario 2 - Read/Only test
> ## Repeat Scenario 1
> ## Mount data disk as read only
> ## Try to write data to Ozone cluster
> ## Validate error message is correct
> ## Shutdown cluster
> # Scenario 3 - Corruption test
> ## Repeat Scenario 2
> ## Shutdown cluster
> ## Modify data disk data
> ## Restart cluster
> ## Validate error message for read from corrupted data
> ## Validate error message for write to corrupted volume
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)