[ 
https://issues.apache.org/jira/browse/KUDU-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Greber updated KUDU-3464:
--------------------------------
    Description: 
*Description:*
While working on adding extra startup flag support in the Python test infra I 
was tinkering with adding negative tests. One example, when a wrong flag name 
is specified in a test class.
{code:python}
@master_extra_startup_flags(['--non_existent_master_flag=1'])
@tserver_extra_startup_flags(extra_tserver_flags)
class TestKuduTestBaseStartupNonExistentMasterFlag(KuduTestBase,
                                                   CompatUnitTest):
    @classmethod
    def setUpClass(self):
        error_msg = 'RUNTIME_ERROR'
        with self.assertRaisesRegex(self, Exception, error_msg):
            super(TestKuduTestBaseStartupNonExistentMasterFlag, self)\
                .setUpClass()

    def test_startup_non_existent_master_flag(self):
        pass

@master_extra_startup_flags(extra_master_flags)
@tserver_extra_startup_flags(['--non_existent_tserver_flag=1'])
class TestKuduTestBaseStartupNonExistentTserverFlag(KuduTestBase,
                                                    CompatUnitTest):
    @classmethod
    def setUpClass(self):
        error_msg = 'RUNTIME_ERROR'
        with self.assertRaisesRegex(self, Exception, error_msg):
            super(TestKuduTestBaseStartupNonExistentTserverFlag, self)\
                .setUpClass()

    def test_startup_non_existent_tserver_flag(self):
        pass
{code}
By themselves, these tests run fine. Running them one after the other results 
in the following error:
{code:bash}
2023-04-03T06:25:58Z Fatal error : Another chronyd may already be running (pid=225263), check /tmp/kudutest-0/minicluster-data/chrony.0/chronyd.pid
Could not open connection to daemon
Could not open connection to daemon
{code}
The run times out with the above error when control flow reaches the second 
test.

I suspect that when the first cluster creation fails, chronyd is not properly 
disposed of.

(After the test run exits, the /tmp/kudutest-0 directory referenced in the 
error is properly cleaned up.)
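
One way to check this suspicion between the two tests is to read the PID file named in the error and probe whether that process is still alive. The helper below is only a sketch for manual debugging: the function name is hypothetical and not part of the Kudu Python test infra, it assumes a POSIX system, and the path in the usage comment is simply the one printed in the log above.

```python
import os


def pid_file_points_to_live_process(pid_file):
    """Return True if the PID stored in pid_file belongs to a running process.

    Hypothetical debugging helper, not part of Kudu. POSIX only.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # No PID file, or its content is not a number.
        return False
    try:
        # Signal 0 performs the existence/permission check without
        # actually delivering a signal to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# Example (path taken from the error message above):
# pid_file_points_to_live_process(
#     '/tmp/kudutest-0/minicluster-data/chrony.0/chronyd.pid')
```

If this returns True after the first test class has finished its failed setup, that would support the idea that chronyd outlives the failed cluster creation.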

*Consequences:*

If developers writing Python tests get a flag name or value wrong more than 
once in the code, they get "Could not open connection to daemon" errors, which 
are not very helpful at first.

However, for properly written test code this bug has no negative effect.

*Todo:*

Investigate what happens inside the external mini-cluster when cluster 
creation fails, for example because of a wrong flag.
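
For reference while investigating, the behaviour one would hope for is that components started before the failure are stopped again. The following is a generic sketch of that pattern in plain Python, not a description of how the external mini-cluster is implemented; `start_all` and the starter callables are hypothetical names used only for illustration.

```python
import contextlib


def start_all(starters):
    """Run each starter in order; each must return a zero-argument stop callable.

    If any starter raises, the components started so far are stopped in
    reverse order before the exception propagates. On success, the returned
    ExitStack owns the stop callables; call .close() on it to stop everything.
    """
    with contextlib.ExitStack() as stack:
        for start in starters:
            stop = start()
            stack.callback(stop)
        # Everything started: transfer the cleanup callbacks to the caller
        # so that leaving this 'with' block does not stop anything.
        return stack.pop_all()
```

With this shape, a failing master startup would still trigger the stop callable of an NTP helper that was started earlier, so a later cluster creation would not find a stale daemon.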

  was:
*Description:*
While working on adding extra startup flag support in the Python test infra I 
was tinkering with adding negative tests. One example, when a wrong flag name 
is specified in a test class.
{code:python}
class TestKuduTestStartupFlagsMasterWrongFlagName(KuduTestBase, CompatUnitTest):
    @classmethod
    def setUpClass(self):
        extra_master_flags=[("non_existent_flag","1")]
        extra_tserver_flags=[("tablet_apply_pool_overload_threshold_ms", "1")]
        error_msg = 'RUNTIME_ERROR'
        with self.assertRaisesRegex(self, Exception, error_msg):
            super(TestKuduTestStartupFlagsMasterWrongFlagName, self)\
                .setUpClass(extra_master_flags, extra_tserver_flags)

    def test_startup_flags_master_wrong_flag_name(self):
        pass

class TestKuduTestStartupFlagsTserverWrongFlagName(KuduTestBase,
                                                   CompatUnitTest):
    @classmethod
    def setUpClass(self):
        extra_master_flags=[("check_expired_table_interval_seconds","1")]
        extra_tserver_flags=[("non_existent_flag","1")]
        error_msg = 'RUNTIME_ERROR'
        with self.assertRaisesRegex(self, Exception, error_msg):
            super(TestKuduTestStartupFlagsTserverWrongFlagName, self)\
                .setUpClass(extra_master_flags, extra_tserver_flags)

    def test_startup_flags_tserver_wrong_flag_name(self):
        pass
{code}
By themselves, these tests run fine. Running them one after the other results 
in the following error:
{code:bash}
2023-03-22T14:48:01Z Fatal error : Another chronyd may already be running (pid=377244), check /tmp/kudutest-0/minicluster-data/chrony.0/chronyd.pid
Could not open connection to daemon
{code}
It times out with the above error, when the control flow reaches the second 
test:
{code:python}
_ ERROR at setup of TestKuduTestStartupFlagsTserverWrongFlagName.test_startup_flags_tserver_wrong_flag_name _
Exception: Error in response: {'code': 'TIMED_OUT', 'message': 'failed to start NTP server 0: failed to contact chronyd in 1.000s'}

During handling of the above exception, another exception occurred:

self = <class 'kudu.tests.test_common.TestKuduTestStartupFlagsTserverWrongFlagName'>

    @classmethod
    def setUpClass(self):
        extra_master_flags=[("check_expired_table_interval_seconds","1")]
        extra_tserver_flags=[("non_existent_flag","1")]
        error_msg = 'RUNTIME_ERROR'
        with self.assertRaisesRegex(self, Exception, error_msg):
            super(TestKuduTestStartupFlagsTserverWrongFlagName, self)\
>               .setUpClass(extra_master_flags, extra_tserver_flags)
E           TypeError: _formatMessage() missing 1 required positional argument: 'standardMsg'

kudu/tests/test_common.py:82: TypeError
{code}
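
A note on the secondary TypeError in this trace: it appears to be a side effect of calling `assertRaisesRegex` on the class from inside a classmethod. When the raised message (here TIMED_OUT) does not match the expected pattern, unittest tries to build its failure message through `_formatMessage` on what it believes is a test instance, but it was handed the class, so the call is missing an argument. A small stand-alone context manager avoids depending on a TestCase instance. This is only a sketch; `expect_error` is a hypothetical helper, not something provided by Kudu or unittest.

```python
import re
from contextlib import contextmanager


@contextmanager
def expect_error(pattern, exc_type=Exception):
    """Assert that the wrapped block raises exc_type with a message matching pattern.

    Hypothetical helper; usable from classmethods such as setUpClass because
    it does not rely on any unittest.TestCase instance.
    """
    try:
        yield
    except exc_type as e:
        if not re.search(pattern, str(e)):
            raise AssertionError(
                'expected error matching %r, got: %s' % (pattern, e)) from e
    else:
        raise AssertionError(
            'expected error matching %r, but nothing was raised' % pattern)
```

Inside `setUpClass` this would be used as `with expect_error('RUNTIME_ERROR'):` around the `super(...).setUpClass(...)` call, so that a mismatch reports the actual error text instead of the TypeError.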
I suspect that when the first cluster creation fails, chronyd is not properly 
disposed of.

(After quitting the test execution, the referred /tmp/kudutest-0 location is 
properly cleaned up.)

*Consequences:*

If developers writing Python tests get a flag name or value wrong more than 
once in the code, they get "Could not open connection to daemon" errors, which 
are not very helpful at first.

However, for properly written test code this bug has no negative effect.


> Failed mini-cluster creation leaves chronyd open in Python test infra
> ---------------------------------------------------------------------
>
>                 Key: KUDU-3464
>                 URL: https://issues.apache.org/jira/browse/KUDU-3464
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Marton Greber
>            Priority: Minor
>              Labels: client, python



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
