I have some questions inline.  Overall, this is really good.
Thank you, Mike.

The test is neat, and I'm sure it was useful for you in fixing
and reproducing this.  I'm a bit worried about it, though, because:
a.) it's not going to run for most people or in c-i, making it
prone to bit-rot.
b.) I think the test can fail in multiple ways:
  1. false pass, where the retry code is never hit.  Just lucky.
  2. false fail, where the retry code just didn't retry enough.

I'm grateful for the test; I'm just not sure that it fits as a unit
test in its current form.

I'll dig in on the pyserial question, and I'll have the c-i bot run on your branch.



Diff comments:

> diff --git a/cloudinit/sources/DataSourceSmartOS.py b/cloudinit/sources/DataSourceSmartOS.py
> index 86bfa5d..24187d6 100644
> --- a/cloudinit/sources/DataSourceSmartOS.py
> +++ b/cloudinit/sources/DataSourceSmartOS.py
> @@ -395,12 +409,21 @@ class JoyentMetadataClient(object):
>          return value
>  
>      def get(self, key, default=None, strip=False):
> -        result = self.request(rtype='GET', param=key)
> -        if result is None:
> -            return default
> -        if result and strip:
> -            result = result.strip()
> -        return result
> +        # Do a couple tries in case someone else has the serial port open

This is in the generic JoyentMetadataClient; could/should the retry bit be 
moved to the serial client?  It could just retry over super().get.

In your comment, please mention that this is only relevant for some process 
that was not participating in the cooperative locking.  As I was reading it, I 
thought "Don't we have a lock on this?"

> +        # before this process opened it.  This also helps in the event that
> +        # the metadata server goes away in middle of a conversation.
> +        for tries in [1, 2]:

Doesn't 2 just seem like a small number of retries?
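
If a retry count stays, maybe hoist it into a named constant so it is
easy to tune later (the value here is a placeholder, not a
recommendation):

    SERIAL_GET_ATTEMPTS = 5  # placeholder value

    for tries in range(SERIAL_GET_ATTEMPTS):
        ...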

> +            try:
> +                result = self.request(rtype='GET', param=key)
> +                if result is None:
> +                    return default
> +                if result and strip:
> +                    result = result.strip()
> +                return result
> +            except JoyentMetadataFetchException as exc:
> +                last_exc = exc
> +                pass

'pass' not necessary here.
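
i.e. just:

    except JoyentMetadataFetchException as exc:
        last_exc = exc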

> +        raise(last_exc)
>  
>      def get_json(self, key, default=None):
>          result = self.get(key, default=default)
> @@ -471,6 +494,7 @@ class JoyentMetadataSerialClient(JoyentMetadataClient):
>          ser = serial.Serial(self.device, timeout=self.timeout)
>          if not ser.isOpen():
>              raise SystemError("Unable to open %s" % self.device)
> +        fcntl.lockf(ser, fcntl.LOCK_EX)

Shouldn't we fcntl.lockf(ser, fcntl.LOCK_UN) somewhere?  Or does that magically 
get done on close?
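
My understanding is that a POSIX lock is dropped when the descriptor is
closed, but an explicit unlock would document the intent.  Something
like this (sketch; I have not checked what the client's close path
actually looks like):

    def close(self):
        if self.fp:
            fcntl.lockf(self.fp, fcntl.LOCK_UN)
            self.fp.close()
            self.fp = None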

>          self.fp = ser
>  
>      def __repr__(self):
> diff --git a/packages/bddeb b/packages/bddeb
> index 4f2e2dd..7e96130 100755
> --- a/packages/bddeb
> +++ b/packages/bddeb
> @@ -68,7 +68,8 @@ def write_debian_folder(root, templ_data, is_python2, cloud_util_deps):
>      test_reqs = run_helper(
>          'read-dependencies',
>          ['--requirements-file', 'test-requirements.txt',
> -         '--system-pkg-names', '--python-version', pyver]).splitlines()
> +         '--system-pkg-names', '--python-version', pyver,
> +         '-d', 'debian']).splitlines()

Use '--distro=debian'; I just find that more self-documenting.
Good catch on this.
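
i.e.:

    ['--requirements-file', 'test-requirements.txt',
     '--system-pkg-names', '--python-version', pyver,
     '--distro=debian']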

>  
>      requires = ['cloud-utils | cloud-guest-utils'] if cloud_util_deps else []
>      # We consolidate all deps as Build-Depends as our package build runs all
> diff --git a/test-requirements.txt b/test-requirements.txt
> index d9d41b5..548132c 100644
> --- a/test-requirements.txt
> +++ b/test-requirements.txt
> @@ -11,3 +11,5 @@ coverage
>  # Only really needed on older versions of python
>  contextlib2
>  setuptools
> +
> +pyserial

There is some reason pyserial is not in this list.
I know it's weird on Red Hat, where I think there is not even a python3
version available.
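
If it turns out pyserial can't live in test-requirements.txt, one
option is to guard the new test on the import instead (sketch):

    try:
        import serial
        HAS_SERIAL = True
    except ImportError:
        HAS_SERIAL = False

    @unittest.skipUnless(HAS_SERIAL, "pyserial not available")
    class TestSerialConcurrency(TestCase):
        ...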

> diff --git a/tests/unittests/test_datasource/test_smartos.py b/tests/unittests/test_datasource/test_smartos.py
> index 88bae5f..4d9f9ab 100644
> --- a/tests/unittests/test_datasource/test_smartos.py
> +++ b/tests/unittests/test_datasource/test_smartos.py
> @@ -872,4 +877,67 @@ class TestNetworkConversion(TestCase):
>          found = convert_net(SDC_NICS_SINGLE_GATEWAY)
>          self.assertEqual(expected, found)
>  
> +
> +@unittest.skipUnless(DataSourceSmartOS.get_smartos_environ() ==
> +                     DataSourceSmartOS.SMARTOS_ENV_KVM,
> +                     "Only supported on KVM and bhyve guests under SmartOS")
> +@unittest.skipUnless(os.access(DataSourceSmartOS.SERIAL_DEVICE, os.W_OK),
> +                     "Requires write access to " +
> +                     DataSourceSmartOS.SERIAL_DEVICE)
> +class TestSerialConcurrency(TestCase):
> +    """
> +       This class tests locking on an actual serial port, and as such can only
> +       be run in a kvm or bhyve guest running on a SmartOS host.  A test run on
> +       a metadata socket will not be valid because a metadata socket ensures
> +       there is only one session over a connection.  In contrast, in the
> +       absence of proper locking multiple processes opening the same serial
> +       port can corrupt each others' exchanges with the metadata server.
> +    """

You can out-dent this stuff:
    """
    This class tests locking ...
    """

> +    def setUp(self):
> +        self.mdata_proc = multiprocessing.Process(target=self.start_mdata_loop)
> +        self.mdata_proc.start()
> +        super(TestSerialConcurrency, self).setUp()
> +
> +    def tearDown(self):
> +        # os.kill() rather than mdata_proc.terminate() to avoid console spam.
> +        os.kill(self.mdata_proc.pid, signal.SIGKILL)
> +        self.mdata_proc.join()
> +        super(TestSerialConcurrency, self).tearDown()
> +
> +    def start_mdata_loop(self):
> +        """
> +           The mdata-get command is repeatedly run in a separate process so
> +           that it may try to race with metadata operations performed in the
> +           main test process.  Use of mdata-get is better than two processes
> +           using the protocol implementation in DataSourceSmartOS because we
> +           are testing to be sure that cloud-init and mdata-get respect each
> +           others locks.

This can be out-dented too.

> +        """
> +        while True:
> +            try:
> +                subprocess.check_output(['/usr/sbin/mdata-get', 'sdc:routes'])
> +            except subprocess.CalledProcessError:
> +                pass
> +
> +    def test_all_keys(self):
> +        self.assertIsNotNone(self.mdata_proc.pid)
> +        ds = DataSourceSmartOS
> +        keys = [tup[0] for tup in ds.SMARTOS_ATTRIB_MAP.values()]
> +        keys.extend(ds.SMARTOS_ATTRIB_JSON.values())
> +
> +        client = ds.jmc_client_factory()
> +        self.assertIsNotNone(client)
> +
> +        # The behavior that we are testing for was observed mdata-get running
> +        # 10 times at roughly the same time as cloud-init fetched each key
> +        # once.  cloud-init would regularly see failures before making it
> +        # through all keys once.
> +        for it in range(0, 3):
> +            for key in keys:
> +                # We don't care about the return value, just that it doesn't
> +                # thrown any exceptions.
> +                client.get(key)
> +
> +        self.assertIsNone(self.mdata_proc.exitcode)
> +
>  # vi: ts=4 expandtab


-- 
https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/337271
Your team cloud-init commiters is requested to review the proposed merge of 
~mgerdts/cloud-init:bug-1746605 into cloud-init:master.
