-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40600/#review107598
-----------------------------------------------------------

Ship it!


Ship It!

- Dmitro Lisnichenko


On Nov. 23, 2015, 5:23 p.m., Andrew Onischuk wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40600/
> -----------------------------------------------------------
>
> (Updated Nov. 23, 2015, 5:23 p.m.)
>
>
> Review request for Ambari and Dmitro Lisnichenko.
>
>
> Bugs: AMBARI-14017
>     https://issues.apache.org/jira/browse/AMBARI-14017
>
>
> Repository: ambari
>
>
> Description
> -------
>
> PROBLEM
> A user runs "apt-get check" via a cron job on their servers to check for
> broken dependencies. They report that this command may take up to two
> minutes to complete on various nodes in their cluster. The command locks
> the package database via a write lock on /var/lib/dpkg/lock. During that
> interval, if Ambari is commanded to install a new component or perform
> other maintenance tasks on a cluster node that require access to the
> package database, the command will fail. Since "apt-get check" runs from
> cron, apparently with some frequency, this represents a problem for
> ongoing maintenance, especially in large clusters.
>
> It would be desirable if Ambari and/or the agent were more tolerant of
> locks on the package database.
>
> The stack trace at failure follows
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 37, in <module>
>     BeforeInstallHook().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 33, in hook
>     install_repos()
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/repo_initialization.py", line 59, in install_repos
>     _alter_repo("create", params.repo_info, template)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/repo_initialization.py", line 50, in _alter_repo
>     components = ubuntu_components, # ubuntu specific
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/repository.py", line 110, in action_create
>     retcode, out = checked_call(update_cmd_formatted, sudo=True, quiet=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'apt-get update -qq
> -o Dir::Etc::sourcelist=sources.list.d/HDP.list -o
> Dir::Etc::sourceparts=- -o APT::Get::List-Cleanup=0' returned 100. W: GPG
> error: <http://public-repo-1.hortonworks.com> HDP InRelease: The following
> signatures couldn't be verified because the public key is not available:
> NO_PUBKEY B9733A7A07513CAD
> E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily
> unavailable)
> E: Unable to lock the administration directory (/var/lib/dpkg/), is another
> process using it?
>
> IMPACT
> The user will not manage their cluster with Ambari if this cannot be
> fixed by the end of November.
>
> EXPECTED
> Ambari retries installations for some period of time
>
> ACTUAL
> Ambari fails
>
> ANALYSIS
> I created a simple program based on the code at
> <http://beej.us/guide/bgipc/output/html/multipage/flocking.html> to take a
> write lock on /var/lib/dpkg/lock on command, and then attempted a component
> install on a new node in a cluster. The install failed. After removing the
> lock, the installation succeeded. This is easily reproduced using a simple
> C program on a target node.
>
>
> Diffs
> -----
>
>   ambari-agent/src/test/python/resource_management/TestPackageResource.py 18b2d00
>   ambari-common/src/main/python/resource_management/core/providers/package/__init__.py 7e532bc
>   ambari-common/src/main/python/resource_management/core/providers/package/apt.py ddd6952
>   ambari-common/src/main/python/resource_management/core/providers/package/zypper.py 3ff3dfd
>   ambari-common/src/main/python/resource_management/core/resources/packaging.py 1ca88af
>
> Diff: https://reviews.apache.org/r/40600/diff/
>
>
> Testing
> -------
>
> mvn clean test
>
>
> Thanks,
>
> Andrew Onischuk
>
>
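
For readers of the EXPECTED section quoted above, here is a minimal Python sketch of what "retries installations for some period of time" could look like. It is not the code in this patch. The helper names run_command and call_with_lock_retry, the injectable runner and sleep parameters, and the default values are assumptions made for illustration; the tries and try_sleep names merely echo the keyword arguments visible in the traceback, and the two lock messages are copied from the quoted apt output.

```python
import subprocess
import time

# Messages apt prints when another process holds the package database lock.
# Copied from the error output quoted in the review request.
LOCK_MESSAGES = (
    "Could not get lock",
    "Unable to lock the administration directory",
)


def run_command(command):
    """Run a command (argument list) and return (return code, output text)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    out, _ = proc.communicate()
    return proc.returncode, out


def call_with_lock_retry(command, tries=5, try_sleep=30,
                         runner=run_command, sleep=time.sleep):
    """Re-run `command` while it fails only because the package lock is held.

    Hypothetical helper for illustration. Returns the (return code, output)
    pair of the last attempt, whether or not that attempt succeeded.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        code, out = runner(command)
        lock_held = code != 0 and any(msg in out for msg in LOCK_MESSAGES)
        # Stop on success, on an unrelated failure, or when out of attempts.
        if not lock_held or attempt == tries:
            return code, out
        sleep(try_sleep)
```

With these assumed defaults a caller would wait up to four sleeps of 30 seconds, which covers the two-minute "apt-get check" runs described in the report. Failures that do not mention the lock, such as a plain nonzero exit with other output, are returned immediately rather than retried.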

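
The reproduction described under ANALYSIS used a small C program; a rough Python equivalent is sketched below for anyone who wants to repeat it. The function name hold_write_lock and its sleep parameter are hypothetical. fcntl.lockf is used on the assumption that an fcntl-style write lock, as in the linked guide, is what blocks apt on /var/lib/dpkg/lock.

```python
import fcntl
import os
import time


def hold_write_lock(path, seconds, sleep=time.sleep):
    """Hold an exclusive fcntl-style lock on `path` for `seconds` seconds.

    Hypothetical helper for reproducing the failure; raises IOError/OSError
    if another process already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        # LOCK_NB: fail immediately instead of blocking if the lock is taken.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        sleep(seconds)
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

Calling hold_write_lock("/var/lib/dpkg/lock", 120) as root on a target node, and then starting a component install from Ambari within that window, should mirror the scenario in the report.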