[
https://issues.apache.org/jira/browse/AMBARI-23594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440792#comment-16440792
]
Jonathan Hurley commented on AMBARI-23594:
------------------------------------------
The RCA for this one is that we are trying to install the wrong LZO packages
during NameNode 3.0 startup:
{code}
2018-04-13 18:21:47,999 - Package['lzo'] {'retry_on_repo_unavailability': True, 'retry_count': 5}
2018-04-13 18:21:48,121 - Skipping installation of existing package lzo
2018-04-13 18:21:48,121 - Package['hadooplzo_2_6_1_0_129'] {'retry_on_repo_unavailability': True, 'retry_count': 5}
2018-04-13 18:21:48,136 - Skipping installation of existing package hadooplzo_2_6_1_0_129
2018-04-13 18:21:48,137 - Package['hadooplzo_2_6_1_0_129-native'] {'retry_on_repo_unavailability': True, 'retry_count': 5}
2018-04-13 18:21:48,151 - Skipping installation of existing package hadooplzo_2_6_1_0_129-native
{code}
Because we're sending down the wrong version here, we never install
{{hadooplzo_3_0_0_0_1192.x86_64}}, even though the 3.0 packages appear in the
package listing:
{code}
hadooplzo.noarch : hadooplzo Distro virtual package
hadooplzo-native.noarch : hadooplzo-native Distro virtual package
hadooplzo_2_6_1_0_129.x86_64 : Hadoop-LZO is a project to bring splittable LZO compression to Hadoop
hadooplzo_2_6_1_0_129-native.x86_64 : GPL Compression Libraries for Hadoop (native)
hadooplzo_3_0_0_0_1192.x86_64 : Hadoop-LZO is a project to bring splittable LZO compression to Hadoop
hadooplzo_3_0_0_0_1192-native.x86_64 : GPL Compression Libraries for Hadoop (native)
{code}
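To make the version mismatch concrete, here is a rough Python sketch of the
naming pattern visible in the output above, where the stack version is
embedded in the package name. The function {{versioned_package}} is
hypothetical and only illustrates the pattern; it is not the code Ambari
actually uses to resolve package names.

```python
def versioned_package(base, stack_version, suffix=""):
    """Build a versioned package name such as hadooplzo_3_0_0_0_1192-native.

    Hypothetical helper: mirrors the naming pattern seen in the logs only.
    """
    # Replace the '.' and '-' separators in the stack version with underscores.
    tag = stack_version.replace(".", "_").replace("-", "_")
    return f"{base}_{tag}{suffix}"


current = "2.6.1.0-129"   # version captured when the command was created
target = "3.0.0.0-1192"   # version the upgrade is moving to

print(versioned_package("hadooplzo", current))
print(versioned_package("hadooplzo", target))
print(versioned_package("hadooplzo", target, "-native"))
```

With the stale version the name resolves to a package that is already
installed, so the installation is skipped and the 3.0 package is never
requested.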
This is an Ambari issue with how we generate commands for the upgrade. When
creating commands, we populate a {{RepositoryFile}} that captures the current
state of the component's repository (2.6.1.0-129). By the time the command runs
during the upgrade, that {{RepositoryFile}} is stale, so the wrong one is sent
down. This is why the version used for installing LZO is incorrect.
In general, we shouldn't need to pre-populate {{ExecutionCommand}}s with a
{{RepositoryFile}}, since it can usually be determined when the command runs.
However, there is one case where this is not so:
- During a stack's distribution
The core logic in Ambari states that when a command is about to run, we should
populate the {{repositoryFile}} if and only if it's not already set on the
command. Most of the time this is benign, but in the case of upgrades, it's a
real problem.
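The set-only-if-missing rule can be sketched as follows. This is a simplified
Python illustration, not the actual Ambari server code (which is Java); the
dictionary layout, the {{repoVersion}} key, and the function names are
assumptions made for the example.

```python
def populate_repository_file(command, current_repo_lookup):
    """Set 'repositoryFile' only when the command does not already carry one."""
    if command.get("repositoryFile") is None:
        command["repositoryFile"] = current_repo_lookup()
    return command


def repo_at_runtime():
    # Hypothetical lookup reflecting the repository at the time the command runs.
    return {"repoVersion": "3.0.0.0-1192"}


# Command pre-populated at creation time, before the upgrade moved the component.
stale = {"role": "NAMENODE", "repositoryFile": {"repoVersion": "2.6.1.0-129"}}
# Command that left the field unset.
fresh = {"role": "NAMENODE"}

stale_result = populate_repository_file(stale, repo_at_runtime)
fresh_result = populate_repository_file(fresh, repo_at_runtime)

print(stale_result["repositoryFile"]["repoVersion"])
print(fresh_result["repositoryFile"]["repoVersion"])
```

The pre-populated command keeps its old value because the rule never
overwrites an existing one, while the command that left the field unset picks
up the target repository.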
As I see it, we have a couple of options here:
- Remove it from all execution/action command helpers
-- Rely on the {{ExecutionCommandWrapper}} to set it before the command is sent
-- Require the stack distribution code to place it on the command itself
instead of having the helpers do it
- Have the {{UpgradeResourceProvider}} strip it from all commands that it
generates
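As a rough illustration of the second option, the sketch below strips any
pre-populated value from upgrade commands so that the send-time step fills in
the repository instead. It is Python pseudologic under the same assumed
dictionary layout; {{strip_repository_file}} and {{set_if_missing}} are
hypothetical names, not methods of {{UpgradeResourceProvider}} or
{{ExecutionCommandWrapper}}.

```python
def strip_repository_file(commands):
    """Return copies of the commands with any pre-populated 'repositoryFile' removed."""
    return [{k: v for k, v in cmd.items() if k != "repositoryFile"} for cmd in commands]


def set_if_missing(command, runtime_repo):
    """Stand-in for the send-time step: fill the field only when it is absent."""
    command.setdefault("repositoryFile", runtime_repo)
    return command


upgrade_commands = [
    {"role": "NAMENODE", "repositoryFile": {"repoVersion": "2.6.1.0-129"}},
    {"role": "OOZIE_SERVER", "repositoryFile": {"repoVersion": "2.6.1.0-129"}},
]
runtime_repo = {"repoVersion": "3.0.0.0-1192"}

# Strip first, then let the send-time step populate the field.
sent = [set_if_missing(c, runtime_repo) for c in strip_repository_file(upgrade_commands)]
versions = [c["repositoryFile"]["repoVersion"] for c in sent]
print(versions)
```

The first option reaches the same end state by never setting the field in the
helpers at all, at the cost of touching every helper and the stack
distribution path.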
> LZO Libraries Are Not Installed Correctly During Upgrade
> --------------------------------------------------------
>
> Key: AMBARI-23594
> URL: https://issues.apache.org/jira/browse/AMBARI-23594
> Project: Ambari
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Vivek Sharma
> Assignee: Jonathan Hurley
> Priority: Blocker
> Fix For: 2.7.0
>
>
> *STR*
> # Deployed cluster with Ambari version: 2.6.1.5-3 and HDP version: 2.6.1.0-129
> # Upgrade Ambari to Target Version: 2.7.0.0-312, then upgrade AMS
> # Delete unsupported services like Flume, Falcon
> # Try Express Upgrade to HDP-3.0.0.0-1192
>
> *Result*
> Oozie server start failed
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/OOZIE/package/scripts/oozie_server.py", line 154, in <module>
>     OozieServer().execute()
>   File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/OOZIE/package/scripts/oozie_server.py", line 70, in configure
>     oozie(is_server=True)
>   File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/OOZIE/package/scripts/oozie.py", line 196, in oozie
>     Execute(format('{sudo} cp {hadoop_lib_home}/hadoop-lzo*.jar {oozie_lib_dir}'),
>   File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
>     self.env.run()
>   File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
>     returns=self.resource.returns)
>   File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
>   File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 308, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh cp /usr/hdp/3.0.0.0-1192/hadoop/lib/hadoop-lzo*.jar /usr/hdp/current/oozie-server' returned 1. cp: cannot stat '/usr/hdp/3.0.0.0-1192/hadoop/lib/hadoop-lzo*.jar': No such file or directory
> {code}