[
https://issues.apache.org/jira/browse/AMBARI-22721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jayush Luniya updated AMBARI-22721:
-----------------------------------
    Fix Version/s:     (was: 2.7.1)
                       3.0.0
> Centralize the Management of Tarball Uploading
> ----------------------------------------------
>
> Key: AMBARI-22721
> URL: https://issues.apache.org/jira/browse/AMBARI-22721
> Project: Ambari
> Issue Type: Task
> Affects Versions: 2.6.2
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Critical
> Fix For: 3.0.0
>
>
> Ambari is required to upload tarballs into HDFS for many services to
> function correctly after they are installed. This tarball management is not
> centralized in any way; instead, it is spread out across several different
> Python files for various services:
> - Hive uploads the Tez, MapReduce2, Sqoop, etc. tarballs
> - YARN uploads the Tez, Slider, and MapReduce2 tarballs
> This causes a problem when patching a specific service, such as Sqoop. Sqoop
> requires that sqoop.tar.gz and mapreduce.tar.gz are available in the same
> versioned folder in HDFS. However, no Sqoop component performs this upload -
> Hive does. So, if Hive is not being upgraded, these tarballs are never
> uploaded.
> The proposal here is to decouple tarball uploads from individual services
> and to manage these relationships in the stack definition:
> {code}
> {
>   "tarball": {
>     "MAPREDUCE2": {
>       "JOB_HISTORY_SERVER": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         }
>       ]
>     },
>     "HIVE": {
>       "HIVE_SERVER2": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         },
>         {
>           "tarball": "sqoop.tar.gz",
>           "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
>           "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
>         }
>       ]
>     },
>     "SQOOP": {
>       "SQOOP": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         },
>         {
>           "tarball": "sqoop.tar.gz",
>           "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
>           "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
>         }
>       ]
>     }
>   }
> }
> {code}
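> A minimal sketch of how the agent side might consume this metadata. All
> names here (upload_tarballs_for, copy_to_hdfs, the tarball_map.json file
> name, and the meaning of the {0}/{1} substitutions) are hypothetical, not
> existing Ambari APIs:
> {code:title=Hypothetical centralized uploader}
> import json
>
>
> def upload_tarballs_for(service, component, stack_name, stack_version,
>                         metadata_file="tarball_map.json"):
>     """Upload every tarball registered in the stack metadata for a component."""
>     with open(metadata_file) as f:
>         metadata = json.load(f)
>
>     # Walk SERVICE -> COMPONENT -> [tarball entries] exactly as laid out above.
>     entries = metadata.get("tarball", {}).get(service, {}).get(component, [])
>     for entry in entries:
>         # Assumed substitutions: {0} = stack root / stack name, {1} = version.
>         source = entry["source_dir"].format("/usr/hdp", stack_version)
>         target = entry["target_dir"].format(stack_name.lower(), stack_version)
>         copy_to_hdfs(source, target)
>
>
> def copy_to_hdfs(source, target):
>     # Placeholder; a real implementation would perform the HDFS copy
>     # through Ambari's resource_management library.
>     print("would upload %s -> %s" % (source, target))
> {code}
> With this in place, a Sqoop-only patch upgrade would upload both
> sqoop.tar.gz and mapreduce.tar.gz itself instead of depending on Hive.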
> - after-INSTALL hooks will check for {{CLIENT}} as the component category
> - after-START hooks will check for components that are NOT {{CLIENT}} (see
> the sketch after this list)
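> As a sketch, reusing the hypothetical upload_tarballs_for() routine above
> and Ambari's CLIENT/MASTER/SLAVE component categories:
> {code}
> def after_install_hook(service, component, category, stack_name, stack_version):
>     # Client components are never started, so they upload right after INSTALL.
>     if category == "CLIENT":
>         upload_tarballs_for(service, component, stack_name, stack_version)
>
>
> def after_start_hook(service, component, category, stack_name, stack_version):
>     # Masters and slaves wait for START, when HDFS is up and writable.
>     if category != "CLIENT":
>         upload_tarballs_for(service, component, stack_name, stack_version)
> {code}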
> Additionally, using the file length as a checksum may no longer be
> sufficient. We should also add a checksum file to HDFS for each tarball so
> we can easily tell whether work needs to be done (during an install,
> restart, upgrade, etc.) to upload a new tarball (one that may also have
> been modified with native libraries):
> {code:title=ambari-tarball-checksum.json (0644)}
> {
>   "mapreduce.tar.gz": {
>     "native_libraries": true,
>     "file_count": 509
>   },
>   "hadoop-streaming.tar.gz": {
>     "native_libraries": false,
>     "file_count": 10
>   }
> }
> {code}
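> A sketch of how the agent could build and compare these records. The .so
> suffix test for native_libraries and the helper names are assumptions for
> illustration:
> {code:title=Hypothetical checksum comparison}
> import json
> import tarfile
>
>
> def build_checksum_record(tarball_path):
>     """Count the members of a tarball and detect bundled native libraries."""
>     with tarfile.open(tarball_path) as tar:
>         names = tar.getnames()
>     return {
>         # Heuristic: treat any .so (or .so.N) member as a native library.
>         "native_libraries": any(n.endswith(".so") or ".so." in n for n in names),
>         "file_count": len(names),
>     }
>
>
> def needs_upload(tarball_name, tarball_path, checksum_json):
>     """True when the local tarball no longer matches the record in HDFS."""
>     recorded = json.loads(checksum_json).get(tarball_name)
>     return recorded != build_checksum_record(tarball_path)
> {code}
> During an install, restart, or upgrade, the agent would fetch
> ambari-tarball-checksum.json from HDFS, call needs_upload(), and re-upload
> the tarball (and rewrite its record) only when the comparison fails.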
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)