[ 
https://issues.apache.org/jira/browse/AMBARI-22721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayush Luniya updated AMBARI-22721:
-----------------------------------
    Fix Version/s:     (was: 2.7.1)
                   3.0.0

> Centralize the Management of Tarball Uploading
> ----------------------------------------------
>
>                 Key: AMBARI-22721
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22721
>             Project: Ambari
>          Issue Type: Task
>    Affects Versions: 2.6.2
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 3.0.0
>
>
> Ambari must upload tarballs into HDFS for many of the services to 
> function correctly after they are installed. This tarball management is not 
> centralized in any way; instead, it is spread out across several different 
> Python files for various services:
> - Hive uploads the Tez, MapReduce2, Sqoop, etc. tarballs
> - YARN uploads the Tez, Slider, and MapReduce2 tarballs
> This causes a problem when patching a specific service, such as Sqoop. Sqoop 
> requires that sqoop.tar.gz and mapreduce.tar.gz be available in the same 
> versioned folder in HDFS. However, no Sqoop component performs this upload; 
> Hive does. So, if Hive is not being upgraded, these tarballs are never 
> uploaded.
> The proposal here is to remove this coupling and to manage the tarball 
> relationships in the stack definition:
> {code}
> {
>   "tarball": {
>     "MAPREDUCE2": {
>       "JOB_HISTORY_SERVER": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         }
>       ]
>     },
>     "HIVE": {
>       "HIVE_SERVER2": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         },
>         {
>           "tarball": "sqoop.tar.gz",
>           "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
>           "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
>         }
>       ]
>     },
>     "SQOOP": {
>       "SQOOP": [
>         {
>           "tarball": "mapreduce.tar.gz",
>           "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
>           "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
>         },
>         {
>           "tarball": "sqoop.tar.gz",
>           "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
>           "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
>         }
>       ]
>     }
>   }
> }
> {code}
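> A centralized uploader could then be driven entirely by this metadata. A 
> minimal sketch follows; the metadata file location and the names 
> {{load_tarball_map}}, {{upload_tarballs_for_component}}, and 
> {{copy_to_hdfs}} are hypothetical, not an existing Ambari API:
> {code:title=tarball_uploader.py (sketch)}
> import json
> 
> def load_tarball_map(path):
>   # Parse the proposed stack-level tarball metadata shown above.
>   with open(path) as f:
>     return json.load(f)["tarball"]
> 
> def upload_tarballs_for_component(tarball_map, service, component,
>                                   source_args, target_args, copy_to_hdfs):
>   # source_args/target_args supply the {0}/{1} placeholder values
>   # (e.g. stack root and stack version - an assumption here).
>   for entry in tarball_map.get(service, {}).get(component, []):
>     source = entry["source_dir"].format(*source_args)
>     target = entry["target_dir"].format(*target_args)
>     copy_to_hdfs(source, target)
> {code}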
> - after-INSTALL hooks will check for {{CLIENT}} as the component category
> - after-START hooks will check for components that are NOT {{CLIENT}} (see 
> the sketch below)
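> Because the hooks run for every component, the gate reduces to a category 
> check. A minimal sketch, assuming the hook name and component category are 
> already available to the hook script:
> {code:title=Hook gating (sketch)}
> def should_upload(hook_name, component_category):
>   # CLIENT components have no START phase, so they upload after-INSTALL;
>   # everything else uploads after-START once the service is running.
>   if hook_name == "after-INSTALL":
>     return component_category == "CLIENT"
>   if hook_name == "after-START":
>     return component_category != "CLIENT"
>   return False
> {code}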
> Additionally, using the file length as a checksum may no longer be 
> sufficient. We should also add a checksum file to HDFS for each tarball so 
> that we can easily tell whether work needs to be done (during an install, 
> restart, upgrade, etc.) to upload a new tarball (one that is also potentially 
> modified with native libraries):
> {code:title=ambari-tarball-checksum.json (0644)}
> {
>   "mapreduce.tar.gz": {
>     "native_libraries": true,
>     "file_count": 509
>   },
>   "hadoop-streaming.tar.gz": {
>     "native_libraries": false,
>     "file_count": 10  
>   }
> }
> {code}
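> The decision to re-upload then becomes a comparison of this metadata against 
> the local tarball. A minimal sketch, assuming the checksum JSON has already 
> been fetched from HDFS and parsed into a dict; {{needs_upload}} is a 
> hypothetical helper:
> {code:title=Staleness check (sketch)}
> import tarfile
> 
> def local_file_count(tarball_path):
>   # Count the members of the local tarball; compared against the
>   # "file_count" field in ambari-tarball-checksum.json.
>   with tarfile.open(tarball_path, "r:gz") as tar:
>     return len(tar.getmembers())
> 
> def needs_upload(name, path, hdfs_checksums, has_native_libraries):
>   entry = hdfs_checksums.get(name)
>   if entry is None:
>     return True  # never uploaded
>   if entry["native_libraries"] != has_native_libraries:
>     return True  # native libraries were added or removed
>   return entry["file_count"] != local_file_count(path)
> {code}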



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
