[ 
https://issues.apache.org/jira/browse/AMBARI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887707#comment-13887707
 ] 

Dmitry Lysnichenko commented on AMBARI-4481:
--------------------------------------------

[~mahadev]/[~swagle], can you please take a look at the proposal?

> Add to the agent ability to download service scripts and hooks
> --------------------------------------------------------------
>
>                 Key: AMBARI-4481
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4481
>             Project: Ambari
>          Issue Type: Task
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>
>
> h1. Proposal:
> h2. General conception
> Ambari server shares some files at /var/lib/ambari-server/resources/ via 
> HTTP. These files are accessible via url like 
> http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among these files 
> there are service scripts, templates and hooks. Agent has a cache of these 
> files. Cache directory structure is similar to contents of a stacks folder at 
> server. For example:
> {code}
> $ tree /var/lib/ambari-agent/cache
> └── stacks
>     └── HDP
>         ├── 2.0.7
>         │   ├── Accumulo
>         │   └── Flume
>         └── 2.0.8
>             ├── Accumulo
>             ├── Flume
>             └── YetAnotherService
> {code}
> If the files for some service, component, and stack version are not 
> available in the cache, the agent downloads the appropriate files on first 
> use.
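The cache lookup described above can be sketched as follows. This is an illustrative sketch, not actual Ambari agent code: the `CACHE_ROOT` constant matches the path in the proposal, but the function names are assumptions.

```python
import os

# Agent-side cache root, as given in the proposal.
CACHE_ROOT = "/var/lib/ambari-agent/cache"

def cache_dir(stack, version, service):
    """Local cache directory mirroring the server's stacks layout
    (hypothetical helper name)."""
    return os.path.join(CACHE_ROOT, "stacks", stack, version, service)

def is_cached(stack, version, service):
    """True if the service's files are already present in the agent cache;
    when False, the agent would download them on first use."""
    return os.path.isdir(cache_dir(stack, version, service))
```

For example, `cache_dir("HDP", "2.0.8", "Flume")` resolves to `/var/lib/ambari-agent/cache/stacks/HDP/2.0.8/Flume`, matching the tree shown above.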
> h2. Packaging files into archives:
> The trouble is that, in the current Jetty configuration, ambari-server does 
> not allow directory listing. We have two options:
> - To speed up the download and avoid the need to list script files 
> explicitly, the proposal is to pack the "hooks" and "packages" directories 
> into gz archives. 
> - We may set the "dirAllowed" servlet option for /resources/*; in this 
> case, the agent will download all files one by one. The user will not have 
> to run additional commands to keep stack files updated (improved 
> usability), but a separate request will be sent for every file being 
> downloaded. This way of fetching files seems too slow, especially on big 
> clusters.
> Since the second option is not applicable because it limits scalability, 
> I'm going to implement the first one. Implementation steps:
> - on server startup, a python script iterates over the "hooks"/"package" 
> directories and computes a sha1 hash for each directory. Files and 
> directories are listed in alphabetical order; hash sum files and existing 
> directory archives are skipped.
> - if a directory archive does not exist, or the sha1 hash sum differs from 
> the previously computed one, the archive is regenerated and saved to an 
> "archive.gz" file.
> - the sha1 hash of the directory is saved to a .hash file in the root of 
> the "hooks"/"package" directory.
> This way, we ensure that the archive is kept up to date when a user changes 
> some file in a directory or replaces an entire directory.
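The server-side steps above can be sketched roughly like this. It is a minimal sketch under the proposal's assumptions ("archive.gz" and ".hash" file names); the function names and hashing details (hashing relative paths plus file contents) are illustrative, not the actual implementation.

```python
import hashlib
import os

ARCHIVE_NAME = "archive.gz"  # per the proposal
HASH_NAME = ".hash"          # per the proposal

def directory_hash(root):
    """sha1 over files visited in alphabetical order; skips the hash file
    and any existing archive so regeneration does not change the hash."""
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                    # alphabetical directory order
        for name in sorted(filenames):     # alphabetical file order
            if name in (ARCHIVE_NAME, HASH_NAME):
                continue
            path = os.path.join(dirpath, name)
            # Include the relative path so renames also change the hash.
            digest.update(os.path.relpath(path, root).encode("utf-8"))
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

def refresh_archive(root):
    """Regenerate the archive only when the directory hash has changed,
    then record the new hash in the .hash file."""
    new_hash = directory_hash(root)
    hash_file = os.path.join(root, HASH_NAME)
    old_hash = None
    if os.path.exists(hash_file):
        with open(hash_file) as f:
            old_hash = f.read().strip()
    if new_hash != old_hash or not os.path.exists(os.path.join(root, ARCHIVE_NAME)):
        # ...pack the directory into archive.gz here (e.g. via tarfile)...
        with open(hash_file, "w") as f:
            f.write(new_hash)
    return new_hash
```

Because the .hash file and archive.gz are excluded from the hash, writing them does not itself trigger another regeneration on the next startup.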



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
