[ 
https://issues.apache.org/jira/browse/AMBARI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Lysnichenko updated AMBARI-4481:
---------------------------------------

    Description: 
h1. Proposal:
h2. General concept
Ambari server shares some files from /var/lib/ambari-server/resources/ via 
HTTP. These files are accessible at URLs like 
http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among them are service 
scripts, templates and hooks. The agent keeps a cache of these files. The cache 
directory structure mirrors the contents of the stacks folder on the server. 
For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If the files for a given service, component and stack version are not available 
in the cache, the agent downloads them on first use. After the files are 
successfully unpacked, the hash is also downloaded to a separate file (this 
way, we ensure cache consistency). If any step of the cache update fails (due 
to a timeout, missing files, a broken archive, etc.), the agent fails command 
execution with an appropriate message.
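
The update order described above (unpack first, store the hash last, fail on any error) could look roughly like the sketch below. It is only an illustration written for Python 3: the names update_cache_dir and CacheUpdateError are hypothetical, the file names archive.zip and .hash are taken from the packaging section of this proposal, and the fetch callable stands in for whatever HTTP client the agent uses.

```python
import io
import os
import shutil
import zipfile


class CacheUpdateError(Exception):
    """Raised when any step of the cache update fails (hypothetical name)."""


def update_cache_dir(fetch, remote_dir_url, local_dir):
    """Download archive.zip, unpack it, then store the hash last.

    fetch is any callable that maps a URL to the response body as bytes.
    """
    try:
        archive_bytes = fetch(remote_dir_url + "/archive.zip")
        if os.path.isdir(local_dir):
            shutil.rmtree(local_dir)  # drop stale files before unpacking
        os.makedirs(local_dir)
        with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
            archive.extractall(local_dir)
        # The hash is written only after a successful unpack, so a present
        # .hash file implies a complete copy of the directory.
        remote_hash = fetch(remote_dir_url + "/.hash")
        with open(os.path.join(local_dir, ".hash"), "wb") as handle:
            handle.write(remote_hash)
    except (OSError, zipfile.BadZipFile) as error:
        raise CacheUpdateError(
            "Cache update failed for %s: %s" % (remote_dir_url, error))
```

With the standard library, fetch could be something like lambda url: urllib.request.urlopen(url, timeout=10).read(); timeouts and HTTP errors raised by urllib are subclasses of OSError in Python 3, so they would surface as CacheUpdateError here.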

h2. Packaging files into archives:
The trouble is that in the current Jetty configuration, ambari-server does not 
allow directory listing. We have two options:
- To speed up downloads and avoid the need to list script files explicitly, the 
proposal is to pack the "hooks" and "package" directories into zip archives. 
After download, the agent unpacks the archive into the cache.
- We may set the "dirAllowed" servlet option for /resources/*, in which case 
the agent will download all files one by one. The user will not have to run 
additional commands to have stack files updated (improved usability). However, 
a separate request will be sent for every file being downloaded. This way of 
fetching files seems to be too slow, especially on big clusters.

Because the second option limits scalability, I'm going to implement the first 
one. Implementation steps:
- on server startup, a Python script iterates over the "hooks"/"package" 
directories and computes a sha1 hash for each directory. Files and directories 
are listed in alphabetical order; hash sum files and existing directory 
archives are skipped.
- if the directory archive does not exist or the sha1 hash sum differs from the 
previously computed one, the archive is regenerated and saved to the 
"archive.zip" file.
- the sha1 hash of the directory is saved to the .hash file in the root of the 
"hooks"/"package" directory.
This way, we ensure that the archive is still up to date if the user changes 
some file in the directory or replaces the entire directory.
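
The implementation steps above can be sketched as follows. This is a minimal illustration for Python 3, not the actual server script: the function names are hypothetical, and hashing both the relative path and the contents of each file is an assumption (the proposal only says that files are listed in alphabetical order and that hash files and archives are skipped).

```python
import hashlib
import os
import zipfile

ARCHIVE_NAME = "archive.zip"  # file name from the proposal
HASH_NAME = ".hash"           # file name from the proposal
SKIPPED = {ARCHIVE_NAME, HASH_NAME}


def directory_hash(root):
    """Return a sha1 hex digest over relative paths and file contents."""
    digest = hashlib.sha1()
    for current, dirs, files in os.walk(root):
        dirs.sort()  # in-place sort makes the traversal order deterministic
        for name in sorted(files):
            if name in SKIPPED:
                continue
            path = os.path.join(current, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def refresh_archive(root):
    """Rebuild archive.zip and .hash if needed; return True when rebuilt."""
    archive_path = os.path.join(root, ARCHIVE_NAME)
    hash_path = os.path.join(root, HASH_NAME)
    current = directory_hash(root)
    previous = None
    if os.path.isfile(hash_path):
        with open(hash_path) as handle:
            previous = handle.read().strip()
    if os.path.isfile(archive_path) and previous == current:
        return False  # archive is still up to date
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, dirs, files in os.walk(root):
            dirs.sort()
            for name in sorted(files):
                if name in SKIPPED:
                    continue
                path = os.path.join(folder, name)
                archive.write(path, os.path.relpath(path, root))
    with open(hash_path, "w") as handle:
        handle.write(current)
    return True
```

Calling refresh_archive() on every "hooks"/"package" directory at startup would regenerate only the archives whose directory contents changed since the previous run.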

h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or add new 
files/stacks/etc, user has to:
- stop ambari-server
- perform changes
- start ambari-server
- everything else will be done automagically

h2. Cache invalidation
Besides package archives, the agent also downloads and stores archive hashes. 
We use them for cache invalidation. As stack files may only change on server 
restart (and agent reregistration), we will verify hashes only once and store 
the result in FileCache until the next agent registration.
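
The "verify once per registration" rule above might be sketched like this. The class and method names are hypothetical and are not the real FileCache API; the three callables are placeholders for fetching the server hash, reading the cached hash, and re-downloading a directory.

```python
class FileCacheSketch:
    """Remembers which directories were verified since the last registration."""

    def __init__(self, fetch_remote_hash, read_local_hash, update_directory):
        self._fetch_remote_hash = fetch_remote_hash
        self._read_local_hash = read_local_hash
        self._update_directory = update_directory
        self._verified = set()

    def reset(self):
        """Forget verification results; call on every agent (re)registration."""
        self._verified.clear()

    def provide_directory(self, name):
        """Make sure the cached copy of a directory is current."""
        if name in self._verified:
            return  # already checked since the last registration
        if self._fetch_remote_hash(name) != self._read_local_hash(name):
            self._update_directory(name)
        self._verified.add(name)
```

Under this scheme, repeated commands for the same service cost one hash request per registration rather than one per command.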

h2. Custom actions
Custom action scripts are fetched/updated the same way as other files and are 
stored in /var/lib/ambari-agent/cache/custom_actions.

h2. Choosing error handling strategy for download/unpack errors and other settings
The agent has two related settings in the ambari-agent.ini file.
{code}
[agent]
cache_dir=/var/lib/ambari-agent/cache
tolerate_download_failures=true
{code}
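
As an illustration only, the two settings above can be read with Python 3's standard configparser module. The helper name load_cache_settings is hypothetical, and the fallback of True simply mirrors the default stated in this proposal.

```python
import configparser

SAMPLE_INI = """
[agent]
cache_dir=/var/lib/ambari-agent/cache
tolerate_download_failures=true
"""


def load_cache_settings(ini_text):
    """Return (cache_dir, tolerate_download_failures) from ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cache_dir = parser.get("agent", "cache_dir")
    # getboolean accepts true/false, yes/no, on/off and 1/0
    tolerate = parser.getboolean(
        "agent", "tolerate_download_failures", fallback=True)
    return cache_dir, tolerate


print(load_cache_settings(SAMPLE_INI))
# prints ('/var/lib/ambari-agent/cache', True)
```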
The tolerate_download_failures option (defaults to true) determines what the 
agent does when an error occurs while checking hashes, downloading files or 
unpacking an archive. If value is true, agent just 

  was:
h1. Proposal:
h2. General concept
Ambari server shares some files from /var/lib/ambari-server/resources/ via 
HTTP. These files are accessible at URLs like 
http://hostname:8080/resources/jdk-6u31-linux-x64.bin . Among them are service 
scripts, templates and hooks. The agent keeps a cache of these files. The cache 
directory structure mirrors the contents of the stacks folder on the server. 
For example:
$ ls /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If the files for a given service, component and stack version are not available 
in the cache, the agent downloads them on first use. After the files are 
successfully unpacked, the hash is also downloaded to a separate file (this 
way, we ensure cache consistency). If any step of the cache update fails (due 
to a timeout, missing files, a broken archive, etc.), the agent fails command 
execution with an appropriate message.

h2. Packaging files into archives:
The trouble is that in the current Jetty configuration, ambari-server does not 
allow directory listing. We have two options:
- To speed up downloads and avoid the need to list script files explicitly, the 
proposal is to pack the "hooks" and "package" directories into zip archives. 
After download, the agent unpacks the archive into the cache.
- We may set the "dirAllowed" servlet option for /resources/*, in which case 
the agent will download all files one by one. The user will not have to run 
additional commands to have stack files updated (improved usability). However, 
a separate request will be sent for every file being downloaded. This way of 
fetching files seems to be too slow, especially on big clusters.

Because the second option limits scalability, I'm going to implement the first 
one. Implementation steps:
- on server startup, a Python script iterates over the "hooks"/"package" 
directories and computes a sha1 hash for each directory. Files and directories 
are listed in alphabetical order; hash sum files and existing directory 
archives are skipped.
- if the directory archive does not exist or the sha1 hash sum differs from the 
previously computed one, the archive is regenerated and saved to the 
"archive.zip" file.
- the sha1 hash of the directory is saved to the .hash file in the root of the 
"hooks"/"package" directory.
This way, we ensure that the archive is still up to date if the user changes 
some file in the directory or replaces the entire directory.

h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or add new 
files/stacks/etc, user has to:
- stop ambari-server
- perform changes
- start ambari-server
- everything else will be done automagically

h2. Cache invalidation
Besides package archives, the agent also downloads and stores archive hashes. 
We use them for cache invalidation. As stack files may only change on server 
restart (and agent reregistration), we will verify hashes only once and store 
the result in FileCache until the next agent registration.

h2. Custom actions
I'm going to use the same approach for fetching 
/var/lib/ambari-agent/resources/custom_actions. [~sumitmohanty], can you please 
post any entry points of using/testing custom actions via API?



> Add to the agent ability to download service scripts and hooks
> --------------------------------------------------------------
>
>                 Key: AMBARI-4481
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4481
>             Project: Ambari
>          Issue Type: Task
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>
>         Attachments: AMBARI-4481_preview.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
