[
https://issues.apache.org/jira/browse/AMBARI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Lysnichenko updated AMBARI-4481:
---------------------------------------
Description:
h1. Proposal:
h2. General concept
The Ambari server shares some files at /var/lib/ambari-server/resources/ via HTTP.
These files are accessible via URLs like
http://hostname:8080/resources/jdk-6u31-linux-x64.bin. Among these files there
are service scripts, templates and hooks. The agent keeps a cache of these files.
The cache directory structure mirrors the contents of the stacks folder on the
server. For example:
$ tree /var/lib/ambari-agent/cache
{code}
└── stacks
    └── HDP
        ├── 2.0.7
        │   ├── Accumulo
        │   └── Flume
        └── 2.0.8
            ├── Accumulo
            ├── Flume
            └── YetAnotherService
{code}
If the files for some service, component and stack version are not available in
the cache, the agent downloads the appropriate files on first use. The agent
unpacks the downloaded archive in memory and writes the files to disk (without
storing the initial zip archive on disk). After the files are successfully
unpacked, the hash is also downloaded to a separate file (this way, we ensure
cache consistency). If any step of the cache update fails (due to a timeout,
missing files, a broken archive, etc.), the agent fails the command execution
with an appropriate message.
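For illustration, a minimal sketch of this in-memory download-and-unpack step
(the URL layout, function name and target path are assumptions made for the
example, not the actual agent code):
{code}
# Sketch only: unpack a downloaded zip in memory, without writing the
# archive itself to disk. The URL and paths below are illustrative.
import io
import os
import urllib2   # the agent code base is Python 2; use urllib.request on Python 3
import zipfile

def fetch_and_unpack(url, target_dir):
    """Download a zip archive and extract it under target_dir."""
    data = urllib2.urlopen(url, timeout=10).read()   # archive stays in memory only
    archive = zipfile.ZipFile(io.BytesIO(data))
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    archive.extractall(target_dir)

# hypothetical usage; the real agent derives the URL from the command's stack info
fetch_and_unpack("http://hostname:8080/resources/stacks/HDP/2.0.8/Flume/package/archive.zip",
                 "/var/lib/ambari-agent/cache/stacks/HDP/2.0.8/Flume/package")
{code}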
h2. Packaging files into archives:
The trouble is that in the current Jetty configuration, ambari-server does not
allow listing directories. To speed up downloads and avoid the need to list
script files explicitly, the proposal is to pack the "hooks" and "package"
directories into zip archives. After download, the agent unpacks the archive
into the cache.
Execution steps:
- on server startup, a python script iterates over the "hooks"/"package"
directories and computes directory sha1 hashes. Files and directories are walked
in alphabetical order; hash sum files and existing directory archives are
skipped. Only active (enabled) stacks are hashed/archived.
- if the directory archive does not exist or the sha1 hash differs from the
previously computed one, the archive is regenerated and saved to an
"archive.zip" file.
- the sha1 hash of the directory is saved to a .hash file in the root of the
"hooks"/"package" directory.
This way, we ensure that the archive stays up to date if a user changes some
file in a directory or replaces an entire directory; a sketch of this step is
shown below.
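A rough sketch of the server-side hashing/archiving step ("archive.zip" and
".hash" are the file names from this proposal; the helper functions themselves
are illustrative):
{code}
# Sketch: hash a "hooks"/"package" directory in deterministic (alphabetical)
# order and rebuild archive.zip only when the hash has changed.
import hashlib
import os
import zipfile

SKIP = ("archive.zip", ".hash")   # skip the archive and the hash sum file themselves

def directory_hash(directory):
    """sha1 over the contents of all files, walked in alphabetical order."""
    digest = hashlib.sha1()
    for root, dirs, files in os.walk(directory):
        dirs.sort()                          # deterministic traversal
        for name in sorted(files):
            if name in SKIP:
                continue
            with open(os.path.join(root, name), "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

def refresh_archive(directory):
    new_hash = directory_hash(directory)
    hash_file = os.path.join(directory, ".hash")
    old_hash = None
    if os.path.isfile(hash_file):
        with open(hash_file) as f:
            old_hash = f.read().strip()
    archive_path = os.path.join(directory, "archive.zip")
    if new_hash != old_hash or not os.path.isfile(archive_path):
        with zipfile.ZipFile(archive_path, "w") as z:
            for root, dirs, files in os.walk(directory):
                dirs.sort()
                for name in sorted(files):
                    if name in SKIP:
                        continue
                    path = os.path.join(root, name)
                    z.write(path, os.path.relpath(path, directory))
        with open(hash_file, "w") as f:
            f.write(new_hash)
{code}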
h2. How to change stack files after server installation
To change stack files (scripts, templates and so on) or add new
files/stacks/etc., the user has to:
- stop ambari-server
- perform changes
- start ambari-server
- everything else will be done automagically
h2. Cache invalidation
Besides package archives, the agent also downloads and stores archive hashes. We
use them for cache invalidation. Since stack files may only change on server
restart (and agent re-registration), we verify the hashes only once and store
the result in FileCache until the next agent registration.
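A simplified sketch of that check (FileCache is the class mentioned above; the
method and attribute names here are assumed for the example):
{code}
# Simplified invalidation logic: compare the hash downloaded from the server
# with the locally stored .hash once, and remember the verdict until the next
# agent registration.
import os

class FileCache(object):
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.uptodate_paths = set()          # cleared on agent re-registration

    def is_directory_uptodate(self, rel_path, remote_hash):
        """True if the cached copy of rel_path matches the server-side hash."""
        if rel_path in self.uptodate_paths:
            return True                      # already verified since last registration
        hash_file = os.path.join(self.cache_dir, rel_path, ".hash")
        if os.path.isfile(hash_file):
            with open(hash_file) as f:
                if f.read().strip() == remote_hash:
                    self.uptodate_paths.add(rel_path)
                    return True
        return False                         # triggers a re-download of archive.zip
{code}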
h2. Custom actions
Custom action scripts are fetched/updated the same way as other files and are
stored at /var/lib/ambari-agent/cache/custom_actions.
h2. Choosing error handling strategy for download/unpack errors and other settings
The agent has two caching-related settings in the ambari-agent.ini file:
{code}
[agent]
cache_dir=/var/lib/ambari-agent/cache
tolerate_download_failures=true
{code}
The tolerate_download_failures option (defaults to true) determines the agent's
behaviour in case of any cache update error (while checking hashes, during file
download or archive unpacking). If the value is true, the agent just logs a
warning and continues command execution with the existing cache. If the value is
false, the agent immediately considers the ExecutionCommand failed (so the user
sees the failed command in the UI with an appropriate error message).
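For illustration, the flag could be honored around a cache update roughly like
this (the exception type and the cache.refresh helper are placeholders, not the
real agent API):
{code}
# Illustrative handling of tolerate_download_failures around a cache update.
import logging

logger = logging.getLogger(__name__)

class CachingException(Exception):
    """Raised when a hash check, download or unpacking of an archive fails."""
    pass

def update_directory(cache, rel_path, tolerate_download_failures):
    try:
        cache.refresh(rel_path)              # hash check + download + unpack
    except CachingException as e:
        if tolerate_download_failures:
            logger.warning("Failed to update cache for %s: %s. "
                           "Continuing with the existing cached copy.", rel_path, e)
        else:
            raise                            # propagates and fails the ExecutionCommand
{code}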
h2. rpm packaging
Currently, stack files are included both in the ambari-agent and the
ambari-server rpms, so the agent comes with a pre-packaged file cache. The issue
is that the files packaged into the agent cache are not hashed (no ".hash" files
exist), so after rpm installation the agent considers its cache stale and tries
to update the cache from the server. I'll add on-the-fly stack file hashing
during rpm generation in a separate jira.
h2. other ambari-server changes
I've created a valid python ambari-server package that is properly packaged
into the rpm and is visible to ambari-server.py.
> Add to the agent ability to download service scripts and hooks
> --------------------------------------------------------------
>
> Key: AMBARI-4481
> URL: https://issues.apache.org/jira/browse/AMBARI-4481
> Project: Ambari
> Issue Type: Task
> Components: agent, controller
> Affects Versions: 1.5.0
> Reporter: Dmitry Lysnichenko
> Assignee: Dmitry Lysnichenko
> Fix For: 1.5.0
>
> Attachments: AMBARI-4481_preview.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)