James Peach created TS-2302:
-------------------------------

             Summary: log collation causes error logging to stop
                 Key: TS-2302
                 URL: https://issues.apache.org/jira/browse/TS-2302
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging, Quality
            Reporter: James Peach


When we started using the log collation ATS stopped writing to the errors.log 
file. Traffic Server thinks that the configured log space is exhausted:

{code}
[Sep 18 23:00:40.817] Server {0x2b1bdcc51700} WARNING: Access logging to local 
log directory suspended - configured space allocation exhausted.
[Sep 18 23:01:45.008] Server {0x2b05c4e93700} WARNING: Access logging to local 
log directory suspended - configured space allocation exhausted.
[Sep 19 20:57:05.766] Server {0x2b05c4c91700} NOTE: logging space exhausted, 
failed to write orphan file, drop(1832) bytes
[Sep 19 20:57:15.743] Server {0x2b05c4c91700} NOTE: logging space exhausted, 
failed to write orphan file, drop(4360) bytes
{code}

{{LogConfig::update_space_used()}} emits the exhaustion message when 
{{LogConfig::space_to_write(headroom)}} fails:

{code}
$ sudo /opt/bazinga/bin/traffic_line -r proxy.config.log.max_space_mb_headroom
1000
$ sudo gdb -p 28375
...
(gdb) p (bool)(Log::config->space_to_write((int64_t)(1000*1024)))
$1 = false

(gdb) p *(Log::config)
$1 = {initialized = true, reconfiguration_needed = false, 
logging_space_exhausted = true, m_space_used = 28313954, m_partition_space_left 
= 46870138880, 
  roll_log_files_now = false, log_object_manager = {_objects = 0x2b1b68758d00, 
_numObjects = 1, _maxObjects = 8, _APIobjects = 0x2b1b68758d50, _numAPIobjects 
= 0, 
    _maxAPIobjects = 8, _APImutex = 0x2b1b6874ee40}, global_filter_list = 
{m_filter_list = {<DLL<LogFilter, LogFilter::Link_link>> = {head = 0x0}, tail = 
0x0}, 
    m_does_conjunction = true}, global_format_list = {m_format_list = 
{<DLL<LogFormat, LogFormat::Link_link>> = {head = 0x2b1b68764810}, tail = 
0x2b1b68767e10}}, 
  log_buffer_size = 9216, max_secs_per_buffer = 5, max_space_mb_for_logs = 
102400, max_space_mb_for_orphan_logs = 25, max_space_mb_headroom = 1000, 
  logfile_perm = 420, squid_log_enabled = false, xuid_logging_enabled = true, 
squid_log_is_ascii = false, common_log_enabled = false, common_log_is_ascii = 
true, 
  extended_log_enabled = false, extended_log_is_ascii = false, 
extended2_log_enabled = false, extended2_log_is_ascii = true, separate_icp_logs 
= false, 
  separate_host_logs = false, collation_mode = 2, collation_port = 8085, 
collation_host_tagged = false, collation_preproc_threads = 1, 
collation_retry_sec = 5, 
  collation_max_send_buffers = 16, rolling_enabled = 2, rolling_interval_sec = 
86400, rolling_offset_hr = 0, rolling_size_mb = 10240, 
  auto_delete_rolled_files = true, custom_logs_enabled = true, 
search_log_enabled = false, search_rolling_interval_sec = 86400, 
search_server_ip_addr = 0, 
  search_server_port = 8080, search_top_sites = 100, search_log_filters = 0x0, 
search_url_filter = 0x0, search_log_file_one = 0x2b1b68758e00 "search_log1", 
  search_log_file_two = 0x2b1b68758e20 "search_log2", sampling_frequency = 1, 
file_stat_frequency = 32, space_used_frequency = 2, ascii_buffer_size = 36864, 
  max_line_size = 9216, hostname = 0x2b1b68758f20 "localhost", logfile_dir = 
0x2b1b6823fcd0 "/var/log/trafficserver", squid_log_name = 0x2b1b68240960 
"squid", 
  squid_log_header = 0x0, common_log_name = 0x2b1b68225a90 "common", 
common_log_header = 0x0, extended_log_name = 0x2b1b682419f0 "extended", 
  extended_log_header = 0x0, extended2_log_name = 0x2b1b6874f7b0 "extended2", 
extended2_log_header = 0x0, collation_host = 0x2b1b6874fe50 "none", 
  collation_secret = 0x2b1b6874fa20 "Aapl_Baz1ng4!", xml_config_file = 
0x2b1b68758da0 "logs_xml.config", hosts_config_file = 0x2b1b68758dc0 
"log_hosts.config", 
  use_orphan_log_space_value = true, m_log_collation_accept = 0x0, m_dir_entry 
= 0x2b1b68769e30, m_pDir = 0x0, m_disk_full = true, m_disk_low = false, 
  m_partition_full = false, m_partition_low = false, 
m_log_directory_inaccessible = false}
{code}

Note that use_orphan_log_space_value = true.

In proxy/logging/LogConfig.h:
{code}
   169  
   170    int get_max_space_mb() {
   171      return (use_orphan_log_space_value ? max_space_mb_for_orphan_logs : 
max_space_mb_for_logs);
   172    }
   173  
{code}

So the orphan log space is being used to determine the maximum log space 
allowed. This is determined in LogConfig::init():
{code}
   742    // determine if we should use the orphan log space value or not
   743    // we use it if all objects are collation clients, or if some are and
   744    // the specified space for collation is larger than that for local 
files
   745    //
   746    size_t num_collation_clients = 
log_object_manager.get_num_collation_clients();
   747    use_orphan_log_space_value =
   748      (num_collation_clients == 0 ? false :
   749       (log_object_manager.get_num_objects() == num_collation_clients ? 
true :
   750        max_space_mb_for_orphan_logs > max_space_mb_for_logs));
   751  
{code}

Our configuration has 1 log object that is a collation client, so 
{{log_object_manager.get_num_objects() == 
log_object_manager.get_num_collation_clients()}}, and we end up using the 
orphan log space.

The fix is to treat the error.log like a standard log object and manage it like 
all the others. This makes the assumptions around logging thresholds turn out 
to be true.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to