Updated Branches:
  refs/heads/master b187978b8 -> 6dc186740

TS-2364: Introduce slice notation to field syntax in log format

Currently, the log format has two kinds of field syntax:
(1) simple field, syntax: '%<field>', for example: %<cqu>.
(2) container field, syntax: '%<{field}container>',
    for example: %<{Referer}cqh>.

This patch introduces slice notation to the field syntax, so that we can easily
limit the length of a field's output. With slice notation, the field syntax
looks like:
(1) '%<field[start:end]>'
(2) '%<{field}container[start:end]>'

In both forms, slice notation can be omitted, which means the whole field.

There is one limitation: slice notation makes sense only when the field is
a string type; it should not be used on ip/timestamp fields, which are
converted to strings from integers.

The slice notation is borrowed from Python and Go, and it is pretty
simple:
  [start:end] //items start through end-1
  [start:]    //items start through the rest of the array
  [:end]      //items from the beginning through end-1
  [:]         //the whole array(by default)

For example,
  '%<cqup>'       //all characters of <cqup>.
  '%<cqup[:]>'    //all characters of <cqup>.
  '%<cqup[0:30]>' //the first 30 characters of <cqup>.
  '%<cqup[-10:]>' //the last 10 characters of <cqup>.
  '%<cqup[:-5]>'  //everything except the last 5 characters of <cqup>.
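
The index rules above can be sketched as a small standalone helper. This is
only an illustration, not the code in this patch: the names slice_to_offset
and apply_slice are hypothetical, and std::string is used for brevity where
the patch works on raw char buffers.

```cpp
#include <climits>
#include <string>

// Hypothetical helper (not part of the patch): resolve a [start:end] slice
// against a string of length n. Returns the number of characters selected
// (0 if the slice is empty) and stores the starting offset in *offset when
// the result is positive.
int slice_to_offset(int start, int end, int n, int *offset)
{
  // A negative start counts back from the end of the string.
  int i = (start >= 0) ? start : start + n;
  if (i >= n)
    return 0;
  if (i < 0)
    i = 0;

  // A negative end counts back from the end of the string as well.
  int j = (end >= 0) ? end : end + n;
  if (j <= 0)
    return 0;
  if (j > n)
    j = n;

  int len = j - i;
  if (len <= 0)
    return 0;

  *offset = i;
  return len;
}

// Omitted bounds default to 0 and INT_MAX, which selects the whole string.
std::string apply_slice(const std::string &s, int start = 0, int end = INT_MAX)
{
  int offset = 0;
  int len = slice_to_offset(start, end, static_cast<int>(s.size()), &offset);
  return len > 0 ? s.substr(offset, len) : std::string();
}
```

With this sketch, apply_slice(path, -10) corresponds to '%<cqup[-10:]>' and
apply_slice(path, 0, -5) corresponds to '%<cqup[:-5]>'.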

Signed-off-by: Yunkai Zhang <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/trafficserver/repo
Commit: http://git-wip-us.apache.org/repos/asf/trafficserver/commit/6dc18674
Tree: http://git-wip-us.apache.org/repos/asf/trafficserver/tree/6dc18674
Diff: http://git-wip-us.apache.org/repos/asf/trafficserver/diff/6dc18674

Branch: refs/heads/master
Commit: 6dc186740eea69c180fdda732608d8b0fedfcbf3
Parents: b187978
Author: Yunkai Zhang <[email protected]>
Authored: Tue Nov 19 18:12:20 2013 +0800
Committer: Yunkai Zhang <[email protected]>
Committed: Wed Nov 20 11:29:39 2013 +0800

----------------------------------------------------------------------
 CHANGES                              |  2 +
 proxy/config/logs_xml.config.default | 24 +++++++---
 proxy/logging/Log.cc                 | 30 ++++++------
 proxy/logging/LogAccess.cc           | 21 ++++++--
 proxy/logging/LogAccess.h            |  4 +-
 proxy/logging/LogField.cc            | 79 ++++++++++++++++++++++++++++++-
 proxy/logging/LogField.h             | 31 ++++++++++++
 proxy/logging/LogFormat.cc           | 15 +++++-
 8 files changed, 178 insertions(+), 28 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/CHANGES
----------------------------------------------------------------------
diff --git a/CHANGES b/CHANGES
index ece5bc6..0e634d6 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,6 +1,8 @@
                                                          -*- coding: utf-8 -*-
 Changes with Apache Traffic Server 4.2.0
 
+  *) [TS-2364] Introduce slice notation to field syntax in log format.
+
   *) [TS-2360] Fix usage of TSMimeHdrFieldValueStringGet() IDX in some plugins.
 
   *) [TS-2361] Load regex_remap configuration relative to the configuration directory.

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/config/logs_xml.config.default
----------------------------------------------------------------------
diff --git a/proxy/config/logs_xml.config.default b/proxy/config/logs_xml.config.default
index 7633808..00bae06 100644
--- a/proxy/config/logs_xml.config.default
+++ b/proxy/config/logs_xml.config.default
@@ -23,12 +23,24 @@ specifications (a '*' denotes a tag that is required):
        A valid format specification is a printf-style string that describes
        what each log entry looks like when formatted for ascii output.
        Placeholders for valid Inktomi field names are specified using the
-       notation '%<field>'.  The specified field can be of two types: 
-           (1) simple; example %<cqu> 
-          (2) container field, which is a field within a container (such
-          as an http header or an Inktomi stat).  Fields of this type have
-          the syntax: '%<{field}container>'.  See documentation for valid
-          container names.
+       notation '%<field>'.  The specified field can be of two types:
+           (1) simple field: '%<field>', for example %<cqu>.
+           (2) container field: '%<{field}container>', which is a field within
+               a container (such as an http header or an Inktomi stat).  See
+               documentation for valid container names.
+       To limit the length of a field's output, slice notation can be used
+       with both types:
+           (1) simple field with slice: '%<field[start:end]>',
+           (2) container field with slice: '%<{field}container[start:end]>'.
+           Here are some examples of slice notation:
+            '%<cqup[:]>'    # all characters of <cqup>.
+            '%<cqup[0:30]>' # the first 30 characters of <cqup>.
+            '%<cqup[-10:]>' # the last 10 characters of <cqup>.
+            '%<cqup[:-5]>'  # everything except the last 5 characters of <cqup>.
+          Note: slice notation makes sense only when:
+            * the target field is a string type, and
+            * it is not an ip/timestamp field, which is converted
+              to a string from an integer internally.
        If you want to include quotes within the format string, escape them
        with a backslash. For example, to quote the client request url (cqu),
        you would type something like

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/logging/Log.cc
----------------------------------------------------------------------
diff --git a/proxy/logging/Log.cc b/proxy/logging/Log.cc
index f1962ad..546f83f 100644
--- a/proxy/logging/Log.cc
+++ b/proxy/logging/Log.cc
@@ -356,7 +356,7 @@ Log::init_fields()
   field = NEW (new LogField ("client_auth_user_name", "caun",
                              LogField::STRING,
                              &LogAccess::marshal_client_auth_user_name,
-                             &LogAccess::unmarshal_str));
+                             (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add (field, false);
   ink_hash_table_insert (field_symbol_hash, "caun", field);
 
@@ -405,63 +405,63 @@ Log::init_fields()
   field = NEW(new LogField("client_req_text", "cqtx",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_text,
-                           &LogAccess::unmarshal_http_text));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_http_text));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cqtx", field);
 
   field = NEW(new LogField("client_req_http_method", "cqhm",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_http_method,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cqhm", field);
 
   field = NEW(new LogField("client_req_url", "cqu",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_url,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cqu", field);
 
   field = NEW(new LogField("client_req_url_canonical", "cquc",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_url_canon,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cquc", field);
 
   field = NEW(new LogField("client_req_unmapped_url_canonical", "cquuc",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_unmapped_url_canon,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cquuc", field);
 
   field = NEW(new LogField("client_req_unmapped_url_path", "cquup",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_unmapped_url_path,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cquup", field);
 
   field = NEW(new LogField("client_req_unmapped_url_host", "cquuh",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_unmapped_url_host,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cquuh", field);
 
   field = NEW(new LogField("client_req_url_scheme", "cqus",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_url_scheme,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cqus", field);
 
   field = NEW(new LogField("client_req_url_path", "cqup",
                            LogField::STRING,
                            &LogAccess::marshal_client_req_url_path,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "cqup", field);
 
@@ -504,7 +504,7 @@ Log::init_fields()
   field = NEW(new LogField("proxy_resp_content_type", "psct",
                            LogField::STRING,
                            &LogAccess::marshal_proxy_resp_content_type,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "psct", field);
 
@@ -629,7 +629,7 @@ Log::init_fields()
   field = NEW(new LogField("proxy_req_server_name", "pqsn",
                            LogField::STRING,
                            &LogAccess::marshal_proxy_req_server_name,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "pqsn", field);
 
@@ -690,7 +690,7 @@ Log::init_fields()
   field = NEW(new LogField("proxy_host_name", "phn",
                            LogField::STRING,
                            &LogAccess::marshal_proxy_host_name,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "phn", field);
 
@@ -705,7 +705,7 @@ Log::init_fields()
   field = NEW(new LogField("accelerator_id", "xid",
                            LogField::STRING,
                            &LogAccess::marshal_client_accelerator_id,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "xid", field);
   // X-WAID
@@ -723,7 +723,7 @@ Log::init_fields()
   field = NEW(new LogField("server_host_name", "shn",
                            LogField::STRING,
                            &LogAccess::marshal_server_host_name,
-                           &LogAccess::unmarshal_str));
+                           (LogField::UnmarshalFunc)&LogAccess::unmarshal_str));
   global_field_list.add(field, false);
   ink_hash_table_insert(field_symbol_hash, "shn", field);
 

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/logging/LogAccess.cc
----------------------------------------------------------------------
diff --git a/proxy/logging/LogAccess.cc b/proxy/logging/LogAccess.cc
index c6f6eda..4254569 100644
--- a/proxy/logging/LogAccess.cc
+++ b/proxy/logging/LogAccess.cc
@@ -998,7 +998,7 @@ LogAccess::unmarshal_int_to_str_hex(char **buf, char *dest, int len)
   -------------------------------------------------------------------------*/
 
 int
-LogAccess::unmarshal_str(char **buf, char *dest, int len)
+LogAccess::unmarshal_str(char **buf, char *dest, int len, LogSlice *slice)
 {
   ink_assert(buf != NULL);
   ink_assert(*buf != NULL);
@@ -1008,6 +1008,21 @@ LogAccess::unmarshal_str(char **buf, char *dest, int len)
   int val_len = (int)::strlen(val_buf);
 
   *buf += LogAccess::strlen(val_buf);   // this is how it was stored
+
+  if (slice && slice->m_enable) {
+    int offset, n;
+
+    n = slice->toStrOffset(val_len, &offset);
+    if (n <= 0)
+      return 0;
+
+    if (n >= len)
+      return -1;
+
+    memcpy(dest, (val_buf + offset), n);
+    return n;
+  }
+
   if (val_len < len) {
     memcpy(dest, val_buf, val_len);
     return val_len;
@@ -1093,7 +1108,7 @@ LogAccess::unmarshal_http_version(char **buf, char *dest, int len)
   -------------------------------------------------------------------------*/
 
 int
-LogAccess::unmarshal_http_text(char **buf, char *dest, int len)
+LogAccess::unmarshal_http_text(char **buf, char *dest, int len, LogSlice *slice)
 {
   ink_assert(buf != NULL);
   ink_assert(*buf != NULL);
@@ -1108,7 +1123,7 @@ LogAccess::unmarshal_http_text(char **buf, char *dest, int len)
   }
   p += res1;
   *p++ = ' ';
-  int res2 = unmarshal_str(buf, p, len - res1 - 1);
+  int res2 = unmarshal_str(buf, p, len - res1 - 1, slice);
   if (res2 < 0) {
     return -1;
   }

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/logging/LogAccess.h
----------------------------------------------------------------------
diff --git a/proxy/logging/LogAccess.h b/proxy/logging/LogAccess.h
index e0a27b3..eb10824 100644
--- a/proxy/logging/LogAccess.h
+++ b/proxy/logging/LogAccess.h
@@ -277,10 +277,10 @@ public:
   static int unmarshal_itox(int64_t val, char *dest, int field_width = 0, char leading_char = ' ');
   static int unmarshal_int_to_str(char **buf, char *dest, int len);
   static int unmarshal_int_to_str_hex(char **buf, char *dest, int len);
-  static int unmarshal_str(char **buf, char *dest, int len);
+  static int unmarshal_str(char **buf, char *dest, int len, LogSlice *slice = NULL);
   static int unmarshal_ttmsf(char **buf, char *dest, int len);
   static int unmarshal_http_version(char **buf, char *dest, int len);
-  static int unmarshal_http_text(char **buf, char *dest, int len);
+  static int unmarshal_http_text(char **buf, char *dest, int len, LogSlice *slice = NULL);
   static int unmarshal_http_status(char **buf, char *dest, int len);
   static int unmarshal_ip(char** buf, IpEndpoint* dest);
   static int unmarshal_ip_to_str(char **buf, char *dest, int len);

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/logging/LogField.cc
----------------------------------------------------------------------
diff --git a/proxy/logging/LogField.cc b/proxy/logging/LogField.cc
index 3f0ddee..6bab383 100644
--- a/proxy/logging/LogField.cc
+++ b/proxy/logging/LogField.cc
@@ -65,6 +65,78 @@ const char *aggregate_names[] = {
   ""
 };
 
+LogSlice::LogSlice(char *str)
+{
+  char *a, *b, *c;
+
+  m_enable = false;
+  m_start = 0;
+  m_end = INT_MAX;
+
+  if ((a = strchr(str, '[')) == NULL)
+    return;
+
+  *a++ = '\0';
+  if ((b = strchr(a, ':')) == NULL)
+    return;
+
+  *b++ = '\0';
+  if ((c = strchr(b, ']')) == NULL)
+    return;
+
+  m_enable = true;
+
+  // eat space
+  while (a != b && *a == ' ') a++;
+
+  if (a != b)
+    m_start = atoi(a);
+
+  // eat space
+  while (b != c && *b == ' ') b++;
+
+  if (b != c)
+    m_end = atoi(b);
+}
+
+int
+LogSlice::toStrOffset(int strlen, int *offset)
+{
+  int i, j, len;
+
+  // left index
+  if (m_start >= 0)
+    i = m_start;
+  else
+    i = m_start + strlen;
+
+  if (i >= strlen)
+    return 0;
+
+  if (i < 0)
+    i = 0;
+
+  // right index
+  if (m_end >= 0)
+    j = m_end;
+  else
+    j = m_end + strlen;
+
+  if (j <= 0)
+    return 0;
+
+  if (j > strlen)
+    j = strlen;
+
+  // available length
+  len = j - i;
+
+  if (len > 0)
+    *offset = i;
+
+  return len;
+}
+
 /*-------------------------------------------------------------------------
   LogField::LogField
   -------------------------------------------------------------------------*/
@@ -131,7 +203,7 @@ LogField::LogField(const char *field, Container container)
   case ESSH:
   case ECSSH:
   case SCFG:
-    m_unmarshal_func = &(LogAccess::unmarshal_str);
+    m_unmarshal_func = (UnmarshalFunc)&(LogAccess::unmarshal_str);
     break;
 
   case ICFG:
@@ -300,6 +372,11 @@ unsigned
 LogField::unmarshal(char **buf, char *dest, int len)
 {
   if (m_alias_map == NULL) {
+    if (m_unmarshal_func == (UnmarshalFunc)LogAccess::unmarshal_str
+        || m_unmarshal_func == (UnmarshalFunc)LogAccess::unmarshal_http_text) {
+      UnmarshalFuncWithSlice func = (UnmarshalFuncWithSlice)m_unmarshal_func;
+      return (*func) (buf, dest, len, &m_slice);
+    }
     return (*m_unmarshal_func) (buf, dest, len);
   } else {
     return (*m_unmarshal_func_map) (buf, dest, len, m_alias_map);

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/logging/LogField.h
----------------------------------------------------------------------
diff --git a/proxy/logging/LogField.h b/proxy/logging/LogField.h
index 6aa35d1..1847daa 100644
--- a/proxy/logging/LogField.h
+++ b/proxy/logging/LogField.h
@@ -31,6 +31,35 @@
 
 class LogAccess;
 
+struct LogSlice
+{
+  bool m_enable;
+  int m_start;
+  int m_end;
+
+  LogSlice() {
+    m_enable = false;
+    m_start = 0;
+    m_end = INT_MAX;
+  }
+
+  //
+  // Initialize LogSlice by slice notation,
+  // the str looks like: "xxx[0:30]".
+  //
+  LogSlice(char *str);
+
+  //
+  // Convert slice notation to an offset into the target string,
+  // and return the number of characters covered by this slice.
+  //
+  // Using the offset and the return value, we can locate the
+  // string content indicated by this slice.
+  //
+  int toStrOffset(int strlen, int *offset);
+};
+
+
 /*-------------------------------------------------------------------------
   LogField
 
@@ -47,6 +76,7 @@ class LogField
 public:
   typedef int (LogAccess::*MarshalFunc) (char *buf);
   typedef int (*UnmarshalFunc) (char **buf, char *dest, int len);
+  typedef int (*UnmarshalFuncWithSlice) (char **buf, char *dest, int len, LogSlice *slice);
   typedef int (*UnmarshalFuncWithMap) (char **buf, char *dest, int len, Ptr<LogFieldAliasMap> map);
 
 
@@ -152,6 +182,7 @@ private:
 
 public:
   LINK(LogField, link);
+  LogSlice m_slice;
 
 private:
 // luis, check where this is used and what it does

http://git-wip-us.apache.org/repos/asf/trafficserver/blob/6dc18674/proxy/logging/LogFormat.cc
----------------------------------------------------------------------
diff --git a/proxy/logging/LogFormat.cc b/proxy/logging/LogFormat.cc
index 753779f..80e46f4 100644
--- a/proxy/logging/LogFormat.cc
+++ b/proxy/logging/LogFormat.cc
@@ -502,6 +502,7 @@ LogFormat::parse_symbol_string(const char *symbol_string, LogFieldList *field_li
         name = symbol + 1;
         *name_end = 0;          // changes '}' to '\0'
         sym = name_end + 1;     // start of container symbol
+        LogSlice slice(sym);
         Debug("log-format", "Name = %s, symbol = %s", name, sym);
         container = LogField::valid_container_name(sym);
         if (container == LogField::NO_CONTAINER) {
@@ -509,6 +510,11 @@ LogFormat::parse_symbol_string(const char *symbol_string, LogFieldList *field_li
         } else {
           f = NEW(new LogField(name, container));
           ink_assert(f != NULL);
+          if (slice.m_enable) {
+            f->m_slice = slice;
+            Debug("log-slice", "symbol = %s, [%d:%d]", sym,
+                  f->m_slice.m_start, f->m_slice.m_end);
+          }
           field_list->add(f, false);
           field_count++;
           Debug("log-format", "Container field {%s}%s added", name, sym);
@@ -521,10 +527,17 @@ LogFormat::parse_symbol_string(const char *symbol_string, LogFieldList *field_li
     // treat this like a regular field symbol
     //
     else {
+      LogSlice slice(symbol);
       Debug("log-format", "Regular field symbol: %s", symbol);
       f = Log::global_field_list.find_by_symbol(symbol);
       if (f != NULL) {
-        field_list->add(f);
+        LogField *cpy = NEW(new LogField(*f));
+        if (slice.m_enable) {
+          cpy->m_slice = slice;
+          Debug("log-slice", "symbol = %s, [%d:%d]", symbol,
+                cpy->m_slice.m_start, cpy->m_slice.m_end);
+        }
+        field_list->add(cpy, false);
         field_count++;
         Debug("log-format", "Regular field %s added", symbol);
       } else {
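
For readers following the parsing step, here is a rough, non-destructive
sketch of how a "symbol[start:end]" string can be split into a symbol and a
slice. It is not the LogSlice constructor from this patch: ParsedSlice and
parse_slice are hypothetical names, and unlike the patch (which writes '\0'
into the caller's buffer) this version leaves its input untouched and keeps
the full text as the symbol when the notation is malformed.

```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical value type for this sketch only.
struct ParsedSlice {
  bool enabled = false;
  int start = 0;       // default: from the beginning
  int end = INT_MAX;   // default: through the end
  std::string symbol;  // field symbol with the "[...]" suffix removed
};

ParsedSlice parse_slice(const std::string &text)
{
  ParsedSlice out;
  out.symbol = text;

  // Locate '[', ':' and ']' in that order; give up if any is missing.
  std::string::size_type lb = text.find('[');
  if (lb == std::string::npos)
    return out;
  std::string::size_type colon = text.find(':', lb + 1);
  if (colon == std::string::npos)
    return out;
  std::string::size_type rb = text.find(']', colon + 1);
  if (rb == std::string::npos)
    return out;

  out.symbol = text.substr(0, lb);
  out.enabled = true;

  // An empty (or all-space) bound keeps its default value.
  std::string a = text.substr(lb + 1, colon - lb - 1);
  std::string b = text.substr(colon + 1, rb - colon - 1);
  if (a.find_first_not_of(' ') != std::string::npos)
    out.start = std::atoi(a.c_str());
  if (b.find_first_not_of(' ') != std::string::npos)
    out.end = std::atoi(b.c_str());
  return out;
}
```

For example, parse_slice("cqup[-10:]") yields the symbol "cqup" with start
-10 and the default end, while parse_slice("cqup") leaves the slice disabled.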
