I set up a MySQL database to catch Hadoop log events (at the WARN level and above). I created a database named HadoopLog with a table like:

mysql> show columns from Events;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| category | varchar(80)  | YES  |     | NULL    |       |
| t        | datetime     | YES  |     | NULL    |       |
| priority | varchar(12)  | YES  |     | NULL    |       |
| host     | varchar(100) | YES  |     | NULL    |       |
| message  | varchar(255) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+

I also set max_connections=4000 on the MySQL server.
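
For reference, a CREATE TABLE statement matching the columns above looks like this (if the table gets big, an index on category and priority would probably help the summary queries below, but I haven't measured that):

create table Events (
  category varchar(80),
  t        datetime,
  priority varchar(12),
  host     varchar(100),
  message  varchar(255)
);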

You'll also need to download the following GPL'ed software.

jdbcappender.jar from http://www.dankomannhaupt.de/projects/index.html
mysql-connector-java-3.0.17-ga-bin.jar from http://www.mysql.com/products/connector/j/

and put them into your hadoop/lib directory. Note that this JDBCAppender is not the one built into Log4j, which has some really bad limitations, such as not escaping single quotes.

Then change the root logger to include MySql and define MySql as a new appender in log4j.properties like this (after changing the hostname, database name, login, and password to match your setup):

# Have all loggers in both servers and clients send to MySql as well as the default.
log4j.rootLogger=${hadoop.root.logger},MySql

# Define a MySql appender that sends events to a mysql database
log4j.appender.MySql=org.apache.log4j.jdbcplus.JDBCAppender
log4j.appender.MySql.dbclass=com.mysql.jdbc.Driver
log4j.appender.MySql.threshold=WARN
log4j.appender.MySql.url=jdbc:mysql://mysqlhost.foobar.com/HadoopLog
log4j.appender.MySql.username=hadoop
log4j.appender.MySql.password=whatever
log4j.appender.MySql.sql=insert into Events (category, t, priority, host, message) values ('@CAT@', current_timestamp(), '@PRIO@', substring_index(user(),'@',-1), '@MSG@')
log4j.appender.MySql.layout=org.apache.log4j.PatternLayout
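
Once that's in place and the daemons have been restarted to pick up the new log4j.properties, a quick sanity check from the mysql client is something like:

select priority, host, message from Events order by t desc limit 10;

which should start returning rows as soon as anything logs at WARN or above.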

The advantage of having the events in a single database is that you can run queries to summarize them. For example, running the sort example on 4 TB of data across 409 nodes, I ended up with the following distribution of events:

mysql> select priority, category, count(*) from Events group by priority,category;
+----------+------------------------------------------+----------+
| priority | category                                 | count(*) |
+----------+------------------------------------------+----------+
| WARN     | org.apache.hadoop.dfs.DataNode           |        4 |
| WARN     | org.apache.hadoop.ipc.Server             |       35 |
| WARN     | org.apache.hadoop.mapred.JobTracker      |       30 |
| WARN     | org.apache.hadoop.mapred.TaskRunner      |     8256 |
| WARN     | org.apache.hadoop.mapred.TaskTracker     |        3 |
| WARN     | org.mortbay.http.HttpContext             |        1 |
| WARN     | org.mortbay.http.SocketListener          |   971317 |
| WARN     | org.mortbay.jetty.servlet.ServletHandler |      409 |
| WARN     | org.mortbay.util.ThreadPool              |     2659 |
+----------+------------------------------------------+----------+
9 rows in set (3.83 sec)

If you look at the details, the ThreadPool and SocketListener messages are about Jetty running out of threads for serving the map outputs, the 35 ipc.Server messages are rpc calls being dropped, and so on. It lets me view the errors across the entire cluster. If I were going to put this in production, I'd probably leave the filter at WARN for org.apache and ERROR for org.mortbay. That would drastically cut down the number of events and still catch the most interesting ones.
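
To get a feel for how much that filter would keep, you can run the same summary restricted to those levels (this just mimics the proposed thresholds in SQL; adjust the patterns if your categories differ):

select priority, category, count(*)
from Events
where category like 'org.apache.%'
   or (category like 'org.mortbay.%' and priority in ('ERROR','FATAL'))
group by priority, category;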

The advantage of the SQL database is primarily in doing ad hoc queries. For example, I wanted to see how often each host was put in the penalty box (for failing to transfer map outputs):

mysql> select src,count(*) as cnt from (select substring_index(substring_index(message," ",4)," ",-1) as src from Events where category like '%.TaskRunner' and message like '%penalty box%') as subslt group by src order by cnt;
+----------+------+
| src      | cnt  |
+----------+------+
| node2227 |    1 |
| node2102 |    1 |
| node2084 |    1 |
| node2194 |    1 |
| node2124 |    1 |
| node2185 |    1 |
| node2400 |    1 |
| node2103 |    1 |
| node2318 |    1 |
| node2385 |    4 |
| node2296 |    5 |
| node2222 |    8 |
| node2408 |   31 |
| node2028 |  677 |
| node2003 | 1294 |
+----------+------+
15 rows in set (1.30 sec)

This shows that things went badly on a couple of nodes, but in general it was good. Looking at the distribution the other way didn't prove very useful, because all 409 nodes put some node in the penalty box and the distribution only went from 1 to 11 events per node.
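
For completeness, that other view is just the same events grouped by the reporting host rather than the node named in the message, something like:

select substring_index(host,'.',1) as reporter, count(*) as cnt
from Events
where category like '%.TaskRunner' and message like '%penalty box%'
group by reporter
order by cnt;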

I also looked at the rpc timeouts to see which messages and hosts were having problems:

mysql> select substring_index(host,".",1) as host,substring_index(message,"(",1) as msg,count(*) as cnt from Events where category='org.apache.hadoop.ipc.Server' group by host,msg order by cnt;
+----------+-------------------------+-----+
| host     | msg                     | cnt |
+----------+-------------------------+-----+
| node2194 | Call ping               |   1 |
| node2214 | handler output error    |   1 |
| node2319 | handler output error    |   1 |
| node2013 | handler output error    |   1 |
| node2386 | Call ping               |   1 |
| node2314 | handler output error    |   1 |
| node2091 | handler output error    |   1 |
| node2127 | handler output error    |   1 |
| node2311 | handler output error    |   1 |
| node2226 | Call ping               |   1 |
| node2034 | Call ping               |   1 |
| node2269 | handler output error    |   1 |
| node2003 | Call progress           |   2 |
| node2003 | Call getProtocolVersion |   2 |
| node2108 | Call ping               |   2 |
| node2028 | Call ping               |   5 |
| node2003 | Call ping               |  12 |
+----------+-------------------------+-----+
17 rows in set (1.16 sec)
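
Since node2003 shows up at the top of both lists, a natural follow-up would be to pull all of its events and read through them, something like:

select t, category, message
from Events
where substring_index(host,'.',1) = 'node2003'
order by t;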

Using MySQL to log the important events gave me a convenient way to look at the logs across the entire cluster and a clearer view of what the system was having problems with.

I guess after writing all of this down, I should put it on Hadoop's wiki. *smile*

-- Owen
