I set up a MySQL database to catch Hadoop log events (at the WARN
level and above). The database is named HadoopLog, with a table like:
mysql> show columns from Events;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| category | varchar(80) | YES | | NULL | |
| t | datetime | YES | | NULL | |
| priority | varchar(12) | YES | | NULL | |
| host | varchar(100) | YES | | NULL | |
| message | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+-------+
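For reference, the table itself can be created with something like this
(column names and widths taken straight from the listing above):

```sql
create table Events (
    category varchar(80),   -- the log4j logger name
    t        datetime,      -- event timestamp
    priority varchar(12),   -- WARN, ERROR, FATAL, ...
    host     varchar(100),  -- which node sent the event
    message  varchar(255)   -- the log message itself
);
```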
I also set max_connections=4000 on the MySQL server, since every
daemon in the cluster will hold a connection open.
You'll also need to download the following GPL'ed software:
jdbcappender.jar from http://www.dankomannhaupt.de/projects/index.html
mysql-connector-java-3.0.17-ga-bin.jar from http://www.mysql.com/products/connector/j/
Put both jars into your hadoop/lib directory. Note that this
jdbcappender is not the one built into Log4j, which has some really
bad limitations, such as not escaping single quotes.
Change the root logger to include MySql and define MySql as a new
appender in log4j.properties like this (after changing the hostname,
database name, login, and password to match your setup):
# Have all loggers in both servers and clients send to MySql as well as to the default.
log4j.rootLogger=${hadoop.root.logger},MySql
# Define a MySql appender that sends events to a mysql database
log4j.appender.MySql=org.apache.log4j.jdbcplus.JDBCAppender
log4j.appender.MySql.dbclass=com.mysql.jdbc.Driver
log4j.appender.MySql.threshold=WARN
log4j.appender.MySql.url=jdbc:mysql://mysqlhost.foobar.com/HadoopLog
log4j.appender.MySql.username=hadoop
log4j.appender.MySql.password=whatever
log4j.appender.MySql.sql=insert into Events (category, t, priority, \
  host, message) values ('@CAT@', current_timestamp(), '@PRIO@', \
  substring_index(user(),'@',-1), '@MSG@')
log4j.appender.MySql.layout=org.apache.log4j.PatternLayout
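The substring_index(user(),'@',-1) in the sql line pulls the client's
hostname out of MySQL's user() function, which returns 'login@host'.
A quick sanity check (the hostname here is made up):

```sql
-- user() returns something like 'hadoop@node2003.foobar.com';
-- taking everything after the last '@' leaves just the host
select substring_index('hadoop@node2003.foobar.com', '@', -1);
```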
The advantage of having the events in a single database is that you
can run queries to summarize them. For example, running the sort
example on 4 TB of data across 409 nodes, I ended up with the following
distribution of events:
mysql> select priority, category, count(*) from Events group by
priority,category;
+----------+------------------------------------------+----------+
| priority | category | count(*) |
+----------+------------------------------------------+----------+
| WARN | org.apache.hadoop.dfs.DataNode | 4 |
| WARN | org.apache.hadoop.ipc.Server | 35 |
| WARN | org.apache.hadoop.mapred.JobTracker | 30 |
| WARN | org.apache.hadoop.mapred.TaskRunner | 8256 |
| WARN | org.apache.hadoop.mapred.TaskTracker | 3 |
| WARN | org.mortbay.http.HttpContext | 1 |
| WARN | org.mortbay.http.SocketListener | 971317 |
| WARN | org.mortbay.jetty.servlet.ServletHandler | 409 |
| WARN | org.mortbay.util.ThreadPool | 2659 |
+----------+------------------------------------------+----------+
9 rows in set (3.83 sec)
If you look at the details, the ThreadPool and SocketListener
messages are about jetty running out of threads for serving the map
outputs. The 35 ipc.Server messages are RPC calls being dropped,
and so on. It lets me view the errors across the entire cluster. If
I were going to put this in production, I'd probably leave the filter
at WARN for org.apache and set it to ERROR for org.mortbay. That would
drastically cut down the number of events and still catch the most
interesting ones.
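Those per-package thresholds would just be a couple more lines in
log4j.properties, something like:

```properties
# keep WARN and above from Hadoop, but only ERROR and above from jetty
log4j.logger.org.apache=WARN
log4j.logger.org.mortbay=ERROR
```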
The advantage of the SQL database is primarily in doing ad hoc
queries. For example, I wanted to see how often each host was put in
the penalty box (for failing to transfer map outputs):
mysql> select src, count(*) as cnt
    -> from (select substring_index(substring_index(message, " ", 4), " ", -1) as src
    ->       from Events
    ->       where category like '%.TaskRunner'
    ->       and message like '%penalty box%') as subsel
    -> group by src order by cnt;
+----------+------+
| src | cnt |
+----------+------+
| node2227 | 1 |
| node2102 | 1 |
| node2084 | 1 |
| node2194 | 1 |
| node2124 | 1 |
| node2185 | 1 |
| node2400 | 1 |
| node2103 | 1 |
| node2318 | 1 |
| node2385 | 4 |
| node2296 | 5 |
| node2222 | 8 |
| node2408 | 31 |
| node2028 | 677 |
| node2003 | 1294 |
+----------+------+
15 rows in set (1.30 sec)
This shows that things went badly on a couple of nodes, but in
general it was good. Looking at the distribution the other way didn't
prove very useful, because all 409 nodes put some node in the penalty
box and the distribution only went from 1 to 11 events per node.
I also looked at the RPC timeouts to see which messages and hosts
were having problems:
mysql> select substring_index(host, ".", 1) as host,
    ->        substring_index(message, "(", 1) as msg,
    ->        count(*) as cnt
    -> from Events
    -> where category = 'org.apache.hadoop.ipc.Server'
    -> group by host, msg order by cnt;
+----------+-------------------------+-----+
| host | msg | cnt |
+----------+-------------------------+-----+
| node2194 | Call ping | 1 |
| node2214 | handler output error | 1 |
| node2319 | handler output error | 1 |
| node2013 | handler output error | 1 |
| node2386 | Call ping | 1 |
| node2314 | handler output error | 1 |
| node2091 | handler output error | 1 |
| node2127 | handler output error | 1 |
| node2311 | handler output error | 1 |
| node2226 | Call ping | 1 |
| node2034 | Call ping | 1 |
| node2269 | handler output error | 1 |
| node2003 | Call progress | 2 |
| node2003 | Call getProtocolVersion | 2 |
| node2108 | Call ping | 2 |
| node2028 | Call ping | 5 |
| node2003 | Call ping | 12 |
+----------+-------------------------+-----+
17 rows in set (1.16 sec)
Using MySQL to log the important events gave me a convenient way to
look at the logs across the entire cluster and a clearer view of what
the system was having problems with.
I guess after writing all of this down, I should put it on Hadoop's
wiki. *smile*
-- Owen