Using piped logs can be more efficient, but is riskier because Flume can deliver messages without saving on disk. Doing this, however, increases the probability of event loss. From a security point of view, this Flume node instance runs as Apache’s user which is often root according to the Apache manual.
From: http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

It seems to me that it is the REVERSE:

1. Piped logs are less efficient: the web server has to do extra work to pipe each event to Flume, whereas tailing is truly asynchronous and the web server does no work on Flume's behalf.
2. Piped logs have less chance of event loss: a piped event should be processed before it is written to the log file, so any event that made it into the log file should already exist in Flume, but not vice versa.

Any comments?
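
To make the comparison concrete, here is a rough Python sketch of the two delivery paths as I picture them. This is not Flume code: `consume_pipe`, `tail_file`, and the in-memory `sink` list are hypothetical names made up for illustration, and the sketch assumes one event per line. In the piped path the only copy of an event is whatever the consumer has read from the stream; in the tail path the event is already in the log file, and the reader can resume from a saved byte offset.

```python
import io
import os
import tempfile


def consume_pipe(stream, sink):
    """Piped-log path: events arrive directly on a stream (e.g. stdin).

    Nothing is on disk unless the consumer stores it itself.
    """
    for line in stream:
        sink.append(line.rstrip("\n"))


def tail_file(path, offset, sink):
    """Tail path: events were already appended to the log file by the server.

    Reads everything after `offset` and returns the new offset, so a
    restarted reader can pick up where it left off.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    for line in data.decode("utf-8").splitlines():
        sink.append(line)
    return offset + len(data)


# Piped path: two events handed over on a stream.
piped = []
consume_pipe(io.StringIO("GET /a\nGET /b\n"), piped)

# Tail path: the same two events appended to a file, read in two passes.
tailed = []
with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "access.log")
    with open(log, "w") as f:
        f.write("GET /a\n")
    pos = tail_file(log, 0, tailed)
    with open(log, "a") as f:
        f.write("GET /b\n")
    pos = tail_file(log, pos, tailed)
```

Both paths end up with the same events here; the difference is only in where an event lives if the consumer dies partway through.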
