[ 
https://issues.apache.org/jira/browse/CHUKWA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721248#action_12721248
 ] 

Jiaqi Tan commented on CHUKWA-306:
----------------------------------

You can get away without root access, but consider a shared environment. Say 
you are taking a class on Hadoop at school, and everyone logs onto a gateway 
machine (a handful of them, at most) to submit jobs to a shared cluster. If 
you just wanted to crunch through the logs of one job using Chukwa, you 
wouldn't want to sit around waiting 5 minutes for the collector to commit the 
chunks, or wait for the Demux to get run periodically. More importantly, you 
wouldn't be able to run agents+adaptors on all of the slave nodes. You would 
also need to set up MySQL (not easy if you're a not-very-systemsy, average 
user) and do a lot of additional setup steps, which is more effort than the 
average Hadoop user would want to invest just to look at the analysis of logs 
from one job. 

The aim is to lower the amount of effort needed to use Chukwa for log/metric 
analysis, anomaly detection, visualization, etc.

> Standalone (non-daemon) Chukwa operation
> ----------------------------------------
>
>                 Key: CHUKWA-306
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-306
>             Project: Hadoop Chukwa
>          Issue Type: Wish
>            Reporter: Jiaqi Tan
>            Priority: Critical
>
> This is an articulation of a possible alternative use of Chukwa as a 
> standalone log analysis pipeline. This would enable users to read in existing 
> logs from files, process (Demux) and perform analysis (e.g. current 
> SALSA/Mochi toolchain) on them, and visualize them, without requiring the 
> user to set up or run any daemons or database servers. 
> This can be presented as an alternative interface to Chukwa for the user, 
> where the main architectural parts (Chunks, post-Demux SequenceFiles of 
> ChukwaRecords, post-Demux-processing SequenceFiles of ChukwaRecords, and 
> finally time-aggregated database entries for fast visualization) remain 
> unchanged, and Chukwa is manifest as a set of files in HDFS. The main value 
> that Chukwa then provides to users is (1) a centralized one-stop shop for log 
> processing+analysis+anomaly detection, and (2) the ability to use MapReduce to 
> process logs, regardless of whether they used Chukwa to collect the logs. 
> That way, the ability to process logs and analyze/do diagnosis is not tied to 
> having to run the entire Chukwa daemon infrastructure, since many users who 
> use Hadoop clusters may not have superuser access to those machines, e.g. 
> users at universities using shared clusters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
