[ https://issues.apache.org/jira/browse/CHUKWA-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857286#comment-13857286 ]
michael yu commented on CHUKWA-680:
-----------------------------------
Hi Otis,
I may not have included a screenshot of the accuracy. You can refer to Chapter
6, Performance and Benchmarks. Across all of my testing with the data set I
provided, I recall the accuracy being anywhere between 95% and 100%.
In general, the larger the data set you feed to SVM, the better (and more
accurate) the training model.
Unfortunately, the code was implemented in a way that is specific to querying
and parsing the metrics data from HBase in a Hadoop environment. The code can
(and should) be refactored and generalized to process metrics from different
types of data sources.
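To give a rough idea of what such a refactoring could look like, here is a minimal sketch, not the project's actual code. The names `MetricsSource`, `InMemorySource` and `to_libsvm_lines` are hypothetical, and the +1/-1 labels for healthy/unhealthy are an assumed convention. The only part taken from libsvm itself is its sparse text input format (`<label> <index>:<value> ...` with 1-based indices).

```python
from typing import Iterable, List, Protocol, Tuple

# One labeled sample: (label, feature values).
# Assumed convention: +1 = healthy, -1 = unhealthy.
Sample = Tuple[int, List[float]]


class MetricsSource(Protocol):
    """Hypothetical interface: anything that can yield labeled metric samples."""

    def samples(self) -> Iterable[Sample]: ...


class InMemorySource:
    """Stand-in source for illustration only.

    An HBase-backed or file-backed class would expose the same samples() method.
    """

    def __init__(self, rows: List[Sample]) -> None:
        self._rows = rows

    def samples(self) -> Iterable[Sample]:
        return iter(self._rows)


def to_libsvm_lines(source: MetricsSource) -> List[str]:
    """Render samples in libsvm's sparse text format."""
    lines = []
    for label, features in source.samples():
        # libsvm feature indices start at 1; zero-valued features are omitted.
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
        lines.append(" ".join([f"{label:+d}"] + pairs))
    return lines


if __name__ == "__main__":
    demo = InMemorySource([(1, [0.5, 0.0, 12.0]), (-1, [0.9, 3.0, 0.0])])
    print("\n".join(to_libsvm_lines(demo)))
```

With this shape, the training and prediction code would only depend on `samples()`, so the HBase-specific querying and parsing could live in one class among several.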
> Pattern recognition of Hadoop generated metrics
> -----------------------------------------------
>
> Key: CHUKWA-680
> URL: https://issues.apache.org/jira/browse/CHUKWA-680
> Project: Chukwa
> Issue Type: New Feature
> Components: Data Collection
> Environment: IBM InfoSphere BigInsights Enterprise
> Reporter: michael yu
> Assignee: michael yu
> Priority: Minor
> Labels: GSoC, GSoC2013
> Attachments: Yu, Michael et al-project-report-draft.pdf
>
> Original Estimate: 2,760h
> Remaining Estimate: 2,760h
>
> Charles Lin and I are working on our IBM SJSU masters project on "Pattern
> recognition of Hadoop generated metrics".
> The purpose of the project is to use libsvm to predict the health of the
> cluster.
> The scope of the project includes:
> 1) gathering large scale data set of metrics for healthy and unhealthy
> clusters
> 2) use #1 and libsvm to generate training model
> 3) periodic collection of metrics and comparing against training model using
> libsvm to predict the cluster health
> a) if unhealthy, send email notification to system administrator
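
As an illustration of step 3 in the description above, a single health check could be sketched roughly as follows. This is not the project's code: `check_cluster_health`, its parameters, and the "positive label means healthy" convention are all assumptions. The `predict` callable stands in for a call into a trained libsvm model, and the message is only built here, not sent (sending would go through something like `smtplib`, and the periodic scheduling is left out).

```python
from email.message import EmailMessage
from typing import Callable, List, Optional


def check_cluster_health(
    collect: Callable[[], List[float]],
    predict: Callable[[List[float]], int],
    admin_address: str,
) -> Optional[EmailMessage]:
    """Run one health check; return an alert message if the cluster looks unhealthy."""
    features = collect()      # step 3: collect the current metrics
    label = predict(features)  # compare against the training model
    if label > 0:              # assumed convention: positive label = healthy
        return None
    # step 3a: build the notification for the system administrator
    msg = EmailMessage()
    msg["To"] = admin_address
    msg["Subject"] = "Cluster health alert"
    msg.set_content(f"Model predicted label {label} for metrics {features}.")
    return msg


if __name__ == "__main__":
    # Dummy collector and predictor, for illustration only.
    alert = check_cluster_health(lambda: [0.9, 3.0], lambda f: -1, "admin@example.com")
    print(alert)
```

Keeping `collect` and `predict` as plain callables means the same check works regardless of where the metrics come from or how the model is loaded.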