[jira] [Created] (HAMA-884) Add Combiners and Aggregators API guide to website
Edward J. Yoon created HAMA-884: --- Summary: Add Combiners and Aggregators API guide to website Key: HAMA-884 URL: https://issues.apache.org/jira/browse/HAMA-884 Project: Hama Issue Type: Improvement Components: documentation Reporter: Edward J. Yoon Assignee: Edward J. Yoon -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HAMA-884) Add Combiners and Aggregators API guide to website
[ https://issues.apache.org/jira/browse/HAMA-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-884: Attachment: website.patch I have attached my patch. Add Combiners and Aggregators API guide to website Key: HAMA-884 URL: https://issues.apache.org/jira/browse/HAMA-884 Project: Hama Issue Type: Improvement Components: documentation Reporter: Edward J. Yoon Assignee: Edward J. Yoon Attachments: website.patch
Re: [VOTE] Release Hama 0.6.4 (RC1)
+1. All test cases pass on my machine.

2014-03-04 1:29 GMT+01:00 Anastasis Andronidis andronat_...@hotmail.com:
+1. My small graph programs seem to work fine.
Anastasis

On 4 Mar 2014, at 1:26 AM, Edward J. Yoon edwardy...@apache.org wrote:
+1. Signatures are OK and the Hama cluster works well on my machines.

On Mon, Mar 3, 2014 at 7:35 PM, Edward J. Yoon edwardy...@apache.org wrote:
Hi all, I've created an RC1 for Hama 0.6.4. This release fixes a lot of bugs, improves memory efficiency (almost x3), and enables DiskVerticesInfo.
Artifacts: http://people.apache.org/~edwardyoon/dist/0.6.4-RC1/
Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.4-RC1/
Please test and vote! Thanks.
-- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
I am very interested in this topic, since my research area includes event mining, but can BSP perform real-time computing? I once used a message-queue-based solution to collect event logs.

2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) j...@apache.org:
[ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon
BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to build a platform for this kind of processing.

-- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
BSP is a bridging model that does not restrict itself to any particular usage. My understanding (I could be wrong) is that our framework needs to address this issue. [1], for example, proposes a BSP-based solution in the field of real-time applications.

[1] Hartley J.K., Bargiela A., TPML: Parallel meta-language for scientific and engineering computations using transputers (TPML), Proc. of 2nd Int. Conf. on Software for Supercomputers and Multiprocessors, SMS'94, 1994, pp. 22-31

On 4 March 2014 21:20, Yexi Jiang yexiji...@gmail.com wrote:
I am very interested in this topic, since my research area includes event mining, but can BSP perform real-time computing? I once used a message-queue-based solution to collect event logs.
[jira] [Created] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
Renil J created HAMA-885: Summary: Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J
[jira] [Commented] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920299#comment-13920299 ] Edward J. Yoon commented on HAMA-885: Let's use hard-coded input data instead of random input, so that we can easily verify the result. Have you already found the bug? Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Please correct me if I'm wrong. My understanding of log aggregation is that the logs generated on each monitored machine are collected in real time. The collection is continuous, like a data stream, and never ends. I know how to use Hama to aggregate logs batch by batch (e.g. aggregating the logs incrementally each day), but I cannot immediately think of a way to use Hama to solve this problem in real time.

2014-03-04 19:32 GMT-05:00 Edward J. Yoon edwardy...@apache.org:
The Aggregators in the Graph package do similar work: monitoring, global communication, etc.

-- Yexi Jiang
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
The final goal can be a real-time event processing framework for distributed event detection, filtering, and aggregation. I guess that can be done with only three components:

* Event processing job configuration interface.
* User-defined function that handles the stream input.
* Master Aggregator(s) and its client library.

I expect this can be applied to tasks such as web clickstream log analysis (on large-scale web servers), finding hot search keywords, and detecting system errors in real time, and users will be able to program them in a few minutes.

On Wed, Mar 5, 2014 at 10:30 AM, Yexi Jiang yexiji...@gmail.com wrote:
Please correct me if I'm wrong. My understanding of log aggregation is that the logs generated on each monitored machine are collected in real time. The collection is continuous, like a data stream, and never ends. I know how to use Hama to aggregate logs batch by batch (e.g. aggregating the logs incrementally each day), but I cannot immediately think of a way to use Hama to solve this problem in real time.

-- Edward J. Yoon (@eddieyoon)
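The three components listed in this message can be illustrated with a small, self-contained Python simulation. This is only a sketch under stated assumptions: it does not use any Hama API, and every name in it (MasterAggregator, run_supersteps, extract_errors) is hypothetical. It assumes the stream is cut into micro-batches, one per superstep, which is one possible way to reconcile continuous input with BSP barriers.

```python
from collections import Counter
from typing import Callable, Iterable, List

# All names below are hypothetical illustrations, not Hama classes.
Event = str
UserFn = Callable[[Event], Iterable[str]]


class MasterAggregator:
    """Merges the partial counts that workers deliver at each barrier."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def merge(self, partial: Counter) -> None:
        self.totals.update(partial)


def run_supersteps(streams: List[List[List[Event]]], user_fn: UserFn) -> Counter:
    """streams[w][s] is the micro-batch that worker w receives in superstep s."""
    master = MasterAggregator()
    num_steps = max(len(batches) for batches in streams)
    for step in range(num_steps):
        # Local computation phase: each worker handles only its own micro-batch.
        partials = []
        for batches in streams:
            batch = batches[step] if step < len(batches) else []
            local: Counter = Counter()
            for event in batch:
                local.update(user_fn(event))
            partials.append(local)
        # Barrier: every partial result is merged before the next superstep starts.
        for partial in partials:
            master.merge(partial)
    return master.totals


def extract_errors(event: Event) -> Iterable[str]:
    """Example user-defined function: emit one key for each ERROR line."""
    return ["error"] if "ERROR" in event else []


# Two workers, two supersteps each.
streams = [
    [["INFO ok", "ERROR disk"], ["ERROR net"]],
    [["INFO ok"], ["ERROR disk", "INFO ok"]],
]
totals = run_supersteps(streams, extract_errors)
print(totals["error"])
```

Here the job configuration is reduced to the arguments of run_supersteps, the user-defined function is extract_errors, and the master aggregator is a plain in-process object. In a real deployment the merge step would be a message exchange between peers rather than a method call.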
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
I once implemented a system monitor/log collector using ActiveMQ, and a real-time anomaly detection algorithm on top of Twitter's Storm. I think people like me would naturally choose such a streaming framework for this scenario. For real-time computation, what unique characteristics of Hama would make people choose it over Storm?

In my humble opinion, one unique characteristic of Hama is that it provides a general BSP computing framework (compared with Giraph, which provides BSP only for graph computing). No one else offers that.

2014-03-04 21:02 GMT-05:00 Edward J. Yoon edwardy...@apache.org:
The final goal can be a real-time event processing framework for distributed event detection, filtering, and aggregation. I guess that can be done with only three components:

* Event processing job configuration interface.
* User-defined function that handles the stream input.
* Master Aggregator(s) and its client library.

I expect this can be applied to tasks such as web clickstream log analysis (on large-scale web servers), finding hot search keywords, and detecting system errors in real time, and users will be able to program them in a few minutes.

-- Yexi Jiang
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
I'm thinking about coupling it with (incremental) ML algorithms.

On Wed, Mar 5, 2014 at 11:16 AM, Yexi Jiang yexiji...@gmail.com wrote:
I once implemented a system monitor/log collector using ActiveMQ, and a real-time anomaly detection algorithm on top of Twitter's Storm. I think people like me would naturally choose such a streaming framework for this scenario. For real-time computation, what unique characteristics of Hama would make people choose it over Storm?

In my humble opinion, one unique characteristic of Hama is that it provides a general BSP computing framework (compared with Giraph, which provides BSP only for graph computing). No one else offers that.

-- Edward J. Yoon (@eddieyoon)
[jira] [Commented] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920443#comment-13920443 ] Renil J commented on HAMA-885: I am attaching sample data: a graph with exactly 10 clusters. I believe this graph file is valid. I am also attaching the output of the algorithm run. The output should contain 10 clusters with 10 elements each, but the algorithm does not produce that. So far I have not been able to find where the issue is. Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J Attachments: SemiClusterOutput, SemiClusteringInputGraph.txt
[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renil J updated HAMA-885: Attachment: SemiClusterOutput, SemiClusteringInputGraph.txt Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J Attachments: SemiClusterOutput, SemiClusteringInputGraph.txt
[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renil J updated HAMA-885: Attachment: SemiClusterOutput.txt Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt
[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renil J updated HAMA-885: Attachment: (was: SemiClusterOutput) Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt
[jira] [Comment Edited] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920443#comment-13920443 ] Renil J edited comment on HAMA-885 at 3/5/14 3:40 AM: I am attaching sample data: a graph with exactly 10 clusters. I believe this graph file is valid. I am also attaching the output of the algorithm run. The output should contain 10 clusters with 10 edges each, but the algorithm does not produce that. So far I have not been able to find where the issue is. I will keep investigating. was (Author: renil.joseph): I am attaching sample data: a graph with exactly 10 clusters. I believe this graph file is valid. I am also attaching the output of the algorithm run. The output should contain 10 clusters with 10 elements each, but the algorithm does not produce that. So far I have not been able to find where the issue is. Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
I have used Twitter Storm previously. Storm is an excellent framework for real-time processing. For Hama to handle real-time tasks, the framework in my opinion needs to decouple I/O from HDFS so that the input source is not restricted to HDFS.

On 5 March 2014 09:30, Yexi Jiang yexiji...@gmail.com wrote:
Please correct me if I'm wrong. My understanding of log aggregation is that the logs generated on each monitored machine are collected in real time. The collection is continuous, like a data stream, and never ends. I know how to use Hama to aggregate logs batch by batch (e.g. aggregating the logs incrementally each day), but I cannot immediately think of a way to use Hama to solve this problem in real time.
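The idea of decoupling I/O from HDFS can be sketched as a small input-source abstraction. The Python below is a hedged illustration only: InputSource, FileSource, QueueSource, and count_errors are hypothetical names, not part of Hama, and the in-memory queue merely stands in for a real message broker. The point is that the processing function depends on the interface, not on where the records come from.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from collections import deque
from typing import Iterable, Iterator


class InputSource(ABC):
    """Hypothetical interface: anything that can yield text records."""

    @abstractmethod
    def records(self) -> Iterator[str]:
        ...


class FileSource(InputSource):
    """Batch-style source that reads records from a file, one per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def records(self) -> Iterator[str]:
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


class QueueSource(InputSource):
    """In-memory stand-in for a message queue; drains pending messages."""

    def __init__(self, messages: Iterable[str]) -> None:
        self._pending = deque(messages)

    def records(self) -> Iterator[str]:
        while self._pending:
            yield self._pending.popleft()


def count_errors(source: InputSource) -> int:
    """Processing logic that works with any InputSource implementation."""
    return sum(1 for record in source.records() if "ERROR" in record)


# Write a small log file so the file-backed source has something to read.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("INFO start\nERROR disk full\n")
    log_path = tmp.name

file_errors = count_errors(FileSource(log_path))
queue_errors = count_errors(QueueSource(["ERROR timeout", "ERROR refused", "INFO ok"]))
os.remove(log_path)
print(file_errors, queue_errors)
```

The same count_errors function runs unchanged against both sources, which is the property the message asks for. A streaming source that never ends would need an additional policy for when to hand a batch to the computation.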
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Yes, Hama currently does not support streaming input or streaming output. That is why it is not currently a natural choice for people with real-time computing needs. Do we really need to make Hama support real-time computing? In that case, we would need to compete with Storm...

2014-03-04 22:58 GMT-05:00 Chia-Hung Lin cli...@googlemail.com:
I have used Twitter Storm previously. Storm is an excellent framework for real-time processing. For Hama to handle real-time tasks, the framework in my opinion needs to decouple I/O from HDFS so that the input source is not restricted to HDFS.

-- Yexi Jiang
[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output
[ https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-885: Fix Version/s: 0.7.0 Semi-Clustering Algorithm implementation is not producing expected output Key: HAMA-885 URL: https://issues.apache.org/jira/browse/HAMA-885 Project: Hama Issue Type: Bug Components: examples, machine learning Reporter: Renil J Fix For: 0.7.0 Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Below is just my personal viewpoint. We can refactor BSP to be more modular so that people can choose whatever fits their requirements. BSP is basically a generalized model, so it would be good if we could create a flexible framework.

On 5 March 2014 12:25, Edward J. Yoon edwardy...@apache.org wrote:
Why not?