[jira] [Created] (HAMA-884) Add Combiners and Aggregators API guide to website

2014-03-04 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-884:
---

 Summary: Add Combiners and Aggregators API guide to website
 Key: HAMA-884
 URL: https://issues.apache.org/jira/browse/HAMA-884
 Project: Hama
  Issue Type: Improvement
  Components: documentation 
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HAMA-884) Add Combiners and Aggregators API guide to website

2014-03-04 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-884:


Attachment: website.patch

Attaching my patch.

 Add Combiners and Aggregators API guide to website
 --

 Key: HAMA-884
 URL: https://issues.apache.org/jira/browse/HAMA-884
 Project: Hama
  Issue Type: Improvement
  Components: documentation 
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Attachments: website.patch








Re: [VOTE] Release Hama 0.6.4 (RC1)

2014-03-04 Thread Martin Illecker
+1 all test cases pass on my machine.


2014-03-04 1:29 GMT+01:00 Anastasis Andronidis andronat_...@hotmail.com:

 +1 my small graph programs seem to work rather fine

 Anastasis

 On 4 Mar 2014, at 1:26 AM, Edward J. Yoon edwardy...@apache.org wrote:

  +1
 
  Signatures are OK and hama cluster works well on my machines.
 
  On Mon, Mar 3, 2014 at 7:35 PM, Edward J. Yoon edwardy...@apache.org
 wrote:
  Hi all,
 
  I've created an RC1 for Hama 0.6.4. This release fixes a lot of bugs,
  improves memory efficiency (almost 3x), and enables DiskVerticesInfo.
 
  Artifacts: http://people.apache.org/~edwardyoon/dist/0.6.4-RC1/
 
  Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.4-RC1/
 
  Please test and vote!
 
  Thanks.
 
  --
  Edward J. Yoon (@eddieyoon)
  Chief Executive Officer
  DataSayer, Inc.
 
 
 
  --
  Edward J. Yoon (@eddieyoon)
  Chief Executive Officer
  DataSayer, Inc.
 




Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Yexi Jiang
I am very interested in this topic, since my research area includes event
mining, but can BSP handle real-time computing?

I once used a message-queue-based solution to collect event logs.


2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) j...@apache.org:


  [
 https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Edward J. Yoon updated HAMA-883:
 

 Summary: [Research Task] Massive log event aggregation in real time
 using Apache Hama  (was: [Research Task] Massive log data aggregation in
 real time using Apache Hama)

  [Research Task] Massive log event aggregation in real time using Apache
 Hama
 
 
 
  Key: HAMA-883
  URL: https://issues.apache.org/jira/browse/HAMA-883
  Project: Hama
   Issue Type: Task
 Reporter: Edward J. Yoon
 
  BSP tasks can be used for aggregating log data streamed in real time.
 With this research task, we might be able to turn this kind of
 processing into a platform.







-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Chia-Hung Lin
BSP is a bridging model that doesn't restrict itself to any particular
usage. My understanding (I could be wrong) is that our framework needs
to address such issues. [1], for example, proposes a BSP-based solution
for real-time applications.

[1]. Hartley J.K., Bargiela A., TPML: Parallel meta-language for
scientific and engineering computations using transputers (TPML),
Proc. of 2nd Int. Conf. on Software for Supercomputers and
Multiprocessors, SMS'94, 1994, pp. 22-31






[jira] [Created] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Renil J (JIRA)
Renil J created HAMA-885:


 Summary: Semi-Clustering Algorithm implementation is not producing 
expected output
 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J








[jira] [Commented] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920299#comment-13920299
 ] 

Edward J. Yoon commented on HAMA-885:
-

Let's use the hard-coded input data instead of random input, so that we can 
easily verify the result. Did you already find the bug?

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J







Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Yexi Jiang
Please correct me if I'm wrong. My understanding of log aggregation is
that it collects the logs generated by each monitored machine in real time. The
collection procedure is continuous, like a data stream, and never ends.

I know how to use Hama to aggregate logs batch by batch (e.g. aggregating
the logs incrementally each day), but I cannot immediately think of a way
to use Hama to solve this problem in real time.


2014-03-04 19:32 GMT-05:00 Edward J. Yoon edwardy...@apache.org:

 Aggregators in the Graph package do similar work: monitoring,
 global communication, etc.







-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Edward J. Yoon
The final goal can be a real-time event processing framework for
distributed event detection, filtering, and aggregation. I guess that
can be done with only 3 components:

 * Event processing job configuration interface.
 * User-defined function that handles the stream input.
 * Master Aggregator(s) and its client library.

I expect this could be applied to tasks such as web clickstream log analysis
(for large-scale web servers), finding hot search keywords, and detecting
system errors in real time, and users would be able to program them in
a few minutes.
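
A minimal sketch of how the last two components listed above might fit together, assuming each task applies a user-defined function to a small batch of log lines and a master aggregator merges the partial counts. Every name here (EventFunction, EventTask, MasterAggregator) is hypothetical and is not part of the Hama API; the job configuration interface is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; none of these exist in Apache Hama.

/** User-defined function: maps one raw log line to an event key, or null to filter it out. */
interface EventFunction {
  String apply(String logLine);
}

/** Master-side aggregator that merges the partial counts reported by each task. */
class MasterAggregator {
  private final Map<String, Long> totals = new HashMap<>();

  void merge(Map<String, Long> partial) {
    for (Map.Entry<String, Long> e : partial.entrySet()) {
      totals.merge(e.getKey(), e.getValue(), Long::sum);
    }
  }

  long count(String key) {
    return totals.getOrDefault(key, 0L);
  }
}

/** One worker task: applies the function to a batch of lines and returns partial counts. */
class EventTask {
  static Map<String, Long> process(List<String> batch, EventFunction fn) {
    Map<String, Long> partial = new HashMap<>();
    for (String line : batch) {
      String key = fn.apply(line);
      if (key != null) {
        partial.merge(key, 1L, Long::sum);
      }
    }
    return partial;
  }
}

class EventAggregationSketch {
  public static void main(String[] args) {
    // Keep only error events; everything else is filtered out.
    EventFunction errorsOnly = line -> line.contains("ERROR") ? "ERROR" : null;
    MasterAggregator master = new MasterAggregator();
    // Two tasks each process their own batch, then report to the master.
    master.merge(EventTask.process(Arrays.asList("ERROR disk full", "INFO ok"), errorsOnly));
    master.merge(EventTask.process(Arrays.asList("ERROR timeout"), errorsOnly));
    System.out.println("ERROR events: " + master.count("ERROR"));
  }
}
```

In a BSP setting, each call to merge would correspond to partial results arriving at a synchronization barrier; that mapping is an assumption of this sketch, not something the thread specifies.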





-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.


Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Yexi Jiang
I once implemented a system monitor/log collector using ActiveMQ and a
real-time anomaly detection algorithm on top of Twitter's Storm. I think
people like me would naturally choose such a streaming computing framework to
handle this scenario.

For real-time computation, what are the unique characteristics of Hama that
would make people choose it instead of Storm? In my humble opinion, one unique
characteristic of Hama is that it provides a general BSP computing
framework (compared with Giraph, which provides BSP only for graph
computing). No one else has such an ability.






-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Edward J. Yoon
I'm thinking about coupling with ML (incremental) algorithms.




-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.


[jira] [Commented] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Renil J (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920443#comment-13920443
 ] 

Renil J commented on HAMA-885:
--

I am attaching sample data: a graph with exactly 10 clusters. I think 
this graph file is a valid one.
I am also attaching the output of the algorithm run. The output should actually 
contain 10 clusters with 10 elements each, but the run is not producing the correct output.
So far I have not been able to find where the issue is.

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J
 Attachments: SemiClusterOutput, SemiClusteringInputGraph.txt








[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Renil J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renil J updated HAMA-885:
-

Attachment: SemiClusterOutput
SemiClusteringInputGraph.txt

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J
 Attachments: SemiClusterOutput, SemiClusteringInputGraph.txt








[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Renil J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renil J updated HAMA-885:
-

Attachment: SemiClusterOutput.txt

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J
 Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt








[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Renil J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renil J updated HAMA-885:
-

Attachment: (was: SemiClusterOutput)

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J
 Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt








[jira] [Comment Edited] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Renil J (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920443#comment-13920443
 ] 

Renil J edited comment on HAMA-885 at 3/5/14 3:40 AM:
--

I am attaching sample data: a graph with exactly 10 clusters. I think 
this graph file is a valid one.
I am also attaching the output of the algorithm run. The output should actually 
contain 10 clusters with 10 edges each, but the run is not producing the correct output.
So far I have not been able to find where the issue is. I will keep looking.


was (Author: renil.joseph):
I am attaching sample data: a graph with exactly 10 clusters. I think 
this graph file is a valid one.
I am also attaching the output of the algorithm run. The output should actually 
contain 10 clusters with 10 elements each, but the run is not producing the correct output.
So far I have not been able to find where the issue is.

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J
 Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt








Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Chia-Hung Lin
I used Twitter Storm previously. Storm is an excellent framework for
real-time processing.

To use Hama for real-time tasks, the framework in my opinion needs
to decouple I/O from HDFS so that the input source is not restricted
to just HDFS.
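
A rough sketch of what such decoupling could look like, assuming the framework reads records through a small source interface rather than from HDFS directly. The names RecordSource, BatchSource, QueueSource and SourceConsumer are hypothetical and do not correspond to existing Hama classes; the in-memory collections merely stand in for an HDFS file and a message queue.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Hypothetical types for illustration only; none of these exist in Apache Hama.

/** A source of records, independent of where the records are stored. */
interface RecordSource {
  /** Returns the next record, or null when none is currently available. */
  String next();
}

/** Bounded source over a fixed list, standing in for a file stored on HDFS. */
class BatchSource implements RecordSource {
  private final Iterator<String> it;

  BatchSource(List<String> records) {
    this.it = records.iterator();
  }

  public String next() {
    return it.hasNext() ? it.next() : null;
  }
}

/** Source backed by an in-memory queue, standing in for a message queue feed. */
class QueueSource implements RecordSource {
  private final Queue<String> queue = new ArrayDeque<>();

  void offer(String record) {
    queue.add(record);
  }

  public String next() {
    return queue.poll();
  }
}

class SourceConsumer {
  /** Drains whatever the source currently offers and returns the record count. */
  static int drain(RecordSource source) {
    int n = 0;
    while (source.next() != null) {
      n++;
    }
    return n;
  }
}
```

The consumer code is identical for both sources, which is the point of the decoupling: a streaming source can be drained repeatedly as new records arrive, while a batch source is drained once.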



Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Yexi Jiang
Yes, currently Hama does not support streaming input or streaming output.
That's why it is currently not a natural choice for people with real-time
computing needs.

Do we really need to make Hama support real-time computing? In that
case, we would need to compete with Storm...






-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


[jira] [Updated] (HAMA-885) Semi-Clustering Algorithm implementation is not producing expected output

2014-03-04 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-885:


Fix Version/s: 0.7.0

 Semi-Clustering Algorithm implementation is not producing expected output
 -

 Key: HAMA-885
 URL: https://issues.apache.org/jira/browse/HAMA-885
 Project: Hama
  Issue Type: Bug
  Components: examples, machine learning
Reporter: Renil J
 Fix For: 0.7.0

 Attachments: SemiClusterOutput.txt, SemiClusteringInputGraph.txt








Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

2014-03-04 Thread Chia-Hung Lin
Below is just my personal viewpoint. We can refactor BSP to be more
modular so that people can choose it if it fits their requirements.
BSP is basically a generalized model, so it may be good if we can create
a flexible framework.



On 5 March 2014 12:25, Edward J. Yoon edwardy...@apache.org wrote:
 Why not?
