Wiki migration
The old wiki has been migrated to Confluence[1]. A quick check suggests it looks OK, but please let me know if anything is missing or broken. [1]. https://cwiki.apache.org/HAMA
Kanban
I can't get JIRA's kanban board working, so we are using [1] temporarily. [1]. https://trello.com/b/KRPf8xqS
Re: [DISCUSS] Roadmap
I like the first idea. I was thinking of it as well; when refactoring, I am trying to further split the master and groom by assigning roles. Would you mind sharing a bit more detail on both items? On Fri, 15 Mar 2019 at 09:24, ByungSeok Min wrote: > > Hello everyone. > > How about the items below? > 1. Distributed processing using blockchain > 2. Add TensorFlow. > > Have a nice day^^ > > On Sunday, 3 March 2019, Chia-Hung Lin wrote: > > > In addition to what's been working on right now. What other tasks you would > > like it to be added? Please share your thought. > >
[DISCUSS] Roadmap
In addition to what is currently being worked on, what other tasks would you like to see added? Please share your thoughts.
Re: Project healthy question
Sounds good! Although I am slowly refactoring some parts in my spare time, it's nice to have other tasks that we can work on. Would you please start another thread for discussion, or I will start a new one later on? Thanks! On Thu, 28 Feb 2019 at 09:28, ByungSeok Min wrote: > I keep watching Hama. > > Although I had things I wanted to develop, I have not been able to > participate actively. > I am going to work on AI/Big Data in my company from March. > I will try to do again. > > How about setting up a roadmap together? > I think it would be nice to start with the 2019 roadmap first > > On Tue, 26 Feb 2019 at 22:25, Edward J. Yoon wrote: > > > Obviously inactive :/ By the way, I personally seeing that many of apache > > projects are going into inactive state. especially big data related > > projects. > > > > > > On Mon, 25 Feb 2019 at 00:19, Chia-Hung Lin wrote: > > > > > As you may notice that the activity is low for a period of time. This > > > raises an issue when doing board project with respect to the project > > > healthy. So it's greatly appreciated if anyone has inputs or comments > > > regarding to this. > > > > > >
Re: Project healthy question
This is off topic, but [1] may be an interesting discussion to read. It looks like cloud providers have had an impact on big data software; I am not sure whether this in turn has affected developers' contributions to related projects. In addition, I personally find it difficult to strike a balance, even when trying to glean from the fractional time available. [1]. https://news.ycombinator.com/item?id=18869755 On Tue, 26 Feb 2019 at 13:25, Edward J. Yoon wrote: > Obviously inactive :/ By the way, I personally seeing that many of apache > projects are going into inactive state. especially big data related > projects. > > > On Mon, 25 Feb 2019 at 00:19, Chia-Hung Lin wrote: > > > As you may notice that the activity is low for a period of time. This > > raises an issue when doing board project with respect to the project > > healthy. So it's greatly appreciated if anyone has inputs or comments > > regarding to this. > > >
Project healthy question
As you may have noticed, activity has been low for some time. This raises an issue with respect to project health when reporting to the board, so any input or comments on this would be greatly appreciated.
Apache Hama git repos migration
According to [1], the Apache Hama git repository has been migrated to gitbox [2]. [1]. https://issues.apache.org/jira/browse/INFRA-17788 [2]. https://gitbox.apache.org/repos/asf/hama.git
Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!
+1 On Wed, 30 Jan 2019 at 12:47, Júlio Pires wrote: > I'm +1 too. > > On Wed, 30 Jan 2019 at 06:35, Edward J. Yoon > wrote: > > > P.S., Please vote on here. We need to make a consensus. > > > > I'm +1. > > > > On Wed, Jan 30, 2019 at 5:33 PM Edward J. Yoon > > wrote: > > > > > > Hi devs, > > > I propose we ask ASF Infra to move the Hama Git repo to GitBox as soon > as > > > the release has been finalized / announced. Once they switch things > over, > > > we can update the web site / documentation to reflect that. > > > > > > Does anyone see any problems with this approach? > > > If there's no objections, I'll create a jira ticket. > > > > > > Thanks. > > > > > > On Thu, Jan 17, 2019 at 4:20 AM Chia-Hung Lin > > > wrote: > > > > > > > > Hi Edward, thanks for help! > > > > > > > > On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon > > wrote: > > > > > > > > > I can check tomorrow! > > > > > > > > > > On Tue, 15 Jan 2019 at 16:50, Apache Infrastructure Team < > > > > > infrastruct...@apache.org> wrote: > > > > > > > > > > > Hello, hama folks. > > > > > > As stated earlier in 2018, and reiterated two weeks ago, all git > > > > > > repositories must be migrated from the git-wip-us.apache.org URL > > to > > > > > > gitbox.apache.org, as the old service is being decommissioned. > > Your > > > > > > project is receiving this email because you still have > > repositories on > > > > > > git-wip-us that needs to be migrated. > > > > > > > > > > > > The following repositories on git-wip-us belong to your project: > > > > > > - hama.git > > > > > > > > > > > > > > > > > > We are now entering the remaining three weeks of the mandated > > > > > > (coordinated) move stage of the roadmap, and you are asked to > > please > > > > > > coordinate migration with the Apache Infrastructure Team before > > February > > > > > > 7th. 
All repositories not migrated on February 7th will be mass > > migrated > > > > > > without warning, and we'd appreciate it if we could work together > > to > > > > > > avoid a big mess that day :-). > > > > > > > > > > > > As stated earlier, moving to gitbox means you will get full write > > access > > > > > > on GitHub as well, and be able to close/merge pull requests and > > much > > > > > > more. The move is mandatory for all Apache projects using git. > > > > > > > > > > > > To have your repositories moved, please follow these steps: > > > > > > > > > > > > - Ensure consensus on the move (a link to a lists.apache.org > > thread will > > > > > > suffice for us as evidence). > > > > > > - Create a JIRA ticket at > > https://issues.apache.org/jira/browse/INFRA > > > > > > > > > > > > Your migration should only take a few minutes. If you wish to > > migrate > > > > > > at a specific time of day or date, please do let us know in the > > ticket, > > > > > > otherwise we will migrate at the earliest convenient time. > > > > > > > > > > > > There will be redirects in place from git-wip to gitbox, so > > requests > > > > > > using the old remote origins should still work (however we > > encourage > > > > > > people to update their remotes once migration has completed). > > > > > > > > > > > > As always, we appreciate your understanding and patience as we > move > > > > > > things around and work to provide better services and features > for > > > > > > the Apache Family. > > > > > > > > > > > > Should you wish to contact us with feedback or questions, please > > do so > > > > > > at: us...@infra.apache.org. > > > > > > > > > > > > > > > > > > With regards, > > > > > > Apache Infrastructure > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Best Regards, Edward J. Yoon > > > > > > > > -- > > Best Regards, Edward J. Yoon > > >
Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!
Hi Edward, thanks for the help! On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon wrote: > I can check tomorrow! > > On Tue, 15 Jan 2019 at 16:50, Apache Infrastructure Team < > infrastruct...@apache.org> wrote: > > > Hello, hama folks. > > As stated earlier in 2018, and reiterated two weeks ago, all git > > repositories must be migrated from the git-wip-us.apache.org URL to > > gitbox.apache.org, as the old service is being decommissioned. Your > > project is receiving this email because you still have repositories on > > git-wip-us that needs to be migrated. > > > > The following repositories on git-wip-us belong to your project: > > - hama.git > > > > > > We are now entering the remaining three weeks of the mandated > > (coordinated) move stage of the roadmap, and you are asked to please > > coordinate migration with the Apache Infrastructure Team before February > > 7th. All repositories not migrated on February 7th will be mass migrated > > without warning, and we'd appreciate it if we could work together to > > avoid a big mess that day :-). > > > > As stated earlier, moving to gitbox means you will get full write access > > on GitHub as well, and be able to close/merge pull requests and much > > more. The move is mandatory for all Apache projects using git. > > > > To have your repositories moved, please follow these steps: > > > > - Ensure consensus on the move (a link to a lists.apache.org thread will > > suffice for us as evidence). > > - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA > > > > Your migration should only take a few minutes. If you wish to migrate > > at a specific time of day or date, please do let us know in the ticket, > > otherwise we will migrate at the earliest convenient time. > > > > There will be redirects in place from git-wip to gitbox, so requests > > using the old remote origins should still work (however we encourage > > people to update their remotes once migration has completed). 
> > > > As always, we appreciate your understanding and patience as we move > > things around and work to provide better services and features for > > the Apache Family. > > > > Should you wish to contact us with feedback or questions, please do so > > at: us...@infra.apache.org. > > > > > > With regards, > > Apache Infrastructure > > > > >
Re: [jira] [Commented] (HAMA-1002) Add junit dependency to commons to compile with Hadoop 2.8+
When applying the patch, Jenkins reports a failure integrating it. Checking the test result[1], it looks like other tests are waiting for the master to be up and running. Does anyone know whether there is a way to re-run the build? [1]. https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/731/testReport/ On 30 December 2017 at 07:39, Hudson (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/HAMA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306672#comment-16306672 > ] > > Hudson commented on HAMA-1002: > -- > > FAILURE: Integrated in Jenkins build Hama-Nightly-for-Hadoop-2.x #731 (See > [https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/731/]) > HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+ > (ywkim: rev fe57a39cc5a576d11d9137fbae156a79d4d7a8a1) > * (edit) commons/pom.xml > > >> Add junit dependency to commons to compile with Hadoop 2.8+ >> --- >> >> Key: HAMA-1002 >> URL: https://issues.apache.org/jira/browse/HAMA-1002 >> Project: Hama >> Issue Type: Bug >> Components: build >>Affects Versions: 0.7.1 >>Reporter: YoungWoo Kim >> Fix For: 0.7.2 >> >> Attachments: HAMA-1002.0.patch >> >> >> Compilation with Hadoop 2.8+ does not work because transitive dependencies >> for Hadoop have been changed: >> {noformat} >> $ mvn clean package -Phadoop2 -Dhadoop.version=2.8.1 -DskipTests >> (snip) >> [INFO] Apache Hama parent POM . SUCCESS [ 23.797 >> s] >> [INFO] pipes .. SUCCESS [ 22.680 >> s] >> [INFO] commons FAILURE [ 6.662 >> s] >> [INFO] core ... SKIPPED >> [INFO] graph .. SKIPPED >> [INFO] machine learning ... SKIPPED >> [INFO] examples ... SKIPPED >> [INFO] mesos .. SKIPPED >> [INFO] yarn ... SKIPPED >> [INFO] hama-dist .. 
SKIPPED >> [INFO] >> >> [INFO] BUILD FAILURE >> [INFO] >> >> [INFO] Total time: 53.544 s >> [INFO] Finished at: 2017-12-26T14:55:24+09:00 >> [INFO] Final Memory: 62M/568M >> [INFO] >> >> [ERROR] Failed to execute goal >> org.apache.maven.plugins:maven-compiler-plugin:2.3.2:testCompile >> (default-testCompile) on project hama-commons: Compilation failure: >> Compilation failure: >> [ERROR] >> /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[20,23] >> error: package org.junit does not exist >> [ERROR] >> /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[20,0] >> error: static import only from classes and interfaces >> [ERROR] >> /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[21,23] >> error: package org.junit does not exist >> (snip) >> {noformat} > > > > -- > This message was sent by Atlassian JIRA > (v6.4.14#64029)
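The fix discussed in this thread amounts to declaring junit explicitly in commons/pom.xml, since Hadoop 2.8+ no longer provides it transitively. A sketch of the dependency block is below; the version shown is an assumption and should be aligned with whatever version the parent POM manages.

```xml
<!-- In commons/pom.xml: declare junit directly instead of relying on a
     transitive dependency from Hadoop. Version 4.12 is an assumption;
     match it to the version managed by the parent POM. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
```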
Re: [jira] [Created] (HAMA-1002) Add junit dependency to commons to compile with Hadoop 2.8+
I'll look into this, but progress may be slow, so feel free to pick it up if it appears to be stalled. On Thursday, 28 December 2017, Edward J. Yoon wrote: > Can someone review and commit this patch? :) > > On 26 Dec 2017 at 17:04, "ASF GitHub Bot (JIRA)" wrote: > >> >> [ https://issues.apache.org/jira/browse/HAMA-1002?page= >> com.atlassian.jira.plugin.system.issuetabpanels:comment- >> tabpanel=16303626#comment-16303626 ] >> >> ASF GitHub Bot commented on HAMA-1002: >> -- >> >> GitHub user youngwookim opened a pull request: >> >> https://github.com/apache/hama/pull/17 >> >> HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+ >> >> >> >> You can merge this pull request into a Git repository by running: >> >> $ git pull https://github.com/youngwookim/hama HAMA-1002 >> >> Alternatively you can review and apply these changes as the patch at: >> >> https://github.com/apache/hama/pull/17.patch >> >> To close this pull request, make a commit to your master/trunk branch >> with (at least) the following in the commit message: >> >> This closes #17 >> >> >> commit fe57a39cc5a576d11d9137fbae156a79d4d7a8a1 >> Author: Youngwoo Kim >> Date: 2017-12-26T06:07:12Z >> >> HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+ >> >> >> >> >> > Add junit dependency to commons to compile with Hadoop 2.8+ >> > --- >> > >> > Key: HAMA-1002 >> > URL: https://issues.apache.org/jira/browse/HAMA-1002 >> > Project: Hama >> > Issue Type: Bug >> > Components: build >> >Affects Versions: 0.7.1 >> >Reporter: YoungWoo Kim >> > Fix For: 0.7.2 >> > >> > Attachments: HAMA-1002.0.patch >> > >> > >> > Compilation with Hadoop 2.8+ does not work because transitive >> dependencies for Hadoop have been changed: >> > {noformat} >> > $ mvn clean package -Phadoop2 -Dhadoop.version=2.8.1 -DskipTests >> > (snip) >> > [INFO] Apache Hama parent POM . SUCCESS [ >> 23.797 s] >> > [INFO] pipes .. 
SUCCESS [ >> 22.680 s] >> > [INFO] commons FAILURE [ >> 6.662 s] >> > [INFO] core ... SKIPPED >> > [INFO] graph .. SKIPPED >> > [INFO] machine learning ... SKIPPED >> > [INFO] examples ... SKIPPED >> > [INFO] mesos .. SKIPPED >> > [INFO] yarn ... SKIPPED >> > [INFO] hama-dist .. SKIPPED >> > [INFO] >> >> > [INFO] BUILD FAILURE >> > [INFO] >> >> > [INFO] Total time: 53.544 s >> > [INFO] Finished at: 2017-12-26T14:55:24+09:00 >> > [INFO] Final Memory: 62M/568M >> > [INFO] >> >> > [ERROR] Failed to execute goal org.apache.maven.plugins: >> maven-compiler-plugin:2.3.2:testCompile (default-testCompile) on project >> hama-commons: Compilation failure: Compilation failure: >> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/ >> apache/hama/commons/math/TestDenseDoubleVector.java:[20,23] error: >> package org.junit does not exist >> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/ >> apache/hama/commons/math/TestDenseDoubleVector.java:[20,0] error: static >> import only from classes and interfaces >> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/ >> apache/hama/commons/math/TestDenseDoubleVector.java:[21,23] error: >> package org.junit does not exist >> > (snip) >> > {noformat} >> >> >> >> -- >> This message was sent by Atlassian JIRA >> (v6.4.14#64029) >> >
[DISCUSS] Roadmap
As there are still many areas where Hama could be useful and improved, this might be a good time to open a discussion on these issues. Some thoughts I have in mind: - Refactor the core package for finer granularity. * Separate io into its own package. * Decouple the BSP interface. - Monitoring subsystem[1] These could be done incrementally so as to reduce the impact of intrusive, blocking changes. Some issues might also be missing here, so please feel free to comment. Thanks, and happy new year. [1]. https://issues.apache.org/jira/browse/HAMA-1001
[ANNOUNCE] Hama New PMC Chair - Edward J. Yoon
On behalf of the Apache Hama PMC, I'm pleased to announce that the Apache Board has approved the nomination of Edward J. Yoon as Hama's new PMC Chair! Congratulations!
Re: [DISCUSS] Hama releases for each hadoop version
I am +1 on this if there are no additional issues. On 12 August 2015 at 09:23, Edward J. Yoon edwardy...@apache.org wrote: Any objections/thoughts? On Wed, Jul 22, 2015 at 7:48 PM, Edward J. Yoon edwardy...@apache.org wrote: Hi, Like http://www.apache.org/dist/spark/spark-1.3.1/, should we create a release tarball for each hadoop version? Otherwise, users always need to manually replace the hadoop jar and some dependency files in the ${HAMA_HOME}/lib folder. Of course, the src distribution doesn't matter. -- Best Regards, Edward J. Yoon -- Best Regards, Edward J. Yoon
Re: 0.7.1 release plan
+1. Sorry, I missed this in my mailbox. On 6 August 2015 at 08:57, Edward J. Yoon edwardy...@apache.org wrote: Hey all, As you already might know, file IO bugs in the YARN module have been reported. I would like to fix this issue and cut a 0.7.1 release ASAP. The rest looks good to me. WDYT? -- Best Regards, Edward J. Yoon
Re: [DISCUSSION] Spinoff ANN package
+1 That looks interesting. I would like to participate in this project. On 5 August 2015 at 11:52, Edward J. Yoon edwardy...@apache.org wrote: Guys, I plan to submit a 'DNN platform on top of Apache Hama' proposal as below. I know Hama community is somewhat small, but the main reason is that this domain-specific project is not fit for Apache Hama community. Recruiting volunteers is also hard problem. I expect this will become a very nice use-case of Apache Hama. If you have any suggestions or other opinions, Please let me know. Also, if you want to participate in this project, Pls feel free to add your name here. Thanks! -- == Abstract == (tentatively named Horn [hɔ:n], korean meaning of Horn is a Spirit) is a neuron-centric programming APIs and execution framework for large-scale deep learning, built on top of Apache Hama. == Proposal == It is a goal of the Horn to provide a neuron-centric programming APIs which allows user to easily define the characteristic of artificial neural network model and its structure, and its execution framework that leverages the heterogeneous resources on Hama and Hadoop YARN cluster. == Background == The initial ANN code was developed at Apache Hama project by a committer, Yexi Jiang (Facebook) in 2013. The motivation behind this work is to build a framework that provides more intuitive programming APIs like Google's MapReduce or Pregel and supports applications needing large model with huge memory consumptions in distributed way. == Rationale == While many of deep learning open source softwares are still data or model parallel only, we aim to support both data and model parallelism and also fault-tolerant system design. The basic idea of data and model parallelism is use of the remote parameter server to parallelize model creation and distribute training across machines, and the BSP framework of Apache Hama for performing asynchronous mini-batches. 
Within single BSP job, each task group works asynchronously using region barrier synchronization instead of global barrier synchronization, and trains large-scale neural network model using assigned data sets in BSP paradigm. This architecture is inspired by Google's DistBelief (Jeff Dean et al, 2012). == Initial Goals == Some current goals include: * builds new community * provides more intuitive programming APIs * needs both data and model parallelism support * must run natively on both Hama and Hadoop2 * needs also GPUs and InfiniBand support == Current Status == === Meritocracy === The core developers understand what it means to have a process based on meritocracy. We will provide continuous efforts to build an environment that supports this, encouraging community members to contribute. === Community === A small community has formed within the Apache Hama project and some companies such as instant messenger service company and mobile manufacturing company. And many people are interested in the large-scale deep learning platform itself. By bringing Horn into Apache, we believe that the community will grow even bigger. === Core Developers === Edward J. Yoon, Thomas Jungblut, and Dongjin Lee == Known Risks == === Orphaned Products === Apache Hama is already a core open source component at Samsung Electronics, and Horn also will be used by Samsung Electronics, and so there is no direct risk for this project to be orphaned. === Inexperience with Open Source === Some are very new and the others have experience using and/or working on Apache open source projects. === Homogeneous Developers === The initial committers are from different organizations such as, Microsoft, Samsung Electronics, and Line Plus. === Reliance on Salaried Developers === Other developers will also start working on the project in their spare time. 
=== Relationships with Other Apache Products === * Horn is based on Apache Hama * Apache Zookeeper is used for distributed locking service * Natively run on Apache Hadoop and Mesos * Horn can be somewhat overlapped with Singa podling. === An Excessive Fascination with the Apache Brand === Horn itself will hopefully have benefits from Apache, in terms of attracting a community and establishing a solid group of developers, but also the relation with Apache Hama, a general-purpose BSP computing engine. These are the main reasons for us to send this proposal. == Documentation == Initial plan about Horn can be found at http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html == Initial Source == The initial source code has been release as part of Apache Hama project developed under Apache Software Foundation. The source code is currently hosted at https://svn.apache.org/repos/asf/hama/trunk/ml/src/main/java/org/apache/hama/ml/ann/ == Cryptography == Not applicable. == Required Resources == Mailing Lists * horn-private * horn-dev Subversion Directory *
Re: [VOTE] Apache Hama 0.7 release (RC3)
When compiling the source checked out from http://svn.apache.org/repos/asf/hama/tags/0.7.0-RC3/, the following warnings are thrown and the build (mvn clean install) fails. It looks like an internal DTD problem. Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. Compiler warnings: WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.' Warning: org.apache.xerces.parsers.SAXParser: Feature 'http://javax.xml.XMLConstants/feature/secure-processing' is not recognized. Warning: org.apache.xerces.parsers.SAXParser: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized. Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. [INFO] Rat check: Summary of files. Unapproved: 4, unknown: 4, generated: 0, approved: 567. On 12 June 2015 at 18:26, Martin Illecker millec...@apache.org wrote: +1 On 12 Jun 2015 at 10:53, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 Kindly, Anastasios On 12 Jun, 2015, at 09:31, Edward J. Yoon edwardy...@apache.org wrote: We need more vote. Pls check this RC and vote! thanks. On Wed, Jun 10, 2015 at 9:55 PM, ByungSeok Min byeongseok@gmail.com wrote: +1 Best Regards, Byoungseok Min On Wednesday, 10 June 2015, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 Cheers, Anastasios Andronidis On 10 Jun, 2015, at 10:43, Minho Kim minwise@samsung.com wrote: +1 Best regards, Minho Kim -Original Message- From: Edward J. Yoon [mailto:edward.y...@samsung.com] Sent: Wednesday, June 10, 2015 8:29 AM To: dev@hama.apache.org Subject: RE: [VOTE] Apache Hama 0.7 release (RC3) +1 Everything looks good. -- Best Regards, Edward J. Yoon -Original Message- From: Edward J. Yoon [mailto:edwardy...@apache.org] Sent: Tuesday, June 09, 2015 7:39 PM To: dev@hama.apache.org Subject: [VOTE] Apache Hama 0.7 release (RC3) Hey all, I just created a 3rd release candidate for the Apache Hama 0.7 release using Java 7. This RC fixes a bug in the yarn module. The RC3 is available at: http://people.apache.org/~edwardyoon/dist/0.7.0-RC3/ Tags: http://svn.apache.org/repos/asf/hama/tags/0.7.0-RC3/ Please try it in your environment, run the tests, verify the checksum files, etc. and vote. Thanks! -- Best Regards, Edward J. Yoon -- Best Regards, Edward J. Yoon
Re: [VOTE] Move Hama 0.7 to Java 7+ only
+1 On 4 June 2015 at 21:24, Behroz Sikander behro...@gmail.com wrote: +1 On Thu, Jun 4, 2015 at 3:23 PM, Minho Kim eorien...@gmail.com wrote: +1 Best Regards, Minho Kim On 4 Jun 2015 at 20:26, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 On 4 Jun, 2015, at 12:38, Martin Illecker millec...@apache.org wrote: +1 On 4 Jun 2015 at 12:11, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso On 4 Jun 2015 at 11:55, Edward J. Yoon edwardy...@apache.org wrote: Hello all, I knew that Java 7 isn't fully backwards compatible with Java 6, but I haven't experienced any issues at all so far, because our code doesn't use any features from Java 7. However, I just noticed that the latest Hadoop supports Java 7+ only, and we also need to move to Java 7 for supporting YARN. The classic Hama cluster mode is outside the area of influence. Should we move to Java 7? Thanks! [ ] +1, We are ending support for Java 6 and move to Java 7. [ ] -1, Keep support Java 6, because ... -- Best Regards, Edward J. Yoon
Re: Bug in Netty-based RPC
Have you checked limits.conf? From the message, it looks like the number of files opened at the underlying system exceeds its default limit. On 28 April 2015 at 08:08, Edward J. Yoon edwardy...@apache.org wrote: I tried to run a BSP job using netty-based RPC instead of message bundles, but I received "too many open files". -- attempt_201504280858_0001_17_0: 15/04/28 08:28:17 INFO ipc.AsyncClient: AsyncClient startup attempt_201504280858_0001_17_0: 15/04/28 08:28:21 ERROR bsp.BSPTask: Error running bsp setup and bsp function. attempt_201504280858_0001_17_0: java.lang.IllegalStateException: failed to create a child event loop attempt_201504280858_0001_17_0: at io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:68) attempt_201504280858_0001_17_0: at io.netty.channel.MultithreadEventLoopGroup.init(MultithreadEventLoopGroup.java:49) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:61) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:52) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:44) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:36) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncClient$Connection.init(AsyncClient.java:189) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncClient.getConnection(AsyncClient.java:989) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncClient.call(AsyncClient.java:838) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC$Invoker.invoke(AsyncRPC.java:261) attempt_201504280858_0001_17_0: at com.sun.proxy.$Proxy14.getProtocolVersion(Unknown Source) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.checkVersion(AsyncRPC.java:524) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:509) attempt_201504280858_0001_17_0: 
at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:477) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:435) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:545) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.getBSPPeerConnection(HamaAsyncMessageManagerImpl.java:155) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.transfer(HamaAsyncMessageManagerImpl.java:203) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPPeerImpl.sendDirectly(BSPPeerImpl.java:382) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPPeerImpl.send(BSPPeerImpl.java:364) attempt_201504280858_0001_17_0: at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:467) attempt_201504280858_0001_17_0: at org.apache.hama.graph.GraphJobRunner.setup(GraphJobRunner.java:128) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) attempt_201504280858_0001_17_0: Caused by: io.netty.channel.ChannelException: failed to open a new selector attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoop.init(NioEventLoop.java:120) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87) attempt_201504280858_0001_17_0: at io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:64) attempt_201504280858_0001_17_0: ... 
24 more attempt_201504280858_0001_17_0: Caused by: java.io.IOException: Too many open files attempt_201504280858_0001_17_0: at sun.nio.ch.IOUtil.makePipe(Native Method) attempt_201504280858_0001_17_0: at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:65) attempt_201504280858_0001_17_0: at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126) attempt_201504280858_0001_17_0: ... 27 more attempt_201504280858_0001_17_0: 15/04/28 08:28:21 INFO ipc.AsyncServer: AsyncServer gracefully shutdown -- Best Regards, Edward J. Yoon
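The "Too many open files" cause above means the JVM ran out of file descriptors, which is what the limits.conf suggestion addresses. As a quick diagnostic, a JVM on Unix-like systems can report its own descriptor usage via the com.sun.management extension; this is a sketch, and the bean may be unavailable on some platforms, in which case it returns null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdLimitCheck {
    /** Returns {open, max} file descriptor counts, or null when the
     *  platform-specific bean is unavailable (e.g. on Windows). */
    public static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(),
                                unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] c = fdCounts();
        if (c == null) {
            System.out.println("fd counts unavailable on this platform");
        } else {
            System.out.println("open=" + c[0] + " max=" + c[1]);
        }
    }
}
```

Raising the per-process limit is typically done with `ulimit -n` or `nofile` entries in /etc/security/limits.conf for the user running the grooms.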
Re: [DISCUSS] Make Hadoop2 profile default
Since HAMA-848 is fixed, +1 if there are no additional issues. On 16 March 2015 at 09:49, Edward J. Yoon edward.y...@samsung.com wrote: Hi all, It seems time to switch our project to a Hadoop2 base. My suggestion is that we keep the two hadoop1 and hadoop2 profiles but make the hadoop2 profile the default. WDYT? -- Best Regards, Edward J. Yoon
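Keeping both profiles while making hadoop2 the default is usually just a matter of profile activation in the parent POM. A hypothetical sketch (profile contents elided; the actual Hama POM layout may differ):

```xml
<profiles>
  <!-- hadoop2 becomes the default build profile. Note that activeByDefault
       is switched off whenever another profile is activated explicitly. -->
  <profile>
    <id>hadoop2</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
  </profile>
  <!-- hadoop1 remains available via: mvn clean install -Phadoop1 -->
  <profile>
    <id>hadoop1</id>
  </profile>
</profiles>
```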
Offline for holidays
I am going to be offline for 1~2 weeks for the traditional holidays. Thanks
Re: Legal question to include code like com.sun.tools.javac.Main
Thanks for the information; it's very clear. One last question: Scala is licensed under [1], so I assume it's legal to use its nsc (compiler package) code, because its license is not listed in [2] and does not conflict with Apache's policy. Am I correct in that assumption? Thanks again for your kind help! [1]. http://www.scala-lang.org/license.html [2]. http://www.apache.org/legal/resolved.html#category-x On 27 January 2015 at 18:43, Mark Thomas ma...@apache.org wrote: On 26/01/2015 09:45, Chia-Hung Lin wrote: Hi, I have a naive question regarding to include the methods like Main.compile in com.sun package in the project source code. For instance, in our project like hama if there is a source file that makes use of com.sun.tools.javac.Main to runtime compile java sources into classes. Is it legal to release or include that source with project? Legally, yes that is fine. You can reference internal JVM vendor classes if you wish. What you may not do is include tools.jar in your distribution since the license for that JAR is not compatible with distribution under the ALv2. Mark - To unsubscribe, e-mail: legal-discuss-unsubscr...@apache.org For additional commands, e-mail: legal-discuss-h...@apache.org
Legal question to include code like com.sun.tools.javac.Main
Hi, I have a naive question about including methods like Main.compile from the com.sun package in project source code. For instance, if a source file in a project like Hama uses com.sun.tools.javac.Main to compile Java sources into classes at runtime, is it legal to release or include that source with the project? Thanks
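[Editor's note] On the technical (not legal) side, the same runtime-compilation capability is available through the supported javax.tools API shipped with every JDK, which avoids referencing com.sun internals entirely. A minimal sketch; the RuntimeCompiler class name and the temp-file layout are illustrative, not from this thread:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RuntimeCompiler {
    /**
     * Writes the given source to className + ".java" in a temp directory and
     * compiles it. Returns true on success, mirroring the zero exit code that
     * com.sun.tools.javac.Main.compile reports.
     */
    public static boolean compile(String className, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return false; // running on a JRE without the compiler module
        }
        Path dir = Files.createTempDirectory("rtc");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        // run() returns 0 when compilation succeeds
        return compiler.run(null, null, null, file.toString()) == 0;
    }
}
```

Unlike com.sun.tools.javac.Main, this entry point is part of the public JDK API, so the tools.jar distribution concern Mark raises does not arise.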
Git security issue
If you are using a git repository, you may need to update your git client due to a security vulnerability. https://github.com/blog/1938-vulnerability-announced-update-your-git-clients http://article.gmane.org/gmane.linux.kernel/1853266
Re: MapWritable is really bad
Should we roll out our own implementation? Switching to Kryo looks like it would get rid of this issue, but if we are going to have a pluggable serialization framework (are we?), that might help those who need it. On 24 September 2014 17:19, Edward J. Yoon edwardy...@apache.org wrote: Interesting .. On Fri, Sep 19, 2014 at 11:52 PM, Andronidis Anastasios andronat_...@hotmail.com wrote: Hello, I remember a discussion rose upon performance issues on messages and that kryo serializer helped a lot. Please read this: http://www.chrisstucchio.com/blog/2011/mapwritable_sometimes_a_performance_hog.html From a custom test I did, I was sending some messages with MapWritable as a container, Text as key and ArrayWritable (with integers inside) as a value. Hama was reporting 20MB of traffic. When I wrote my own Map (that implements Writable interface) I reduced the amount from 20MB to 1.5MB.. Cheers, Anastasios -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
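[Editor's note] The overhead Anastasios describes comes from MapWritable writing class metadata alongside every entry. The pattern of his fix can be sketched in a self-contained way; hedged: the class below mirrors Hadoop's Writable contract (write/readFields over DataOutput/DataInput) without the Hadoop dependency, and the TextIntsMap name is invented for illustration:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A map of String -> int[] that serializes only its raw data, unlike
 * MapWritable, which also records class information for every key and value.
 */
public class TextIntsMap {
    public final Map<String, int[]> entries = new LinkedHashMap<>();

    public void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, int[]> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().length);
            for (int v : e.getValue()) out.writeInt(v);
        }
    }

    public void readFields(DataInput in) throws IOException {
        entries.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            int[] vals = new int[in.readInt()];
            for (int j = 0; j < vals.length; j++) vals[j] = in.readInt();
            entries.put(key, vals);
        }
    }
}
```

Because the key and value types are fixed at compile time, nothing but the payload bytes crosses the wire, which is where the 20MB-to-1.5MB reduction comes from.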
Re: [DISCUSS/VOTE] Refactor of message queue .
Is the incoming bundles manager equal to localQueueForNextIteration/localQueue in the 0.6.4 version? My understanding of localQueueForNextIteration/localQueue is that they serve as input for the coming superstep; for instance, in the N-th superstep the incoming messages are obtained from localQueue (AbstractMessageManager.getCurrentMessage). In this case it looks safe to save messages to local disk, because once a node fails, the steps to recover from the previous superstep should be unchanged: the previously saved messages are simply placed into the localQueue again. If not, then we probably need to think about this issue. On 29 August 2014 08:09, Edward J. Yoon edwardy...@apache.org wrote: First of all, Our main problem is that current system requires a lot of memory space, especially graph module. As you already might know, the main memory consumer is the message queue. To solve this problem, we considered the use of local disk space e.g., DiskQueue and SpillingQueue. However, those queues are basically not able to bundle and group the messages by destination server, in memory-efficient way. So, I don't think this approach is right way. My solution for saving the memory usage and the performance degradation, is storing serializable message objects as a byte array in queue. In graph case, 3X ~ 6X memory efficiency is expected than before (GraphJobMessage consists of destination vertex ID and message value multi-objects). In 0.6.4, Outgoing queue is replaced with outgoing bundles manager, and it showed nice memory improvement. Now I wanna start refactoring of incoming queue. My plan is that adding incoming bundles manager. Bundles can also simply be written to local disk if when memory space is not enough. So, incoming bundles manager can be performed a similar role of DiskQueue and SpillingQueue in the future. If you have any other opinion, Please let me know. If there are no objections, I'll do it. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
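[Editor's note] Edward's idea of keeping messages serialized in the queue, rather than as live objects, can be sketched as follows. This is only an illustration of the approach, not the actual incoming bundles manager; the ByteBundle name and the length-prefixed layout are assumptions:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;

/**
 * Accumulates length-prefixed serialized messages in one growing byte buffer,
 * so the queue holds a single byte[] instead of N deserialized message objects.
 */
public class ByteBundle {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buf);
    private int count = 0;

    public void add(byte[] serializedMessage) throws IOException {
        out.writeInt(serializedMessage.length); // length prefix
        out.write(serializedMessage);           // raw serialized payload
        count++;
    }

    /** Iterates the bundled messages, materializing one byte[] at a time. */
    public Iterator<byte[]> iterator() {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        int total = count;
        return new Iterator<byte[]>() {
            int read = 0;
            public boolean hasNext() { return read < total; }
            public byte[] next() {
                try {
                    byte[] msg = new byte[in.readInt()];
                    in.readFully(msg);
                    read++;
                    return msg;
                } catch (IOException e) { throw new UncheckedIOException(e); }
            }
        };
    }
}
```

Because the bundle is a single contiguous byte array, spilling it to local disk (and reading it back for recovery, as discussed above) is a plain file write rather than per-message serialization.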
Re: Remove Spilling Queue and rewrite checkpoint/recovery
I will make the code stable first before merging it back. On 18 August 2014 17:40, Edward J. Yoon edwardy...@apache.org wrote: Do you have any plan for merging them? This is side opinion. If we want to use Git, now I'm +1. On Sat, Aug 16, 2014 at 12:00 AM, Chia-Hung Lin cli...@googlemail.com wrote: Code right now is at https://github.com/chlin501/hama.git Maven and jdk are required to build the project Command to have a clean build: mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true To test a specific test case: mvn -DskipTests=false -Dtest=TestCaseName test On 15 August 2014 18:21, Suraj Menon menonsur...@gmail.com wrote: Hi Edward, sorry to enter the discussion so late. Bundling and Unbundling of message queue is not Spilling queue's responsibility, it was ended up there to be compatible with the existent implementation of BSP Peer communication. Remember Spilling Queue implementation was done to immediately remove some OutOfMemory issues on sender side first. Spilling Queue gives you a byte array (ByteBuffer) with a batch of serialized messages. This is effectively bundling the messages in byte array (hence the ByteArrayMessageBundle) and sending them for processing. The SpilledDataProcessor's are implemented as a pipeline of processing done using inheritance, something like what we may use trait for in Scala. So if we have a SpilledDataProcessor that sends this bundled message via RPC to the peer, there is no need to write them to file and read them back. As I previously mentioned this was done to be compatible with the existent implementation of peer.send. Also, the async checkpoint recovery code was written before spilling queue. Today we can remove the single message write and do this in before peer sync phase to just write the whole file to HDFS. I would say performance numbers and maintainability comes first and if you think removing spilling queue is a solution go for it. 
As far as async checkpointing is to be considered, that was a first proof of concept we did and it is high time we move forward from there. Chiahung, do you have some instruction on where and how I can build the scala version of your code? I am really finding it hard to dedicate time for Hama these days. - Suraj On Tue, Aug 12, 2014 at 7:15 AM, Edward J. Yoon edwardy...@apache.org wrote: ChiaHung, Yes, I'm thinking similar things. On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin cli...@googlemail.com wrote: I am currently working on this part based on the superstep api, similar to the Superstep.java in the trunk. The checkpointer[1] saves bundle message instead of single message. Not very sure if this is what you are looking for? [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala On 12 August 2014 15:04, Edward J. Yoon edwardy...@apache.org wrote: I think that transferring single messages at a time is not a wise way. Bundle is used to avoid network overheads and contentions. So, if we use Bundle, each processor always sends/receives an bundles. BSPMessageBundle is Writable (and Iterable). And it manages the serialized message as a byte array. If we write an bundles when checkpointing or using Disk-queue, it'll be more simple and faster. In Spilling Queue case, it always requires the process of unbundling and putting messages into queue. On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: -1, can't we first discuss? Also it'd be helpful to be more specific on the problems. Tommaso 2014-08-12 4:25 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, I'll delete Spilling queue, and rewrite checkpoint/recovery implementation (checkpointing bundles is better than checkpointing all messages). Current implementation is quite mess :/ there are huge deserialization/serialization overheads.. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. 
-- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Remove Spilling Queue and rewrite checkpoint/recovery
Code right now is at https://github.com/chlin501/hama.git Maven and jdk are required to build the project Command to have a clean build: mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true To test a specific test case: mvn -DskipTests=false -Dtest=TestCaseName test On 15 August 2014 18:21, Suraj Menon menonsur...@gmail.com wrote: Hi Edward, sorry to enter the discussion so late. Bundling and Unbundling of message queue is not Spilling queue's responsibility, it was ended up there to be compatible with the existent implementation of BSP Peer communication. Remember Spilling Queue implementation was done to immediately remove some OutOfMemory issues on sender side first. Spilling Queue gives you a byte array (ByteBuffer) with a batch of serialized messages. This is effectively bundling the messages in byte array (hence the ByteArrayMessageBundle) and sending them for processing. The SpilledDataProcessor's are implemented as a pipeline of processing done using inheritance, something like what we may use trait for in Scala. So if we have a SpilledDataProcessor that sends this bundled message via RPC to the peer, there is no need to write them to file and read them back. As I previously mentioned this was done to be compatible with the existent implementation of peer.send. Also, the async checkpoint recovery code was written before spilling queue. Today we can remove the single message write and do this in before peer sync phase to just write the whole file to HDFS. I would say performance numbers and maintainability comes first and if you think removing spilling queue is a solution go for it. As far as async checkpointing is to be considered, that was a first proof of concept we did and it is high time we move forward from there. Chiahung, do you have some instruction on where and how I can build the scala version of your code? I am really finding it hard to dedicate time for Hama these days. - Suraj On Tue, Aug 12, 2014 at 7:15 AM, Edward J. 
Yoon edwardy...@apache.org wrote: ChiaHung, Yes, I'm thinking similar things. On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin cli...@googlemail.com wrote: I am currently working on this part based on the superstep api, similar to the Superstep.java in the trunk. The checkpointer[1] saves bundle message instead of single message. Not very sure if this is what you are looking for? [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala On 12 August 2014 15:04, Edward J. Yoon edwardy...@apache.org wrote: I think that transferring single messages at a time is not a wise way. Bundle is used to avoid network overheads and contentions. So, if we use Bundle, each processor always sends/receives an bundles. BSPMessageBundle is Writable (and Iterable). And it manages the serialized message as a byte array. If we write an bundles when checkpointing or using Disk-queue, it'll be more simple and faster. In Spilling Queue case, it always requires the process of unbundling and putting messages into queue. On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: -1, can't we first discuss? Also it'd be helpful to be more specific on the problems. Tommaso 2014-08-12 4:25 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, I'll delete Spilling queue, and rewrite checkpoint/recovery implementation (checkpointing bundles is better than checkpointing all messages). Current implementation is quite mess :/ there are huge deserialization/serialization overheads.. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Remove Spilling Queue and rewrite checkpoint/recovery
I am currently working on this part based on the superstep api, similar to the Superstep.java in the trunk. The checkpointer[1] saves bundle message instead of single message. Not very sure if this is what you are looking for? [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala On 12 August 2014 15:04, Edward J. Yoon edwardy...@apache.org wrote: I think that transferring single messages at a time is not a wise way. Bundle is used to avoid network overheads and contentions. So, if we use Bundle, each processor always sends/receives an bundles. BSPMessageBundle is Writable (and Iterable). And it manages the serialized message as a byte array. If we write an bundles when checkpointing or using Disk-queue, it'll be more simple and faster. In Spilling Queue case, it always requires the process of unbundling and putting messages into queue. On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: -1, can't we first discuss? Also it'd be helpful to be more specific on the problems. Tommaso 2014-08-12 4:25 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, I'll delete Spilling queue, and rewrite checkpoint/recovery implementation (checkpointing bundles is better than checkpointing all messages). Current implementation is quite mess :/ there are huge deserialization/serialization overheads.. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Questions on Hama
Perhaps you can check ${module-name}/target/surefire-reports/ for more detail about which test cases fail. Apache Hama is a BSP engine, meaning it's not only capable of performing graph computation; it's suitable for general-purpose parallel computing as long as the algorithm can be expressed as an iterative application. The benefit of separating GraphJob from BSPJob is that users can perform their tasks without too many restrictions. For example, a user can not only write a program to perform graph computation, but can also write general BSP jobs when required. On 10 August 2014 22:15, Dongjin Lee dongjin.lee...@gmail.com wrote: Hi. I am a Hama user, who is analyzing source code now. I have some questions. *1. Build Fail* When I tried build with mvn clean install with version 0.6.4, it succeeded clearly. However, when I tried build with mvn --projects core,examples install, it failed on test task. I think there is something I don't understand yet, but It would be better to update http://wiki.apache.org/hama/HowToContribute, to prevent confusion. I got above command from that wiki page. *2. Why BSPJob and GraphJob is separated?* I read BSPJob GraphJob class, and feel its design is a little bit weird. Why they are separated? Is there any design decision I don't know? Thanks in Advance. - Dongjin
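[Editor's note] The BSP model described above, independent local computation phases separated by a global barrier, can be shown with a toy single-process sketch. This is not Hama's API; it only illustrates the superstep/barrier structure using plain threads and a CyclicBarrier, with invented names:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Toy BSP: each of P peers adds its id to a shared sum in every superstep,
 * and all peers wait at a barrier before the next superstep begins.
 */
public class ToyBsp {
    public static long run(int peers, int supersteps) throws InterruptedException {
        LongAdder sum = new LongAdder();
        CyclicBarrier barrier = new CyclicBarrier(peers); // the sync() point
        ExecutorService pool = Executors.newFixedThreadPool(peers);
        for (int id = 0; id < peers; id++) {
            final int peerId = id;
            pool.execute(() -> {
                for (int s = 0; s < supersteps; s++) {
                    sum.add(peerId);       // local computation phase
                    try {
                        barrier.await();   // barrier synchronization phase
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return sum.sum();
    }
}
```

Any iterative algorithm that fits this compute-then-synchronize loop (PageRank, shortest paths, matrix iterations, and so on) can be expressed as a BSP job, which is why the graph module is only one client of the engine.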
Re: [ANN] Welcome to new Hama committers
Congrats on joining the community! On 11 August 2014 06:46, Edward J. Yoon edwardy...@apache.org wrote: The Apache Hama PMC is pleased to announce the following additions: * Victor Lee is now a Hama Committer * ByungSeok Min is now a Hama committer. Thanks again to both for their efforts, we hope to see them continue to move Apache Hama forward. Thanks! -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: [ANN] Jeff Fenchel as a new Hama committer
Congratulations, and thanks for the contribution! On 9 June 2014 18:41, Andronidis Anastasios andronat_...@hotmail.com wrote: congrats Jeff! Anastasis On 9 Jun 2014, at 11:59 a.m., Edward J. Yoon edwardy...@apache.org wrote: The Hama PMC is pleased to announce Jeff Fenchel (Mesos module contributor) as a new committer of Apache Hama. Congrats, Jeff Fenchel! -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Renil Joseph as a new Hama committer
Congratulations! Welcome to the Hama community. On 23 May 2014 14:59, Edward J. Yoon edwardy...@apache.org wrote: The Hama PMC is pleased to announce Renil Joseph as a new committer of Hama. We look forward to his continuing involvement with Hama. Congrats, Renil Joseph! -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: [DISCUSS] Disk Queue and Spilling Queue
Is it going to use RPC? Will it still use the interface, for instance, MessageManager.java? Just checking whether there is any point of integration with the current ongoing refactoring process. If possible, perhaps decoupling the IO part and RPC from the interface would somehow simplify the integration. On 12 May 2014 09:01, Edward J. Yoon edwardy...@apache.org wrote: The old design of outgoing/incoming message queues is readable but it has some problems, and the most performance and memory issues are dependent upon this part. 1) To send a messages to destination Peer, we serialize, compress, and bundle the messages. So, using disk or spilling queue for the outgoing messages is pointless and cause of degradation. This issue SOLVED by HAMA-853. We'll need to add disk-based bundle in the future. 2) Receive-side queue is also the same. Instead of unbundling (and deserializing, decompressing) bundles into {memory, disk, or spilling} queue, we should use bundles in efficient and asynchronous way. If you agree with this, I'll start to refactor the whole queue system. If you have any other ideas e.g., asynchronous message synchronization, Pls let me know. Thanks. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
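[Editor's note] The send path Edward describes (serialize, compress, and bundle the messages per destination peer) can be sketched as below. The CompressedBundle name is invented and the messages are plain strings for simplicity; the real code path uses Writable bundles such as BSPMessageBundle:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Serializes and gzip-compresses a batch of messages into one payload for a
 * single destination peer, and unpacks the payload on the receive side.
 */
public class CompressedBundle {
    public static byte[] pack(List<String> messages) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            out.writeInt(messages.size());
            for (String m : messages) out.writeUTF(m); // serialize each message
        }
        return buf.toByteArray();                      // one compressed bundle
    }

    public static List<String> unpack(byte[] payload) throws IOException {
        List<String> messages = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(payload)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) messages.add(in.readUTF());
        }
        return messages;
    }
}
```

Spilling this already-compressed bundle to disk would only add a serialize/deserialize round trip, which is the degradation point 1) refers to.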
Re: Refactoring code
The refactoring process is still underway. Is it appropriate to submit something that is wip or may be changed drastically later on? On 30 April 2014 03:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: I do like this approach as well, I'd be curious to try it out myself, maybe in the next weeks. Maybe a chance for a submission to ApacheCon EU? Regards, Tommaso 2014-04-29 17:01 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The core module are heavily refactored. And will try our best to retain the original interface so that users and other modules won't be impacted too much. Basically the users side will still use Java code because for the core module it's just interfaces/classes definition. Please let me know if anything in terms of this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
Re: Refactoring code
Oop sorry. The submission mentioned is related to Maybe a chance for a submission to ApacheCon EU? On 30 April 2014 15:10, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, maybe in a branch, or under a sandbox directory e.g. svn.apache.org/repos/asf/hama/sandbox/core-akka Tommaso 2014-04-30 9:02 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The refactoring process is still underway. Is it appropriate to submit something that is wip or may be changed drastically later on? On 30 April 2014 03:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: I do like this approach as well, I'd be curious to try it out myself, maybe in the next weeks. Maybe a chance for a submission to ApacheCon EU? Regards, Tommaso 2014-04-29 17:01 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The core module are heavily refactored. And will try our best to retain the original interface so that users and other modules won't be impacted too much. Basically the users side will still use Java code because for the core module it's just interfaces/classes definition. Please let me know if anything in terms of this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
Re: Refactoring code
Get it. Thanks! On 30 April 2014 15:23, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, I think work in progress as an effort to innovate and improve our current architecture and performance would be a great one, and I've seen wip talks in the past. Tommaso 2014-04-30 9:19 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: Oop sorry. The submission mentioned is related to Maybe a chance for a submission to ApacheCon EU? On 30 April 2014 15:10, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, maybe in a branch, or under a sandbox directory e.g. svn.apache.org/repos/asf/hama/sandbox/core-akka Tommaso 2014-04-30 9:02 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The refactoring process is still underway. Is it appropriate to submit something that is wip or may be changed drastically later on? On 30 April 2014 03:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: I do like this approach as well, I'd be curious to try it out myself, maybe in the next weeks. Maybe a chance for a submission to ApacheCon EU? Regards, Tommaso 2014-04-29 17:01 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The core module are heavily refactored. And will try our best to retain the original interface so that users and other modules won't be impacted too much. Basically the users side will still use Java code because for the core module it's just interfaces/classes definition. Please let me know if anything in terms of this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
Re: Refactoring code
The core module is being heavily refactored, and we will try our best to retain the original interface so that users and other modules won't be impacted too much. The user side will basically still use Java code, because for the core module it's just interface/class definitions. Please let me know if you have any concerns about this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
OpenSSL security issue
Due to the OpenSSL vulnerabilities[1], it might be good for committers and others to reset their Apache passwords, as mentioned in [2]. [1]. http://www.openssl.org/news/vulnerabilities.html [2]. https://blogs.apache.org/infra/entry/heartbleed_fallout_for_apache
Re: [DISCUSS] Rename of ML and Graph modules
The Graph package name is clearer to me. Mach may be confused with CMU's Mach OS microkernel. Or do we want a code name for each release, like some GNU/Linux distro releases? On 14 April 2014 14:57, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think graph is a pretty fine name, it'easy to understand it's Hama applied to graphs, for 'ml' maybe it's a bit too short so ml may mean anything even if I don't think 'mach' improves that. Regards, Tommaso 2014-04-13 17:01 GMT+02:00 Yexi Jiang yexiji...@gmail.com: It seems that the old names are better than the new names. 2014-04-13 8:10 GMT-04:00 Andronidis Anastasios andronat_...@hotmail.com : hi, sorry but i don't understand what the new names mean. what is a b-graph? mach? kindly, anastasis On 13 Apr 2014, at 1:54 p.m., Edward J. Yoon edwardy...@apache.org wrote: Because they are too ambiguous and unmemorable. On Sun, Apr 13, 2014 at 8:25 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: why? Tommaso 2014-04-13 12:57 GMT+02:00 Edward J. Yoon edwardy...@apache.org: I propose that we rename the graph and ml modules: 1. Hama Graph - B-Graph 2. Hama ML - Mach WDYT? -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer Co., Ltd. -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer Co., Ltd. -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
In that case, are we going to organize multiple tasks into groups? A job has N BSP groups (bsp tasks in the current code), and in turn each group contains multiple tasks (with all tasks on the same server)? If so, how do they send messages or communicate between groups? Group to group? Can a task (within a group) send messages arbitrarily? I ask because this has implications for FT. IIRC Storm is a CEP framework, and messages can be sent arbitrarily to every bolt. The issue with such computation is that checkpointing it is not a simple task. Generally it's done through communication-induced checkpointing. Otherwise, like Storm, they ack and redo each message when necessary; an option is something like batch transactional processing (in Storm, Trident batch processing, if I am correct). What I can think of right now, with the current structure, is grouping every N messages into a superstep and then asynchronously checkpointing, which may be similar to Trident batch processing. I understand it's still far away given the current status, but I suppose it's good to take that into consideration beforehand as well. On 11 April 2014 13:40, Edward J. Yoon edwardy...@apache.org wrote: Yesterday, I had survey the Storm. Storm's task grouping and chainable bolts seems pretty nice (especially, chainable bolts can be really useful in case of real-time join operation). I think, we can also implement similar functions of Storm's task grouping and chainable bolts on BSP. My rough idea is: 1. Launches multi-tasks per node (as number of group of Bolts). For example:

+---+
|Server1|
+---+
Task-1. tailing bolt
Task-2. split sentence bolt
Task-3. wordcount bolt

2. Assign the tasks to proper group. -- 3. Each task executes their user-defined function and sends messages to task of next group. 4. Synchronizes all. -- 5. Finally, repeat the above 3 ~ 4 process. In here, only the difficult one is how to determine the task group at initial superstep.
So, I'd like to add below one to BSPPeer interface.

/**
 * @return the names of locally adjacent peers (including this peer).
 */
public String[] getAdjacentPeerNames();

On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang yexiji...@gmail.com wrote: great~ 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958430#comment-13958430] Edward J. Yoon commented on HAMA-883: - NOTE: my fellow worker is currently working on this issue - https://github.com/garudakang/meerkat [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might able to platformization these kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252) -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/ -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer Co., Ltd.
Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
No problem. It's a good discussion, so we can examine and improve accordingly. I am still not very sure about the topology, or how tasks are grouped. From the description, it looks like the link below: http://i.imgur.com/92L2XY1.png Each GroomServer is viewed as a group, and each group will launch 3 tasks by default (as the default xml defines). So the corresponding messages, emitted from a source like a queue, are sent to each group for consumption? And how do tasks communicate between groups/tasks? On 11 April 2014 16:43, Edward J. Yoon edw...@datasayer.com wrote: My rough idea assumes that dedicated Hama is installed on machines that generates logs, and the number of child tasks will be launched equally per GroomServer. So, if the groups == 3, framework launches 3 tasks per node. At first superstep, one task broadcasts the Topology after grouping the Tasks into 3 groups.

== Group1 ==
server1:60001
server2:60001
server3:60001

== Group2 ==
server1:60002
server2:60002
server3:60002

== Group3 ==
server1:60003
server2:60003
server3:60003

Based on this Topolgy, tasks reflects proper class and executes it. Then, it'll work like Storm flow. I didn't think about FT issue yet. :-) On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin cli...@googlemail.com wrote: Or we can have POC first and then see how it relates to the issue we might need to fix. On 11 April 2014 16:10, Chia-Hung Lin cli...@googlemail.com wrote: In that case are we going to organize multiple tasks into a group? A job has N bsp groups (bsp task in current code), in turn each group contain multiple tasks (and all tasks are on the same server)? If this is the case, how do they send messages or communicate between groups? group to group? A task (within a group) can arbitrary send the messages? I have this question because this would have implication on FT. IIRC Storm is a CEP framework, and messages can be sent arbitrary to every bolt. The issue with such computation is that it's not a simple task when performing checkpoint.
Generally it's done through communication induced checkpointing. Otherwise like storm they ack and redo each message when necessary; an option is something like batch (in storm like trident batch processing if I am correct) transactional processing. What I can think of right now is, with current structure, grouping every N messages a superstep, and then asynchronously checkpointing, which may be similar to trident batch processing. I understand it's still far away based on the current status. I suppose it's good if we can take that into consideration beforehand as well. On 11 April 2014 13:40, Edward J. Yoon edwardy...@apache.org wrote: Yesterday, I had survey the Storm. Storm's task grouping and chainable bolts seems pretty nice (especially, chainable bolts can be really useful in case of real-time join operation). I think, we can also implement similar functions of Storm's task grouping and chainable bolts on BSP. My rough idea is: 1. Launches multi-tasks per node (as number of group of Bolts). For example: +---+ |Server1| +---+ Task-1. tailing bolt Task-2. split sentence bolt Task-3. wordcount bolt 2. Assign the tasks to proper group. -- 3. Each task executes their user-defined function and sends messages to task of next group. 4. Synchronizes all. -- 5. Finally, repeat the above 3 ~ 4 process. In here, only the difficult one is how to determine the task group at initial superstep. So, I'd like to add below one to BSPPeer interface. /** * @return the names of locally adjacent peers (including this peer). */ public String[] getAdjacentPeerNames(); On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang yexiji...@gmail.com wrote: great~ 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958430#comment-13958430 ] Edward J. 
Yoon commented on HAMA-883: - NOTE: my fellow worker is currently working on this issue - https://github.com/garudakang/meerkat [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to platformize this kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252) -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer
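The per-port grouping in the topology above (Group1 = all :60001 tasks, Group2 = all :60002 tasks, and so on) can be sketched in a few lines of standalone Java. Note this is only an illustration of the idea discussed in the thread; `TopologySketch` and its `group()` helper are hypothetical names, not part of Hama's API.

```java
import java.util.*;

public class TopologySketch {
    // Group peer addresses "host:port" into groups by port offset: the task
    // listening on basePort + i on every host belongs to group i + 1, which
    // mirrors the Group1/Group2/Group3 layout shown in the thread.
    public static Map<Integer, List<String>> group(List<String> peers, int basePort, int groups) {
        Map<Integer, List<String>> topology = new TreeMap<>();
        for (String peer : peers) {
            int port = Integer.parseInt(peer.substring(peer.indexOf(':') + 1));
            int g = (port - basePort) % groups + 1; // group index derived from port offset
            topology.computeIfAbsent(g, k -> new ArrayList<>()).add(peer);
        }
        return topology;
    }

    public static void main(String[] args) {
        List<String> peers = new ArrayList<>();
        for (String host : new String[] {"server1", "server2", "server3"})
            for (int port = 60001; port <= 60003; port++)
                peers.add(host + ":" + port);
        // Group 1 collects all :60001 tasks, group 2 all :60002, group 3 all :60003.
        System.out.println(group(peers, 60001, 3));
    }
}
```

A task broadcasting such a map at the first superstep would give every peer the same view of which peers form each processing stage.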
Re: I'd like to contribute to Hama
Hi, you can have a look at JIRA [1], check out the source [2], and submit a patch as an attachment to a JIRA ticket. Contributions are welcome. :) [1]. https://issues.apache.org/jira/browse/HAMA/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel [2]. http://wiki.apache.org/hama/HowToContribute On 9 April 2014 13:23, Byeong Seok Min byeongseok@gmail.com wrote: Hello Hama
Re: [DISCUSS] Fault tolerant BSP job
Sorry, I don't catch the point. What's the difference between pure BSP and FT BSP? Any concrete example? On 9 April 2014 08:29, Edward J. Yoon edwardy...@apache.org wrote: In my eyes, SuperstepPiEstimator[1] looks like a totally new programming model, very similar to Pregel. I personally would like to suggest that we provide both pure BSP and fault tolerant BSP models, instead of replacing one with the other. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: [DISCUSS] Fault tolerant BSP job
Not sure if we are on the same page, and sorry, I am not very familiar with the Superstep implementation. I assume that the traditional bsp model means the original bsp interface, where there is a bsp function and the user can freely call peer.sync(), etc.:

bsp(BSPPeer ... peer) { // whatever computation peer.sync(); }

And the superstep style is with the Superstep abstract class. If this is the case, SuperstepBSP.java already calls sync, as below, outside each Superstep.compute(). So it looks like even though SuperstepPiEstimator doesn't call the sync() method, a barrier sync will be executed, because each Superstep is viewed as a superstep in the original BSP definition.

@Override
public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException, InterruptedException {
  for (int index = startSuperstep; index < supersteps.length; index++) {
    Superstep<K1, V1, K2, V2, M> superstep = supersteps[index];
    superstep.compute(peer);
    if (superstep.haltComputation(peer)) {
      break;
    }
    peer.sync();
    startSuperstep = 0;
  }
}

Within Superstep.compute(), if sync is called again, I would think that another barrier sync will be executed:

SuperstepBSP.java
for (...) {
  superstep.compute() -> { // in compute method ... peer.sync() }
  ...
  peer.sync()
}

IIRC each call to sync may cause the checkpoint (no recovery) method to serialize messages to HDFS. For SerializePrinting, the following code snippet may move

for (String otherPeer : bspPeer.getAllPeerNames()) { bspPeer.send(otherPeer, new IntegerMessage(bspPeer.getPeerName(), i)); }

to Superstep.compute(). And the outer for loop is what is programmed in SuperstepBSP.java:

for (int i = 0; i < NUM_SUPERSTEPS; i++) { // code that should be moved to Superstep.compute() } bspPeer.sync();

On 9 April 2014 16:17, Edward J. Yoon edwardy...@apache.org wrote: As you can see here[1], the sync() method is never called, and the classes of all supersteps need to be declared within the Job configuration. Therefore, I thought it's similar to the Pregel style on the BSP model.
It's quite different from the legacy model in my eyes. According to HAMA-505, the superstep API seems to be used for FT job processing (I haven't read it closely yet). Right? Here I have a question: what happens if I call the sync() method within the compute() method? In this case, does the framework guarantee checkpoint/recovery? And how can I implement http://wiki.apache.org/hama/SerializePrinting using the superstep API? What's the difference between pure BSP and FT BSP? Any concrete example? I meant the traditional BSP programming model. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java On Wed, Apr 9, 2014 at 4:25 PM, Chia-Hung Lin cli...@googlemail.com wrote: Sorry, I don't catch the point. What's the difference between pure BSP and FT BSP? Any concrete example? On 9 April 2014 08:29, Edward J. Yoon edwardy...@apache.org wrote: In my eyes, SuperstepPiEstimator[1] looks like a totally new programming model, very similar to Pregel. I personally would like to suggest that we provide both pure BSP and fault tolerant BSP models, instead of replacing one with the other. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc. -- Edward J. Yoon (@eddieyoon) CEO at DataSayer Co., Ltd.
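The driver loop discussed in this thread — compute() followed by a framework-inserted barrier for each chained superstep — can be mimicked with a small standalone sketch. The `Superstep`, `Peer`, and `run()` names here are simplified stand-ins for illustration, not Hama's actual classes.

```java
import java.util.*;

public class SuperstepSketch {
    // Stand-in for Hama's Superstep abstract class: one compute per barrier.
    public interface Superstep {
        void compute(Peer peer);
        default boolean halt(Peer peer) { return false; }
    }

    // Stand-in for BSPPeer: counts barriers and records sent messages.
    public static class Peer {
        public int syncs = 0;
        public List<String> log = new ArrayList<>();
        public void sync() { syncs++; }       // barrier between supersteps
        public void send(String msg) { log.add(msg); }
    }

    // Mirrors the quoted SuperstepBSP.bsp() loop: compute, optional halt,
    // then an implicit barrier sync inserted by the framework.
    public static void run(Peer peer, Superstep... supersteps) {
        for (Superstep s : supersteps) {
            s.compute(peer);
            if (s.halt(peer)) break;
            peer.sync();                      // framework-inserted barrier
        }
    }

    public static void main(String[] args) {
        Peer peer = new Peer();
        run(peer,
            p -> p.send("hello from superstep 0"),
            p -> p.send("hello from superstep 1"));
        System.out.println(peer.syncs + " barriers, log=" + peer.log);
    }
}
```

This shows the point made above: user code never calls sync() itself, yet one barrier still happens per chained superstep, which is what makes the chain a natural unit for checkpoint/replay.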
Re: [DISCUSS] Fault tolerant BSP job
That's why I proposed to use the Superstep API instead, though I prefer the plain bsp function. Unless we want to instrument the source code, which I believe is not what we, including users, want. With the Superstep API we can resume from the latest checkpointed message (the new refactored code should be based on this as well), under some preconditions. Alternatively, we can implement our own code (not in Java, or probably in Java 8) to perform checkpointing, but that would take a very long time to accomplish. I would put that issue in the future roadmap because I personally prefer the plain bsp function instead of Superstep. On 9 April 2014 23:56, Suraj Menon surajsme...@apache.org wrote: I don't like my patch in HAMA-639 myself, even though I believe it satisfies all the mentioned requirements. The usage of the superstep chaining API implementation in the patch is too complicated. A superstep here is like a transformation function you define on an RDD in Spark. So if you look into the FT design of Spark, on failure they rerun the operations on the RDD to get to the current state. This is similar to what we have in mind using checkpointing. The challenge is in getting the same messages replayed to a newly spawned task on checkpointed data. If you don't use the Superstep (or any other abstraction representing a function), you cannot start processing from the line of code where the failure occurred. (Java does not support goto line number.) -Suraj On Wed, Apr 9, 2014 at 7:29 AM, Edward J. Yoon edwardy...@apache.org wrote: I just found this: https://issues.apache.org/jira/browse/HAMA-503 and HAMA-639. Do you still think the superstep API is essential for checkpoint/recovery? If not, we can drop it. I don't think it's a good idea. On Wed, Apr 9, 2014 at 7:43 PM, Chia-Hung Lin cli...@googlemail.com wrote: Not sure if we are on the same page, and sorry, I am not very familiar with the Superstep implementation.
I assume that the traditional bsp model means the original bsp interface, where there is a bsp function and the user can freely call peer.sync(), etc.:

bsp(BSPPeer ... peer) { // whatever computation peer.sync(); }

And the superstep style is with the Superstep abstract class. If this is the case, SuperstepBSP.java already calls sync, as below, outside each Superstep.compute(). So it looks like even though SuperstepPiEstimator doesn't call the sync() method, a barrier sync will be executed, because each Superstep is viewed as a superstep in the original BSP definition.

@Override
public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException, InterruptedException {
  for (int index = startSuperstep; index < supersteps.length; index++) {
    Superstep<K1, V1, K2, V2, M> superstep = supersteps[index];
    superstep.compute(peer);
    if (superstep.haltComputation(peer)) {
      break;
    }
    peer.sync();
    startSuperstep = 0;
  }
}

Within Superstep.compute(), if sync is called again, I would think that another barrier sync will be executed:

SuperstepBSP.java
for (...) {
  superstep.compute() -> { // in compute method ... peer.sync() }
  ...
  peer.sync()
}

IIRC each call to sync may cause the checkpoint (no recovery) method to serialize messages to HDFS. For SerializePrinting, the following code snippet may move

for (String otherPeer : bspPeer.getAllPeerNames()) { bspPeer.send(otherPeer, new IntegerMessage(bspPeer.getPeerName(), i)); }

to Superstep.compute(). And the outer for loop is what is programmed in SuperstepBSP.java:

for (int i = 0; i < NUM_SUPERSTEPS; i++) { // code that should be moved to Superstep.compute() } bspPeer.sync();

On 9 April 2014 16:17, Edward J. Yoon edwardy...@apache.org wrote: As you can see here[1], the sync() method is never called, and the classes of all supersteps need to be declared within the Job configuration. Therefore, I thought it's similar to the Pregel style on the BSP model. It's quite different from the legacy model in my eyes.
According to HAMA-505, the superstep API seems to be used for FT job processing (I haven't read it closely yet). Right? Here I have a question: what happens if I call the sync() method within the compute() method? In this case, does the framework guarantee checkpoint/recovery? And how can I implement http://wiki.apache.org/hama/SerializePrinting using the superstep API? What's the difference between pure BSP and FT BSP? Any concrete example? I meant the traditional BSP programming model. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java On Wed, Apr 9, 2014 at 4:25 PM, Chia-Hung Lin cli...@googlemail.com wrote: Sorry, I don't catch the point. What's the difference between pure BSP and FT BSP? Any concrete example? On 9 April 2014 08:29, Edward J. Yoon edwardy
Re: Project and Website Layout Refactoring Idea.
+1 On 4 April 2014 16:41, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 Anastasis On 4 Apr 2014, at 8:22 a.m., Tommaso Teofili tommaso.teof...@gmail.com wrote: it sounds reasonable to me, good point. Tommaso 2014-04-04 3:31 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, from a user's perspective, we offer too many complex things, so users have difficulty understanding how to use Apache Hama. Hence, I propose that we separate Hama into multiple (logical) sub-projects. For example: * Main portal: http://hama.apache.org/ * Core BSP framework project: http://hama.apache.org/bsp/ * Pregel-like Graph framework project: http://hama.apache.org/graph/ * BSP-based Machine Learning Library project: http://hama.apache.org/ml/ And, for each of the projects, we also document how to use it separately. What do you think? -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Procedure
As we will have a new version for each release, we need someone to help with the release process. Would anyone like to volunteer?
Re: Procedure
Thanks for volunteering. On 27 March 2014 14:50, Edward J. Yoon edwardy...@apache.org wrote: I can volunteer to be release manager. On Thu, Mar 27, 2014 at 3:24 PM, Chia-Hung Lin cli...@googlemail.com wrote: As we will have a new version for each release, we need someone to help with the release process. Would anyone like to volunteer? -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
NPE is thrown when generating javadoc
I am not sure if this is a JDK bug (maybe not); just writing it down as a side note. When compiling with JDK 6 (java version 1.6.0_25), the following exception is thrown. Switching to JDK 7, this javadoc exception goes away. I came across the same problem reported on the internet, but no bug database link was provided.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (default) on project hama-core: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - java.lang.NullPointerException
[ERROR] at com.sun.tools.javac.jvm.ClassReader.findMethod(ClassReader.java:974)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readEnclosingMethodAttr(ClassReader.java:926)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readMemberAttr(ClassReader.java:909)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClassAttr(ClassReader.java:1053)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClassAttrs(ClassReader.java:1067)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClass(ClassReader.java:1560)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClassFile(ClassReader.java:1658)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.fillIn(ClassReader.java:1845)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.complete(ClassReader.java:1777)
[ERROR] at com.sun.tools.javac.code.Symbol.complete(Symbol.java:386)
[ERROR] at com.sun.tools.javac.code.Symbol$ClassSymbol.complete(Symbol.java:763)
[ERROR] at com.sun.tools.javac.code.Symbol$ClassSymbol.flags(Symbol.java:695)
[ERROR] at com.sun.tools.javadoc.ClassDocImpl.getFlags(ClassDocImpl.java:105)
[ERROR] at com.sun.tools.javadoc.ClassDocImpl.isAnnotationType(ClassDocImpl.java:116)
[ERROR] at com.sun.tools.javadoc.DocEnv.isAnnotationType(DocEnv.java:574)
[ERROR] at com.sun.tools.javadoc.DocEnv.getClassDoc(DocEnv.java:546)
[ERROR] at com.sun.tools.javadoc.PackageDocImpl.getClasses(PackageDocImpl.java:154)
[ERROR] at com.sun.tools.javadoc.PackageDocImpl.addAllClassesTo(PackageDocImpl.java:170)
[ERROR] at com.sun.tools.javadoc.RootDocImpl.classes(RootDocImpl.java:178)
[ERROR] at com.sun.tools.doclets.internal.toolkit.AbstractDoclet.startGeneration(AbstractDoclet.java:96)
[ERROR] at com.sun.tools.doclets.internal.toolkit.AbstractDoclet.start(AbstractDoclet.java:64)
[ERROR] at com.sun.tools.doclets.formats.html.HtmlDoclet.start(HtmlDoclet.java:42)
[ERROR] at com.sun.tools.doclets.standard.Standard.start(Standard.java:23)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:597)
[ERROR] at com.sun.tools.javadoc.DocletInvoker.invoke(DocletInvoker.java:269)
[ERROR] at com.sun.tools.javadoc.DocletInvoker.start(DocletInvoker.java:143)
[ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:340)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:128)
[ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:41)
[ERROR] at com.sun.tools.javadoc.Main.main(Main.java:31)
[ERROR]
Re: [DISCUSS] Close all old JIRA tickets and Redesign for future fault tolerant system
I am refactoring the code. If you are interested, it's at https://github.com/chlin501/hama On 7 March 2014 10:12, Edward J. Yoon edwardy...@apache.org wrote: I'd like to close all old (very inactive) JIRA tickets, and redesign for the future system. If no objections are raised, I'll do it next week. Thanks! -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
BSP is a bridging model that doesn't restrict itself to particular usages. My understanding (I could be wrong) is that our framework needs to address such issues. [1], for example, proposes a solution based on BSP in the field of real-time applications. [1]. Hartley J.K., Bargiela A., TPML: Parallel meta-language for scientific and engineering computations using transputers (TPML), Proc. of 2nd Int. Conf. on Software for Supercomputers and Multiprocessors, SMS'94, 1994, pp. 22-31 On 4 March 2014 21:20, Yexi Jiang yexiji...@gmail.com wrote: I am very interested in this topic since my research area includes event mining, but can BSP conduct real-time computing? I once used a message-queue-based solution to collect the event logs. 2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to platformize this kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252) -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
I used Twitter Storm previously. Storm is an excellent framework for real-time processing. Considering Hama for real-time tasks, the framework in my opinion needs to decouple IO from HDFS so that the source/input is not restricted to just HDFS. On 5 March 2014 09:30, Yexi Jiang yexiji...@gmail.com wrote: Please correct me if I'm wrong. My understanding of aggregating the logs is to collect the logs generated from each monitored machine in real time. The collecting procedure is continuous, like a data stream, and never ends. I know how to use Hama to aggregate the logs batch by batch (e.g. aggregate the logs incrementally each day), but I cannot immediately come up with an idea of using Hama to solve this problem in a real-time fashion. 2014-03-04 19:32 GMT-05:00 Edward J. Yoon edwardy...@apache.org: Aggregators of the Graph package are doing similar work: monitoring and global communication, etc. On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang yexiji...@gmail.com wrote: I am very interested in this topic since my research area includes event mining, but can BSP conduct real-time computing? I once used a message-queue-based solution to collect the event logs. 2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to platformize this kind of processing.
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Below is just my personal viewpoint. We can refactor bsp to be more modularized so that people can choose if that fits their requirement. Basically bsp is a generalized model, it may be good if we can create a flexible framework. On 5 March 2014 12:25, Edward J. Yoon edwardy...@apache.org wrote: Why not? Sent from my iPhone On 2014. 3. 5., at 오후 1:09, Yexi Jiang yexiji...@gmail.com wrote: Yes, currently Hama does not support streaming input and streaming output. That's why currently it is not a natural choice for people with real time computing needs. Do we really need to make Hama to support the real time computing? In that case, we need to compete with Storm... 2014-03-04 22:58 GMT-05:00 Chia-Hung Lin cli...@googlemail.com: I used Twitter Storm previously. Storm is an excellent framework in real time processing. Considering Hama in real time tasks, the framework in my opinion need to decouple io from hdfs so that the source/ input is not restricted to just hdfs. On 5 March 2014 09:30, Yexi Jiang yexiji...@gmail.com wrote: Please correct me if I'm wrong. My understanding of aggregating the log is the collect the generated from each monitored machine in real time. The collecting procedure is continuous like a data stream and never end. I know how to use Hama to aggregate the logs batch by batch (e.g. aggregate the logs incrementally each day), but I cannot immediately make up an idea of using Hama to solve this problem in real time approach. 2014-03-04 19:32 GMT-05:00 Edward J. Yoon edwardy...@apache.org: Aggregators of Graph package are doing similar wok. Monitoring and Global communication, ..., etc. On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang yexiji...@gmail.com wrote: I am very interested in this topic since my research area includes event mining, but can BSP conducts the real time computing? I once used the message queue based solution to collect the event logs. 2014-03-04 1:54 GMT-05:00 Edward J. 
Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might able to platformization these kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Cutting a 0.7 release
Just to let you know, I may refactor based on the following diagram: http://people.apache.org/~chl501/diagram1.png That sketches the basic flow required for FT. I am currently evaluating related parts, so it's subject to change. On 24 February 2014 20:52, Edward J. Yoon edwardy...@apache.org wrote: 0.6.4 or 0.7.0, both are OK to me. Just FYI, the memory efficiency has been significantly (almost 2-3x) improved by runtime message serialization and compression. See https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3 (I'll attach more benchmarks and comparisons with other systems soon). And, we've fixed many bugs, e.g., K-Means, NeuralNetwork, SemiClustering, Graph's Combiners (HAMA-857). According to my personal evaluations, the current system is fairly respectable. As I mentioned before, I believe we should stick to the in-memory style, since today's machines can be equipped with up to 128 GB. A disk (or disk-hybrid) based queue is optional, not a must-have. Once we release this one, we finally might want to focus on the issues below: * Fault tolerant job processing (checkpoint recovery) * Support for GPUs and InfiniBand Then, I think we can release version 1.0. On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Would you cut 0.7 or 0.6.4? I'd go with 0.6.4 as I think the next minor version change should be due to significant feature additions/changes and/or stability/scalability improvements. Regards, Tommaso 2014-02-24 8:47 GMT+01:00 Edward J. Yoon edwardy...@apache.org: Hi all, I plan on cutting a release next week. If you have some opinions, please feel free to comment here. Sent from my iPhone -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: Cutting a 0.7 release
Programmers can't control Java memory like malloc/free in C, and with type boxing/unboxing, etc., it seems not easy to evaluate memory usage. So it would be good to stick to the Erlang fail-fast style. Or we can have a program that loads data and measures the actual memory usage. On 24 February 2014 22:32, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2014-02-24 13:52 GMT+01:00 Edward J. Yoon edwardy...@apache.org: 0.6.4 or 0.7.0, both are OK to me. Just FYI, the memory efficiency has been significantly (almost 2-3x) improved by runtime message serialization and compression. See https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3 (I'll attach more benchmarks and comparisons with other systems soon). And, we've fixed many bugs, e.g., K-Means, NeuralNetwork, SemiClustering, Graph's Combiners (HAMA-857). sure, all the above things look good to me. According to my personal evaluations, the current system is fairly respectable. As I mentioned before, I believe we should stick to the in-memory style, since today's machines can be equipped with up to 128 GB. A disk (or disk-hybrid) based queue is optional, not a must-have. right, the only thing that I think we need to address before 0.7.0 is related to the OutOfMemory errors (especially when dealing with large graphs); for example, IMHO even if the memory is not enough to store all the graph vertices assigned to a certain peer, a scalable system should never throw OOM exceptions; instead it may eventually process items more slowly (with caches/queues), but never throw an exception for that. But that's just my opinion. Once we release this one, we finally might want to focus on the issues below: * Fault tolerant job processing (checkpoint recovery) +1 * Support for GPUs and InfiniBand +1 for the former, not sure about the latter. Then, I think we can release version 1.0. My 2 cents, Tommaso On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Would you cut 0.7 or 0.6.4?
I'd go with 0.6.4 as I think the next minor version change should be due to significant feature additions/changes and/or stability/scalability improvements. Regards, Tommaso 2014-02-24 8:47 GMT+01:00 Edward J. Yoon edwardy...@apache.org: Hi all, I plan on cutting a release next week. If you have some opinions, please feel free to comment here. Sent from my iPhone -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
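The "program that loads data and measures the actual memory usage" idea mentioned above can be sketched in plain Java with the Runtime API. `MemoryProbe` and the workload are illustrative only, and the numbers are approximate: System.gc() is merely a hint, so GC timing makes such measurements noisy.

```java
public class MemoryProbe {
    // Approximate used heap in bytes. System.gc() is only a hint to the JVM,
    // so deltas between two calls are rough and can even come out negative.
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // best-effort: encourage collection before measuring
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedBytes();
        int[] data = new int[1_000_000]; // roughly 4 MB of int payload
        long after = usedBytes();
        System.out.println("approx. payload bytes: " + (after - before)
            + " (array length " + data.length + ")");
    }
}
```

A fail-fast alternative, as suggested in the thread, is to not estimate at all and simply let a task die and be restarted when it exceeds its heap.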
Re: New logo on website
+ one hama On 20 January 2014 20:18, Martin Illecker mar...@illecker.at wrote: Nice ;-) 2014/1/20 Edward J. Yoon edwardy...@apache.org Hi, http://people.apache.org/~edwardyoon/site/index.html Do you like this new logo? If no objections arise, I'd like to commit this! -- Best Regards, Edward J. Yoon @eddieyoon
Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems
Not very sure, but it seems JUnitBenchmarks can be integrated with Jenkins. On 13 January 2014 17:05, Tommaso Teofili tommaso.teof...@gmail.com wrote: Thanks Song Bai and Ed for your replies, looking forward to Song's contributions and HAMA-843/816 being done. Tommaso p.s.: I think we need a way of continuously benchmarking our trunk (e.g. setting up 2+ machines in distributed mode and running tests/benchmarks against them via Jenkins, but I don't know if that's really feasible via ASF Jenkins). 2014/1/13 Edward J. Yoon edwardy...@apache.org: Once HAMA-843 is committed, PageRank performance will be dramatically improved. The scalability issue is related to the In-Memory VerticesInfo and Queue. DiskVerticesInfo is now available. Disk/Spilling Queue issues will be fixed soon. And also, the Graph package's performance can be improved one more time with HAMA-816. On Mon, Jan 13, 2014 at 1:14 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: By the way: is anyone aware of what kind of failures were related to the PageRank failures highlighted in the mentioned slides (or do you know who we can ask)? Tommaso 2014/1/10 Edward J. Yoon edwardy...@apache.org: Just FYI, https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: Hama Scheduler
BSPMaster makes use of a TaskScheduler for scheduling tasks:

BSPMaster.java
Class<? extends TaskScheduler> schedulerClass = conf.getClass(
    "bsp.master.taskscheduler", SimpleTaskScheduler.class, TaskScheduler.class);
this.taskScheduler = ReflectionUtils.newInstance(schedulerClass, conf);

Then in SimpleTaskScheduler, tasks are scheduled through the schedule function, and tasks are obtained from the related JobInProgress. That's roughly the execution path. IIRC the split size is more related to a single job, so increasing the split size may not allow more jobs to be run on the same cluster. At the moment the scheduling mechanism creates a task per GroomServer, and each GroomServer allows a default maxTasks of up to 3. So increasing maxTasks may be a way to run more jobs concurrently; or restricting the tasks scheduled to a GroomServer and then scheduling tasks of the new job to free slots may also help increase concurrent job execution. Right now scheduling is just a simple FCFS. Improvements, such as adding scheduling policies so that the mechanism is more flexible, are welcome. On 24 December 2013 20:06, Yuesheng Hu yueshen...@gmail.com wrote: Hi, I want to implement a scheduler for hama as my thesis project. Here are my ideas: 1. make the split size as big as possible, so more jobs can be run on the cluster; 2. the scheduler can select a job based on the job's type, like graph job (message intensive) or iterative job. Any advice? Best Regards!
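The reflection-based scheduler loading described above can be illustrated with a small standalone sketch. The `TaskScheduler`, `FcfsScheduler`, `load()`, and plain-Map "configuration" here are simplified stand-ins for illustration, not Hama's actual classes.

```java
import java.util.*;

public class SchedulerSketch {
    // Stand-in for Hama's TaskScheduler interface.
    public interface TaskScheduler {
        String schedule(List<String> tasks);
    }

    // Stand-in default implementation: first come, first served.
    public static class FcfsScheduler implements TaskScheduler {
        public String schedule(List<String> tasks) {
            return tasks.get(0); // pick the earliest submitted task
        }
    }

    // Mirrors the BSPMaster pattern: read a class name from configuration,
    // fall back to the default, and instantiate it via reflection so new
    // scheduling policies can be plugged in without changing the master.
    public static TaskScheduler load(Map<String, String> conf) throws Exception {
        String cls = conf.getOrDefault("bsp.master.taskscheduler",
                                       FcfsScheduler.class.getName());
        return (TaskScheduler) Class.forName(cls)
            .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        TaskScheduler scheduler = load(new HashMap<>()); // empty conf -> default FCFS
        System.out.println(scheduler.schedule(Arrays.asList("task-1", "task-2")));
    }
}
```

A policy-based scheduler from a thesis project would then only need to implement the interface and set its class name under the configuration key.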
Re: Hama book
I am happy to help. On 26 November 2013 22:09, Tommaso Teofili tommaso.teof...@gmail.com wrote: looks like a community-written book, nice :) p.s.: I can help as well 2013/11/26 Yexi Jiang yexiji...@gmail.com: Me too, that's pretty interesting. 2013/11/26 Suraj Menon menonsur...@gmail.com: I can help too. What is the timeline? On Tue, Nov 26, 2013 at 5:32 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: I am interested in the Graph API if you want. Anastasis On 26 Nov 2013, at 8:02 a.m., Edward J. Yoon edwardy...@apache.org wrote: Anyone? On Thu, Nov 21, 2013 at 8:34 PM, Edward J. Yoon edwardy...@apache.org wrote: Hi folks, I talked a little with Manning's publisher, and started writing a book proposal. Comment below if you're interested in being a co-author. -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Combine all Writables into a new package
+1 for hama-io or hama-commons On 21 October 2013 21:35, Tommaso Teofili tommaso.teof...@gmail.com wrote: what about creating a module for that (Writables and InputFormats for now), hama-io / hama-commons, that can be used by both (containing math stuff as well)? Tommaso 2013/10/21 Martin Illecker millec...@apache.org: VectorWritable and MatrixWritable both have some dependencies on org.apache.hama.ml.math (DenseDoubleVector, DoubleVector and DenseDoubleMatrix, DoubleMatrix). If we move VectorWritable and MatrixWritable to core (e.g., org.apache.hama.io.writable), we have to move org.apache.hama.ml.math as well. I think that's not possible because of other classes in hama-ml depending on ml.math. Temporarily, I will have to copy VectorWritable to the core to use it in a test case. 2013/10/21 Tommaso Teofili tommaso.teof...@gmail.com: 2013/10/21 Martin Illecker millec...@apache.org: Hello, regarding my Hama Pipes test case [1], I want to use VectorWritable inside the hama-core module. Therefore I would need a dependency on hama-ml, but this would cause a cyclic dependency. So is it possible to move both writables, VectorWritable and MatrixWritable, from org.apache.hama.ml.writable into a new package, e.g. org.apache.hama.io.writable, based on [2]? I think this really makes sense. Regarding [3], we can also move TextArrayWritable from org.apache.hama.bsp into this new package. Do you think we can move the writables of org.apache.hama.ml.writable to the core module? +1 And can we do the package refactoring [2] of org.apache.hama.bsp submitted by Suraj? +1 here too. Tommaso Thanks! Martin [1] https://issues.apache.org/jira/browse/HAMA-808 [2] https://issues.apache.org/jira/secure/attachment/12609417/bsplist.txt [3] https://issues.apache.org/jira/browse/HAMA-727
Re: TaskStatus question
Yes. In addition to the SCHEDULED state, an explanation of the other states would also be appreciated. Thanks On 29 September 2013 19:33, Suraj Menon surajsme...@apache.org wrote: Are you talking about a SCHEDULED state where the task is not started yet, but a directive is sent to the GroomServer to start the task? -Suraj On Sat, Sep 28, 2013 at 7:49 AM, Chia-Hung Lin cli...@googlemail.comwrote: TaskStatus has the following phases: STARTING, COMPUTE, BARRIER_SYNC, CLEANUP, RECOVERING; and states: RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN, FAULT_NOTIFIED, RECOVERY_SCHEDULING, RECOVERY_SCHEDULED, RECOVERING. What is the valid mapping, or the state transition, while a task is executed in a GroomServer? Thanks
TaskStatus question
TaskStatus has the following phases: STARTING, COMPUTE, BARRIER_SYNC, CLEANUP, RECOVERING; and states: RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN, FAULT_NOTIFIED, RECOVERY_SCHEDULING, RECOVERY_SCHEDULED, RECOVERING. What is the valid mapping, or the state transition, while a task is executed in a GroomServer? Thanks
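[Editor's illustration] One way to pin down an answer to this question is to encode the phase progression as an explicit transition table. The sketch below does that for the five phases listed above; the edges shown (STARTING to COMPUTE, COMPUTE alternating with BARRIER_SYNC per superstep, CLEANUP terminal, RECOVERING restarting) are an assumption for illustration only, since the actual valid mapping is exactly what the question asks about.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch only: one plausible phase transition table for a task
// running inside a GroomServer. The real table is what the thread asks about,
// so every edge here is an assumption, not Hama's documented behavior.
public class TaskPhaseModel {

  enum Phase { STARTING, COMPUTE, BARRIER_SYNC, CLEANUP, RECOVERING }

  private static final Map<Phase, EnumSet<Phase>> NEXT = new EnumMap<>(Phase.class);
  static {
    NEXT.put(Phase.STARTING, EnumSet.of(Phase.COMPUTE, Phase.RECOVERING));
    NEXT.put(Phase.COMPUTE, EnumSet.of(Phase.BARRIER_SYNC, Phase.CLEANUP, Phase.RECOVERING));
    NEXT.put(Phase.BARRIER_SYNC, EnumSet.of(Phase.COMPUTE, Phase.CLEANUP, Phase.RECOVERING));
    NEXT.put(Phase.CLEANUP, EnumSet.noneOf(Phase.class));    // assumed terminal
    NEXT.put(Phase.RECOVERING, EnumSet.of(Phase.STARTING));  // assumed retry from scratch
  }

  // True when moving from one phase to another is allowed under this model.
  public static boolean canMove(Phase from, Phase to) {
    return NEXT.get(from).contains(to);
  }
}
```

Having such a table in code would let a validity check reject impossible transitions instead of leaving the mapping implicit.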
Re: svn commit: r1523425 - in /hama/trunk/c++: pom.xml src/main/native/pipes/impl/HamaPipes.cc
Apologies for committing the patch to trunk. It looks like there is a window of time during which commit actions should be avoided. In addition to this, is there any other issue that may also affect the release? I would like to document this as a guideline, so that when other members perform similar tasks the same issue will not be repeated. Thanks for the explanation. On 17 September 2013 20:59, Tommaso Teofili tommaso.teof...@gmail.com wrote: I'll try to explain why IMHO it's usually better not to commit to trunk while voting on release candidates (I realize now I did it myself some days ago too, sorry :) ). When the release manager runs the command 'mvn release:prepare' a bunch of things happen; one of them is the current trunk pom.xml files being moved to the next development iteration, which for us is 0.7.0-SNAPSHOT. Therefore, if the release vote doesn't pass and the release has to be rolled back, the pom.xml files have to be moved back to their previous versions (e.g. 0.6.2-SNAPSHOT), which is done by the release manager via the command 'mvn release:rollback'. If someone has committed changes to the trunk in the meantime, this may cause the following: 1. running mvn release:rollback may fail due to incompatible SVN changes (to be merged manually) on pom files (this might be the case with the mentioned change to the hama-pipes pom.xml) 2. the committed change being silently rolled back and overwritten by 'mvn release:rollback' 3. a snapshot of hama-core 0.7.0-SNAPSHOT containing changes targeted for e.g. 0.6.3 being deployed to snapshot repositories (not a big problem, but still a bit inconsistent) Given that, I'm of course not against your commit; it's just possible that Edward's rollback command will overwrite it, so let's keep in mind that we have to check for that. Regards, Tommaso 2013/9/17 Chia-Hung Lin cli...@googlemail.com Is there any reason why this has to be rolled back, e.g. procedure, format, etc.? Because I would need this patch to be in.
If it's procedure, format, etc., do we have a guideline on the wiki? Wiki pages such as Jenkins and HowTOCommit don't contain related information. Thanks On 17 September 2013 16:52, Tommaso Teofili tommaso.teof...@gmail.com wrote: ok, no problem, just let's not commit anything else before Edward can do the rollback. Tommaso 2013/9/17 Edward J. Yoon edwardy...@apache.org Sorry, I'm on vacation, will be back 2 days later. -- Best Regards, Edward J. Yoon @eddieyoon On 2013. 9. 17., at 5:40 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think we need Edward to run 'mvn release:rollback' as soon as possible (as the latest vote has been canceled) and then commit this again. Tommaso 2013/9/15 chl...@apache.org Author: chl501 Date: Sun Sep 15 10:20:01 2013 New Revision: 1523425 URL: http://svn.apache.org/r1523425 Log: HAMA-802: Skip Hama Pipes native build when cmake is missing Modified: hama/trunk/c++/pom.xml hama/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc Modified: hama/trunk/c++/pom.xml URL: http://svn.apache.org/viewvc/hama/trunk/c%2B%2B/pom.xml?rev=1523425&r1=1523424&r2=1523425&view=diff
==============================================================================
--- hama/trunk/c++/pom.xml (original)
+++ hama/trunk/c++/pom.xml Sun Sep 15 10:20:01 2013
@@ -31,7 +31,7 @@
   <description>Apache Hama Pipes</description>
   <packaging>pom</packaging>
-<profiles>
+  <profiles>
   <profile>
     <id>native</id>
     <activation>
@@ -49,16 +49,32 @@
             <goals><goal>run</goal></goals>
             <configuration>
               <target>
-                <mkdir dir="${project.build.directory}/native" />
-                <exec executable="cmake" dir="${project.build.directory}/native" failonerror="true">
-                  <arg line="${basedir}/src/ -DJVM_ARCH_DATA_MODEL=${sun.arch.data.model}" />
-                </exec>
-                <exec executable="make" dir="${project.build.directory}/native" failonerror="true">
-                  <arg line="VERBOSE=1" />
-                </exec>
-                <!-- The second make is a workaround for HADOOP-9215. It can
-                     be removed when version 2.6 of cmake is no longer supported. -->
-                <exec executable="make" dir="${project.build.directory}/native" failonerror="true" />
+                <taskdef resource="net/sf/antcontrib/antcontrib.properties" classpathref="maven.plugin.classpath" />
+                <!-- Check if cmake is installed -->
+                <property environment="env" />
+                <if>
+                  <or>
+                    <available file="cmake" filepath="${env.PATH}" />
+                    <!-- on Windows it can be Path, path
Re: [VOTE] Hama 0.6.3 RC2
-1 Compilation fails with the messages below: [exec] /usr/bin/c++ -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/tmp/0.6.3-RC2/c++/src/main/native/utils/api -I/tmp/0.6.3-RC2/c++/src/main/native/pipes/api -I/tmp/0.6.3-RC2/c++/src -o CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o -c /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope On 14 September 2013 03:49, Anastasis Andronidis andronat_...@hotmail.com wrote: +1 Anastasis On 13 Sep 2013, at 9:47 a.m., Edward J. Yoon edwardy...@apache.org wrote: Hi, I've created RC2 for the Hama 0.6.3 release. Artifacts and Signatures: http://people.apache.org/~edwardyoon/dist/0.6.3-RC2/ SVN Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.3-RC2/ Please try it on both hadoop1 and hadoop2, run the tests, check the docs, etc. [ ] +1 Release the packages as Apache Hama 0.6.3 [ ] -1 Do not release the packages because... Thank you! -- Best Regards, Edward J. Yoon @eddieyoon
Re: [CANCELED][VOTE] Hama 0.6.3 RC2
In addition to that, I also encountered a compilation error. I can help fix this, but I am just not very sure if that's the right way to do it. If a jira is created for these two fixes, I can provide a patch for the compilation issue; and it would be good if anyone could help review it. [exec] /usr/bin/c++ -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/tmp/0.6.3-RC2/c++/src/main/native/utils/api -I/tmp/0.6.3-RC2/c++/src/main/native/pipes/api -I/tmp/0.6.3-RC2/c++/src -o CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o -c /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope On 14 September 2013 12:06, Edward J. Yoon edwardy...@apache.org wrote: Oh... sorry. [ERROR] https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/ws/trunk/yarn/src/main/java/org/apache/hama/bsp/BSPApplicationMaster.java:[64,7] error: BSPApplicationMaster is not abstract and does not override abstract method getAssignedPortNum(TaskAttemptID) in BSPPeerProtocol I'll create a new one after fixing the above problem (next Sunday). On Sat, Sep 14, 2013 at 4:49 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: +1 Anastasis On 13 Sep 2013, at 9:47 a.m., Edward J. Yoon edwardy...@apache.org wrote: Hi, I've created RC2 for the Hama 0.6.3 release. Artifacts and Signatures: http://people.apache.org/~edwardyoon/dist/0.6.3-RC2/ SVN Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.3-RC2/ Please try it on both hadoop1 and hadoop2, run the tests, check the docs, etc. [ ] +1 Release the packages as Apache Hama 0.6.3 [ ] -1 Do not release the packages because... Thank you! -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: [CANCELED][VOTE] Hama 0.6.3 RC2
Just noticed that HAMA-802 should fix the C++ compilation error. I've applied the patch to the trunk; it passes compilation and testing. On 15 September 2013 16:04, Chia-Hung Lin cli...@googlemail.com wrote: In addition to that, I also encountered a compilation error. I can help fix this, but I am just not very sure if that's the right way to do it. If a jira is created for these two fixes, I can provide a patch for the compilation issue; and it would be good if anyone could help review it. [exec] /usr/bin/c++ -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/tmp/0.6.3-RC2/c++/src/main/native/utils/api -I/tmp/0.6.3-RC2/c++/src/main/native/pipes/api -I/tmp/0.6.3-RC2/c++/src -o CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o -c /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope On 14 September 2013 12:06, Edward J. Yoon edwardy...@apache.org wrote: Oh... sorry. [ERROR] https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/ws/trunk/yarn/src/main/java/org/apache/hama/bsp/BSPApplicationMaster.java:[64,7] error: BSPApplicationMaster is not abstract and does not override abstract method getAssignedPortNum(TaskAttemptID) in BSPPeerProtocol I'll create a new one after fixing the above problem (next Sunday). On Sat, Sep 14, 2013 at 4:49 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: +1 Anastasis On 13 Sep 2013, at 9:47 a.m., Edward J. Yoon edwardy...@apache.org wrote: Hi, I've created RC2 for the Hama 0.6.3 release. Artifacts and Signatures: http://people.apache.org/~edwardyoon/dist/0.6.3-RC2/ SVN Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.3-RC2/ Please try it on both hadoop1 and hadoop2, run the tests, check the docs, etc. [ ] +1 Release the packages as Apache Hama 0.6.3 [ ] -1 Do not release the packages because... Thank you! -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: hama pipes build
If that's related to the error in the mail titled - hama-pipes: An Ant BuildException during `make' Then: 1. install cmake 2. add `#include <unistd.h>` to HamaPipes.cc. On 9 September 2013 15:39, Anastasis Andronidis andronat_...@hotmail.com wrote: Hello, as Martin Illecker explains in his ticket (HAMA-749), there are lots of benefits with cmake: https://issues.apache.org/jira/browse/HADOOP-8368 I had some confusion when I tried to build. I think we should test whether cmake exists at the very beginning of the build, and inform the user to install it, instead of failing with a bogus message. Cheers, Anastasis On 9 Sep 2013, at 10:25 a.m., Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, when I try to build from trunk I get the following while trying to build the pipes module: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hama-pipes: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program cmake (in directory /Users/teofili/Documents/workspaces/asf/hama/trunk/c++/target/native): error=2, No such file or directory - [Help 1] If I understand things correctly, I am missing the compiler on my computer; however, it'd be good if we could skip that module if the cmake program is missing. Regards, Tommaso
hama-pipes: An Ant BuildException during `make'
When compiling trunk, the following errors are thrown. How can I fix this problem? Environment: debian, 3.10-2-rt-686-pae, g++ 4.7.3, make 3.8.1 Thanks ... [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘bool HamaPipes::runTask(const HamaPipes::Factory&)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1280:28: error: ‘close’ was not declared in this scope [exec] make[2]: *** [CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o] Error 1 [exec] make[1]: *** [CMakeFiles/hamapipes.dir/all] Error 2 [exec] make: *** [all] Error 2 [exec] make[2]: Leaving directory `/path/to/trunk/c++/target/native' [exec] make[1]: Leaving directory `/path/to/trunk/c++/target/native' ... org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hama-pipes: An Ant BuildException has occured: exec returned: 2
Re: hama-pipes: An Ant BuildException during `make'
Just noticed that unistd.h is not included in HamaPipes.cc. Adding `#include <unistd.h>` to HamaPipes.cc solves the problem, but I am not sure if this is the right way to fix it. Environment: debian 3.10-2-rt-686-pae cmake version 2.8.11.2 gcc/ g++ version 4.7.3 make version 3.81 On 8 September 2013 17:39, Martin Illecker mar...@illecker.at wrote: Do you have *cmake* installed? 2013/9/8 Chia-Hung Lin cli...@googlemail.com When compiling trunk, the following errors are thrown. How can I fix this problem? Environment: debian, 3.10-2-rt-686-pae, g++ 4.7.3, make 3.8.1 Thanks ... [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘bool HamaPipes::runTask(const HamaPipes::Factory&)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1280:28: error: ‘close’ was not declared in this scope [exec] make[2]: *** [CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o] Error 1 [exec] make[1]: *** [CMakeFiles/hamapipes.dir/all] Error 2 [exec] make: *** [all] Error 2 [exec] make[2]: Leaving directory `/path/to/trunk/c++/target/native' [exec] make[1]: Leaving directory `/path/to/trunk/c++/target/native' ... org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hama-pipes: An Ant BuildException has occured: exec returned: 2
Re: [DISCUSS] Hama 0.7.0
+1 BTW, are we going to prioritize the tasks in the roadmap? On 28 August 2013 14:17, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, it looks reasonable to me. Tommaso 2013/8/28 Edward J. Yoon edwardy...@apache.org Hi all, After we release 0.6.3 (the HDFS 2.0 version), we have to work on the 0.7.0 version now. I would like to suggest that we solve the messaging scalability issue. WDYT? ... And, according to my experiments, the BSP framework shows very nice performance (I tested GraphLab and Spark as well). Only the Graph job is slow. So, I'll mainly work on improving the performance of GraphJobRunner. -- Best Regards, Edward J. Yoon @eddieyoon
Re: HybridBSP (CPU and GPU) Task Integration
Sorry for replying late. Regarding scheduling, sorry, I can't remember the details now. IIRC scheduling is done by dispatching tasks to all GroomServers. The code in SimpleTaskScheduler.java has a class TaskWorker, which dispatches a GroomServerAction to a GroomServer through GroomProtocol. At the GroomServer side, a task is launched by launchTaskForJob(), where it calls TaskInProgress's launchTask(); then a separate process will be forked for running the bsp logic. For the input logic, other developers can provide more insight on how the data is split. My understanding is that the split mechanism resembles MapReduce's, as you mentioned, by number of tasks; so you might need to provide input logic that can split the corresponding data (80% for gpu, 20% for cpu), and the tasks launched can then correctly read that input data according to their type (bsp or gpu). I could be wrong regarding the input logic mechanism; the split logic seems to be near the BSPJobClient.partition() function. What I can see at the moment is: if launched gpu tasks do not arbitrarily talk (in a direct way) to other external gpu processes, which run within other bsp tasks, it seems that we can treat a gpu task as a normal bsp task without too much modification. But that would be subject to how the implementation is done. On 25 August 2013 22:08, Martin Illecker millec...@apache.org wrote: Thanks, your picture [1] illustrates this scenario very well! In short, I have to modify runBSP in BSPTask, check if the submitted task extends HybridBSP. If so, start a PipesBSP server and wait for incoming connections. And run the bspGpu method within the HybridBSP task. Regarding scheduling: 1) Within runBSP I have to decide whether to execute the bspGpu or the default bsp method of HybridBSP. e.g., having numTaskBsp set to 8, Hama will start 8 separate Java threads. If I set an additional conf property numTaskBspGpu to 1, I want to have 9 bsp tasks. (I don't know where these bsp threads are started. Add a property check for numTaskBspGpu and start more bsp tasks.) 8 tasks should execute the default bsp method within runBSP, and only one task should run bspGpu. 2) It should be possible to schedule input data for bsp tasks. (belongs to the partitioning job) e.g., having 8 cpu bsp tasks and 1 gpu bsp task, I wish to have a property to control which amount of input belongs to which task. Default: Hama's partitioning job will divide the input data (e.g., a sequence file) by the number of tasks? It might happen that e.g. 80% of the input data should go to the gpu task and only 20% to the cpu tasks. By the way, do you think a HybridBSP-based task which extends BSP will work on Hama without any changes? Normally it should work because of inheritance from BSP. Thanks! Martin [1] http://i.imgur.com/RP3ETBW.png 2013/8/24 Chia-Hung Lin cli...@googlemail.com It seems to me that an additional process or thread will be launched for running a GPU-based bsp task, which will then communicate with the PipesBSP process, as in [1]. Please correct me if that is wrong. If this is the case, BSPTask looks like the place to work on. When the BSPTask process is running, it can check (e.g. in runBSP) if an additional GPU process/thread needs to be created, then launch/destroy such a task accordingly. By the way, it is mentioned that scheduling is needed. Can you please give a bit more detail on what kind of scheduling is required? [1]. http://i.imgur.com/RP3ETBW.png On 24 August 2013 00:59, Martin Illecker mar...@illecker.at wrote: What's the difference between launching a `bsp task' and a `gpu bsp task'? Will the gpu bsp task fork and execute a c/c++ process? The GPU bsp task can also be executed within a Java process. In detail, I want to run a Rootbeer Kernel (e.g., PiEstimationKernel [1]) within the bspGpu method. A Rootbeer Kernel is written in Java and converted to CUDA. (the entry point is the gpuMethod) Finally there is a Java wrapper around the CUDA code, so it can be invoked within the JVM. So far there is no difference from a normal bsp task execution, but I want to use Hama Pipes to communicate via sockets. The GPU bsp task should start like the default one, but I will have to establish the Pipes server for communication. And of course I need scheduling for these GPU and CPU tasks. I hope the following source will illustrate my scenario better:

public class MyHybridBSP extends HybridBSP<NullWritable, NullWritable, NullWritable, NullWritable, Text> {

  @Override
  public void bsp(BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, Text> peer)
      throws IOException, SyncException, InterruptedException {
    MyGPUKernel kernel = new MyGPUKernel();
    Rootbeer rootbeer = new Rootbeer();
    rootbeer.setThreadConfig(BLOCK_SIZE, GRID_SIZE, BLOCK_SIZE*GRID_SIZE);
    // Run GPU Kernels
    rootbeer.runAll(kernel);
  }

  @Override
  public void bspGpu(BSPPeer<NullWritable
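[Editor's illustration] The "80% for gpu / 20% for cpu" input-split idea discussed in this thread can be sketched as a small planning helper. HybridSplitPlanner, its method, and its parameters are hypothetical illustrations, not Hama API; a real implementation would live near the partitioning job (e.g., around BSPJobClient.partition(), as suggested above).

```java
// Hypothetical helper, not Hama API: given a total record count, the fraction
// destined for GPU tasks (e.g. 0.8), and the task counts, compute how many
// records each GPU task and each CPU task would read. This only illustrates
// the proportional-split idea from the thread above.
public class HybridSplitPlanner {

  // result[0] = records per GPU task, result[1] = records per CPU task
  public static long[] plan(long totalRecords, double gpuFraction,
                            int gpuTasks, int cpuTasks) {
    long gpuRecords = Math.round(totalRecords * gpuFraction);
    long cpuRecords = totalRecords - gpuRecords;
    return new long[] { gpuRecords / gpuTasks, cpuRecords / cpuTasks };
  }
}
```

For the scenario above (1 gpu task, 8 cpu tasks, 80/20 split over 1000 records), the gpu task would read 800 records and each cpu task 25.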
ML test fails
When testing with the latest source obtained from svn (r1514736), the ml test seems to fail. Is any setting required? Just checking in case it's an environment-specific issue. --- T E S T S --- Running org.apache.hama.ml.ann.TestSmallLayeredNeuralNetwork 13/08/24 17:38:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 13/08/24 17:38:29 INFO mortbay.log: Training time: 9.488000s 13/08/24 17:38:29 INFO mortbay.log: Relative error: 20.00% 13/08/24 17:38:20 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1 13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1 13/08/24 17:38:20 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file. 13/08/24 17:38:20 INFO bsp.BSPJobClient: Running job: job_localrunner_0001 13/08/24 17:38:20 INFO bsp.LocalBSPRunner: Setting up a new barrier for 1 tasks! 13/08/24 17:38:23 INFO bsp.BSPJobClient: Current supersteps number: 1 13/08/24 17:38:23 INFO bsp.BSPJobClient: The total number of supersteps: 1 13/08/24 17:38:23 INFO bsp.BSPJobClient: Counters: 6 13/08/24 17:38:23 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter 13/08/24 17:38:23 INFO bsp.BSPJobClient: SUPERSTEPS=1 13/08/24 17:38:23 INFO bsp.BSPJobClient: LAUNCHED_TASKS=1 13/08/24 17:38:23 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter 13/08/24 17:38:23 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2 13/08/24 17:38:23 INFO bsp.BSPJobClient: IO_BYTES_READ=57416 13/08/24 17:38:23 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=0 13/08/24 17:38:23 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=618 13/08/24 17:38:23 INFO bsp.FileInputFormat: Total input paths to process : 5 13/08/24 17:38:23 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. 
See BSPJob#setJar(String) or check Your jar file. 13/08/24 17:38:23 INFO bsp.BSPJobClient: Running job: job_localrunner_0001 13/08/24 17:38:23 INFO bsp.LocalBSPRunner: Setting up a new barrier for 5 tasks! 13/08/24 17:38:23 INFO mortbay.log: Begin to train 13/08/24 17:38:26 INFO bsp.BSPJobClient: Current supersteps number: 375 13/08/24 17:38:29 INFO bsp.BSPJobClient: Current supersteps number: 849 13/08/24 17:38:32 INFO bsp.BSPJobClient: Current supersteps number: 1377 13/08/24 17:38:35 INFO bsp.BSPJobClient: Current supersteps number: 1877 13/08/24 17:38:38 INFO bsp.BSPJobClient: Current supersteps number: 2435 13/08/24 17:38:41 INFO bsp.BSPJobClient: Current supersteps number: 3001 13/08/24 17:38:44 INFO bsp.BSPJobClient: Current supersteps number: 3573 13/08/24 17:38:46 INFO mortbay.log: End of training, number of iterations: 2001. 13/08/24 17:38:46 INFO mortbay.log: Write model back to /tmp/distributed-model 13/08/24 17:38:47 INFO bsp.BSPJobClient: Current supersteps number: 3999 13/08/24 17:38:47 INFO bsp.BSPJobClient: The total number of supersteps: 3999 13/08/24 17:38:47 INFO bsp.BSPJobClient: Counters: 8 13/08/24 17:38:47 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter 13/08/24 17:38:47 INFO bsp.BSPJobClient: SUPERSTEPS=3999 13/08/24 17:38:47 INFO bsp.BSPJobClient: LAUNCHED_TASKS=5 13/08/24 17:38:47 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter 13/08/24 17:38:47 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2 13/08/24 17:38:47 INFO bsp.BSPJobClient: IO_BYTES_READ=278427240 13/08/24 17:38:47 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=39611 13/08/24 17:38:47 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=2 13/08/24 17:38:47 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=300 13/08/24 17:38:47 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=2 13/08/24 17:38:47 INFO mortbay.log: Reload model from /tmp/distributed-model. 
13/08/24 17:38:47 INFO mortbay.log: Training time: 27.49s 13/08/24 17:38:47 INFO mortbay.log: Relative error: 24.67% Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec FAILURE!
Re: HybridBSP (CPU and GPU) Task Integration
It seems to me that an additional process or thread will be launched for running a GPU-based bsp task, which will then communicate with the PipesBSP process, as in [1]. Please correct me if that is wrong. If this is the case, BSPTask looks like the place to work on. When the BSPTask process is running, it can check (e.g. in runBSP) if an additional GPU process/thread needs to be created, then launch/destroy such a task accordingly. By the way, it is mentioned that scheduling is needed. Can you please give a bit more detail on what kind of scheduling is required? [1]. http://i.imgur.com/RP3ETBW.png On 24 August 2013 00:59, Martin Illecker mar...@illecker.at wrote: What's the difference between launching a `bsp task' and a `gpu bsp task'? Will the gpu bsp task fork and execute a c/c++ process? The GPU bsp task can also be executed within a Java process. In detail, I want to run a Rootbeer Kernel (e.g., PiEstimationKernel [1]) within the bspGpu method. A Rootbeer Kernel is written in Java and converted to CUDA. (the entry point is the gpuMethod) Finally there is a Java wrapper around the CUDA code, so it can be invoked within the JVM. So far there is no difference from a normal bsp task execution, but I want to use Hama Pipes to communicate via sockets. The GPU bsp task should start like the default one, but I will have to establish the Pipes server for communication. And of course I need scheduling for these GPU and CPU tasks. I hope the following source will illustrate my scenario better:

public class MyHybridBSP extends HybridBSP<NullWritable, NullWritable, NullWritable, NullWritable, Text> {

  @Override
  public void bsp(BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, Text> peer)
      throws IOException, SyncException, InterruptedException {
    MyGPUKernel kernel = new MyGPUKernel();
    Rootbeer rootbeer = new Rootbeer();
    rootbeer.setThreadConfig(BLOCK_SIZE, GRID_SIZE, BLOCK_SIZE*GRID_SIZE);
    // Run GPU Kernels
    rootbeer.runAll(kernel);
  }

  @Override
  public void bspGpu(BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, Text> peer)
      throws IOException, SyncException, InterruptedException {
    // process algorithm on CPU
  }

  class MyGPUKernel implements Kernel {
    public MyGPUKernel() { }

    public void gpuMethod() {
      // process algorithm on GPU
      // the following commands will need Hama Pipes
      HamaPeer.getConfiguration();
      HamaPeer.readNext(...,...);
      // and others
    }
  }
}

Thanks! Martin [1] https://github.com/millecker/applications/blob/master/hama/rootbeer/piestimator/src/at/illecker/hama/rootbeer/examples/piestimator/gpu/PiEstimatorKernel.java 2013/8/23 Chia-Hung Lin cli...@googlemail.com What's the difference between launching a `bsp task' and a `gpu bsp task'? Will the gpu bsp task fork and execute a c/c++ process? It might be good to distinguish how the gpu bsp task will be executed, then decide how to launch such a task. Basically, for launching a bsp task an external process is created. The logic to execute BSP.bsp() is at BSPTask.java, where the method runBSP() is called with a BSP implementation class loaded at runtime Class<?> workClass = job.getConfiguration().getClass("bsp.work.class", BSP.class); and then the bsp method is executed bsp.bsp(bspPeer); On 23 August 2013 21:45, Martin Illecker mar...@illecker.at wrote: Hi, I have created a HybridBSP [1] class which should combine the default BSP (CPU) class with GPU methods [2]. The abstract HybridBSP class extends the BSP class and adds the bspGpu, setupGpu and cleanupGpu methods.

public abstract class HybridBSP<K1, V1, K2, V2, M extends Writable>
    extends BSP<K1, V1, K2, V2, M> implements BSPGpuInterface<K1, V1, K2, V2, M> {

  @Override
  public abstract void bspGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException;

  @Override
  public void setupGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException { }

  @Override
  public void cleanupGpu(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException { }
}

Now I want to add a new scheduling technique which checks the conf property (gpuBspTaskNum) and executes bspGpu instead of the default bsp method. e.g., bspTaskNum=3 and gpuBspTaskNum=1 The scheduler should run four bsp tasks simultaneously and execute the bsp method three times and bspGpu once. (both defined within one derived HybridBSP class) Do I have to modify the taskrunner or create a new SimpleTaskScheduler? How can I integrate this into Hama? Thanks! Martin [1] https://github.com/millecker/hama/blob/5d0e8b26abd6b63fa5afad09a2ba960bf9922868/core/src/main/java/org/apache/hama/bsp/gpu/HybridBSP.java [2] https://github.com/millecker/hama/blob
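[Editor's illustration] The task-count arithmetic discussed in this thread (bspTaskNum CPU tasks plus gpuBspTaskNum GPU tasks, with one method chosen per task) can be sketched as below. The class, method names, and the index-based dispatch rule are illustrative assumptions, not Hama's real scheduler or config keys.

```java
// Hypothetical sketch of the dispatch decision from the thread above: launch
// bspTaskNum + gpuBspTaskNum tasks in total and run bspGpu only for the last
// gpuBspTaskNum of them (0-based task indices). Not Hama API.
public class HybridDispatch {

  public static int totalTasks(int bspTaskNum, int gpuBspTaskNum) {
    return bspTaskNum + gpuBspTaskNum;
  }

  // true when the task with this 0-based index should run bspGpu
  // rather than the default bsp method
  public static boolean runsOnGpu(int taskIndex, int bspTaskNum) {
    return taskIndex >= bspTaskNum;
  }
}
```

With bspTaskNum=3 and gpuBspTaskNum=1 as in the example above, four tasks run in total: indices 0-2 execute bsp and index 3 executes bspGpu.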
ML test case fails
When testing with the latest source obtained from svn (r1514736), the ml module tests seem to fail. Is any setting required? Just checking, in case it's an environment-specific issue.

--- T E S T S ---
Running org.apache.hama.ml.ann.TestSmallLayeredNeuralNetwork
13/08/24 17:38:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/24 17:38:29 INFO mortbay.log: Training time: 9.488000s
13/08/24 17:38:29 INFO mortbay.log: Relative error: 20.00%
13/08/24 17:38:20 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1
13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1
13/08/24 17:38:20 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
13/08/24 17:38:20 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/08/24 17:38:20 INFO bsp.LocalBSPRunner: Setting up a new barrier for 1 tasks!
13/08/24 17:38:23 INFO bsp.BSPJobClient: Current supersteps number: 1
13/08/24 17:38:23 INFO bsp.BSPJobClient: The total number of supersteps: 1
13/08/24 17:38:23 INFO bsp.BSPJobClient: Counters: 6
13/08/24 17:38:23 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
13/08/24 17:38:23 INFO bsp.BSPJobClient:     SUPERSTEPS=1
13/08/24 17:38:23 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
13/08/24 17:38:23 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/08/24 17:38:23 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=2
13/08/24 17:38:23 INFO bsp.BSPJobClient:     IO_BYTES_READ=57416
13/08/24 17:38:23 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=0
13/08/24 17:38:23 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=618
13/08/24 17:38:23 INFO bsp.FileInputFormat: Total input paths to process : 5
13/08/24 17:38:23 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
13/08/24 17:38:23 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/08/24 17:38:23 INFO bsp.LocalBSPRunner: Setting up a new barrier for 5 tasks!
13/08/24 17:38:23 INFO mortbay.log: Begin to train
13/08/24 17:38:26 INFO bsp.BSPJobClient: Current supersteps number: 375
13/08/24 17:38:29 INFO bsp.BSPJobClient: Current supersteps number: 849
13/08/24 17:38:32 INFO bsp.BSPJobClient: Current supersteps number: 1377
13/08/24 17:38:35 INFO bsp.BSPJobClient: Current supersteps number: 1877
13/08/24 17:38:38 INFO bsp.BSPJobClient: Current supersteps number: 2435
13/08/24 17:38:41 INFO bsp.BSPJobClient: Current supersteps number: 3001
13/08/24 17:38:44 INFO bsp.BSPJobClient: Current supersteps number: 3573
13/08/24 17:38:46 INFO mortbay.log: End of training, number of iterations: 2001.
13/08/24 17:38:46 INFO mortbay.log: Write model back to /tmp/distributed-model
13/08/24 17:38:47 INFO bsp.BSPJobClient: Current supersteps number: 3999
13/08/24 17:38:47 INFO bsp.BSPJobClient: The total number of supersteps: 3999
13/08/24 17:38:47 INFO bsp.BSPJobClient: Counters: 8
13/08/24 17:38:47 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
13/08/24 17:38:47 INFO bsp.BSPJobClient:     SUPERSTEPS=3999
13/08/24 17:38:47 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=5
13/08/24 17:38:47 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/08/24 17:38:47 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=2
13/08/24 17:38:47 INFO bsp.BSPJobClient:     IO_BYTES_READ=278427240
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=39611
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=2
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=300
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=2
13/08/24 17:38:47 INFO mortbay.log: Reload model from /tmp/distributed-model.
13/08/24 17:38:47 INFO mortbay.log: Training time: 27.49s
13/08/24 17:38:47 INFO mortbay.log: Relative error: 24.67%
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec  FAILURE!
Re: HybridBSP (CPU and GPU) Task Integration
What's the difference between launching a `bsp task' and a `gpu bsp task'? Will a gpu bsp task fork and execute a C/C++ process? It might be good to first distinguish how a gpu bsp task will be executed, and then decide how to launch such a task. Basically, for launching a bsp task, an external process is created. The logic that executes BSP.bsp() is in BSPTask.java, where the method runBSP() is called with a BSP implementation class loaded at runtime:

Class<?> workClass = job.getConfiguration().getClass("bsp.work.class", BSP.class);

and then the bsp method is executed:

bsp.bsp(bspPeer);

On 23 August 2013 21:45, Martin Illecker mar...@illecker.at wrote: Hi, I have created a HybridBSP [1] class which should combine the default BSP (CPU) class with GPU methods [2]. The abstract HybridBSP class extends the BSP class and adds bspGpu, setupGpu and cleanupGpu methods.

public abstract class HybridBSP<K1, V1, K2, V2, M extends Writable>
    extends BSP<K1, V1, K2, V2, M>
    implements BSPGpuInterface<K1, V1, K2, V2, M> {

  @Override
  public abstract void bspGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException;

  @Override
  public void setupGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException {
  }

  @Override
  public void cleanupGpu(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException {
  }
}

Now I want to add a new scheduling technique which checks the conf property (gpuBspTaskNum) and executes bspGpu instead of the default bsp method. e.g., with bspTaskNum=3 and gpuBspTaskNum=1, the scheduler should run four bsp tasks simultaneously and execute the bsp method three times and bspGpu once (both defined within one derived HybridBSP class). Do I have to modify the task runner or create a new SimpleTaskScheduler? How can I integrate this into Hama? Thanks! 
Martin [1] https://github.com/millecker/hama/blob/5d0e8b26abd6b63fa5afad09a2ba960bf9922868/core/src/main/java/org/apache/hama/bsp/gpu/HybridBSP.java [2] https://github.com/millecker/hama/blob/5d0e8b26abd6b63fa5afad09a2ba960bf9922868/core/src/main/java/org/apache/hama/bsp/gpu/BSPGpuInterface.java
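The split Martin describes can be sketched as a stand-alone toy: pick bspGpu for the first gpuBspTaskNum task slots and bsp for the rest. Everything below (HybridTaskDispatchSketch, the cut-down Bsp/HybridBsp interfaces, the index-based policy) is illustrative only, not Hama's TaskRunner or scheduler API.

```java
// A toy sketch (not Hama code) of dispatching between the CPU and GPU
// entry points of a HybridBSP-style class. The interfaces are cut-down
// stand-ins for BSP/HybridBSP; gpuBspTaskNum mirrors the proposed conf
// property. "First N task indices run on the GPU" is just one possible
// assignment policy.
public class HybridTaskDispatchSketch {

  public interface Bsp {
    String bsp();
  }

  public interface HybridBsp extends Bsp {
    String bspGpu();
  }

  /**
   * Decide which entry point the task with the given index runs:
   * the first gpuBspTaskNum tasks call bspGpu(), the rest call bsp().
   */
  public static String run(Bsp work, int taskIndex, int gpuBspTaskNum) {
    if (work instanceof HybridBsp && taskIndex < gpuBspTaskNum) {
      return ((HybridBsp) work).bspGpu();
    }
    return work.bsp();
  }

  public static void main(String[] args) {
    HybridBsp job = new HybridBsp() {
      public String bsp() { return "cpu"; }
      public String bspGpu() { return "gpu"; }
    };
    // bspTaskNum=3 plus gpuBspTaskNum=1, as in Martin's example:
    // four tasks in total, exactly one of which takes the GPU path.
    for (int i = 0; i < 4; i++) {
      System.out.println("task " + i + " -> " + run(job, i, 1));
    }
  }
}
```

If something like this works, the existing task runner would only need the task's index and the gpuBspTaskNum property to choose the entry point, which suggests modifying the task runner rather than writing a whole new SimpleTaskScheduler; that is only a guess from outside the code, though.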
Re: [VOTE] Skip minor release, and prepare 1.0
+0 Personally I would not go for 1.0 now, though a 1.0 release is ok for me. My reason is that people may expect functions such as FT to be ready once it's at version 1.0. Also, it might be inevitable that people will compare MRv2 and Giraph to Hama, and think that MRv2 and Giraph are more stable than Hama because of FT, etc., regardless of the differences between the projects. On 17 August 2013 16:33, Edward J. Yoon edwardy...@apache.org wrote: Hi all, I was planning to cut a 0.6.3 release candidate (Hadoop 2.0 compatible version), however it seems the age of competing for preoccupancy is past, so we don't need to hurry now. Moreover, we are currently adding a lot of changes, and still need to improve a lot. We know exactly what we should do. Do you think we can skip the minor release and prepare 1.0 now? -- Best Regards, Edward J. Yoon @eddieyoon
Re: Discussion about guideline
It seems that our release requirement fits continuous integration, where software is released frequently. In the current workflow, though we can decide what is to be included and released in the roadmap, the release may still be deferred even when 90% of the issues are solved, because the remaining 10% is not yet accomplished. With continuous integration (IIRC), the software is released frequently whenever a new feature is resolved, resulting in a reduced release cycle; and problems can be solved a bit more easily because each change is small. If this is the case, it might be good for us to adapt to CI, and we can schedule which (relatively small) patches are to be released in order. On 6 August 2013 09:16, Edward J. Yoon edwardy...@apache.org wrote: I mean, we can move most current issues to 0.7 and start to comply with our new development process. On Tue, Aug 6, 2013 at 10:07 AM, Edward J. Yoon edwardy...@apache.org wrote: I personally would like to cut a 0.6.3 release after solving the HAMA-789 and HAMA-717 issues, because there are people who want to run a Hama cluster on a Hadoop 2.0 environment, like me. I think we can move the rest of the current issues into the 0.7 roadmap. As we know, the only critical issues in the core BSP project are now memory efficiency and the FT system. And the BSP-based ML algorithm library and query language projects can begin in earnest. On Tue, Aug 6, 2013 at 9:48 AM, Yexi Jiang yexiji...@gmail.com wrote: How about the current in-progress issues? 2013/8/5 Edward J. Yoon edwardy...@apache.org First, each release, or between different releases, would have tasks included. Among those tasks there might be priorities, or a task may block one or more other tasks. So how do we determine the priority of tasks, or between several releases? A naive thought is by voting; however, issues may not be clear to every participant. In that case, voting may defer more important tasks. I think we can follow the current guideline. 
Every idea for improvements, new features, and suggestions should be discussed in polite terms on the dev@ list before implementation, and then the decisions must be listed on our RoadMap page. For simple improvements or bug-type issues, you can skip the discussion and report directly on JIRA. And then we can cut a release according to the Roadmap. a.) when a patch is submitted, at least 2 reviewers should help review the source code. b.) the patch creator should describe e.g. the execution flow/procedure at a higher/conceptual level. Reviewers can then cooperate in reviewing parts of the code in the patch (a tool may help at this stage). Some review points such as (java)doc and test cases should be included.
- Test cases: Each patch should have test cases that at least capture the main logical flow. And the tests are recommended not to be bound to external dependencies, so that time spent on testing can be reduced.
- Doc (Javadoc or wiki): A class should at least describe what it is, or its main logic flow; or at least write down the mechanism in the wiki. Methods and fields that are not self-explanatory would be good to have doc explaining their purpose or execution mechanism.
+1 On Mon, Aug 5, 2013 at 11:33 PM, Chia-Hung Lin cli...@googlemail.com wrote: As the Hama community grows, it seems good to have a guideline that participants can follow so we can cooperate more smoothly. Therefore I would like to discuss this; please share your opinions so that we can improve the process. Below are some issues popping up in my head.
- roadmap prioritization
- development work flow
First, each release, or between different releases, would have tasks included. Among those tasks there might be priorities, or a task may block one or more other tasks. So how do we determine the priority of tasks, or between several releases? A naive thought is by voting; however, issues may not be clear to every participant. In that case, voting may defer more important tasks. Second, a few subtopics are listed below:
- Code review: Though a commit section is described, it is not clear how the procedure will be practised. My thought is a.) when a patch is submitted, at least 2 reviewers should help review the source code. b.) the patch creator should describe e.g. the execution flow/procedure at a higher/conceptual level. Reviewers can then cooperate in reviewing parts of the code in the patch (a tool may help at this stage). Some review points such as (java)doc and test cases should be included.
- Test cases: Each patch should have test cases that at least capture the main logical flow. And the tests are recommended not to be bound to external dependencies, so that time spent on testing can be reduced.
- Doc (Javadoc or wiki): A class should at least describe what it is, or its main logic flow; or at least write down the mechanism in the wiki. Methods and fields that are not self-explanatory would be good to have doc explaining their purpose or execution mechanism.
Discussion about guideline
As the Hama community grows, it seems good to have a guideline that participants can follow so we can cooperate more smoothly. Therefore I would like to discuss this; please share your opinions so that we can improve the process. Below are some issues popping up in my head.
- roadmap prioritization
- development work flow
First, each release, or between different releases, would have tasks included. Among those tasks there might be priorities, or a task may block one or more other tasks. So how do we determine the priority of tasks, or between several releases? A naive thought is by voting; however, issues may not be clear to every participant. In that case, voting may defer more important tasks. Second, a few subtopics are listed below:
- Code review: Though a commit section is described [1], it is not clear how the procedure will be practised. My thought is a.) when a patch is submitted, at least 2 reviewers should help review the source code. b.) the patch creator should describe e.g. the execution flow/procedure at a higher/conceptual level. Reviewers can then cooperate in reviewing parts of the code in the patch (a tool may help at this stage). Some review points such as (java)doc and test cases should be included.
- Test cases: Each patch should have test cases that at least capture the main logical flow. And the tests are recommended not to be bound to external dependencies, so that time spent on testing can be reduced.
- Doc (Javadoc or wiki): A class should at least describe what it is, or its main logic flow; or at least write down the mechanism in the wiki. Methods and fields that are not self-explanatory would be good to have doc explaining their purpose or execution mechanism.
Just some ideas I have at the moment. I will add more if I find others, and we should keep improving the guideline when necessary. Please add your points if you think some are missing, or remove ones that are not needed. [1]. How to commit - Review https://wiki.apache.org/hama/HowToCommit#Review
Hama wiki
It seems the wiki can't be edited right now. Can anyone help check this issue? Thanks
Re: [DISCUSS] Roadmap for 0.7.0
I will now set - exporting more metrics - master notification as tasks with higher priority. On 22 July 2013 02:32, Yexi Jiang yexiji...@gmail.com wrote: Hi, Tommaso, For the machine learning module, what kind of refactoring do you think is necessary? Regards, Yexi 2013/7/21 Edward J. Yoon edwardy...@apache.org Additionally, Queue is also one of the big issues. On Sun, Jul 21, 2013 at 8:55 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi Edward, I'm still quite unsure about the status of FT, so it may be worth doing some work to make sure that it is fully working (but it may be just me). Also, vertex storage in the graph package should be improved. Then I'd say some refactoring of the machine learning module APIs, together with the addition of Collaborative Filtering (and eventually some other algorithms, but I'm still unsure there). My 2 cents, Tommaso 2013/7/19 Edward J. Yoon edwardy...@apache.org Hi all, Once HAMA-742 is done, users will be able to install a Hama cluster on existing Hadoop 1.x and new Hadoop 2.x without issues. I think the urgent tasks are finished; now it's time to discuss the future roadmap for Hama 0.7 and begin enhancement work. Please feel free to voice your opinions. Thanks. -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Dynamic vertices and hama counters
Sorry, my bad. I only focused on the counter stuff and didn't pay attention to the Vertex-related issue; I thought you just wanted to share a counter value between peers. In that case, persisting the counter value to zk shouldn't be a problem and won't incur overhead. But if the case is not about counters, please just ignore my previous post. On 17 July 2013 06:59, Edward J. Yoon edwardy...@apache.org wrote: You guys seem to have totally misunderstood what I am saying. Every BSP processor accesses ZK's counter concurrently? Do you think it is possible to determine the current total number of vertices in every step without barrier synchronization? As I mentioned before, there are already additional barrier synchronization steps for aggregating and broadcasting the global updated vertex count. You can use these steps with *no additional barrier synchronization*. On Wed, Jul 17, 2013 at 5:01 AM, andronat_asf andronat_...@hotmail.com wrote: Thank you everyone. +1 for Tommaso, I will see what I can do about that :) I also believe that ZK is very similar to the sync() mechanism that Edward describes, but if we need to sync more info we might need ZK. Thanks again, Anastasis On 15 Jul 2013, at 5:55 PM, Edward J. Yoon edwardy...@apache.org wrote: andronat_asf, To aggregate and broadcast the global count of updated vertices, we call sync() twice. See the doAggregationUpdates() method in GraphJobRunner. You can solve your problem the same way, and there will be no additional cost. Use of Zookeeper is not a bad idea, but IMO it's not much different from the sync() mechanism. On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin cli...@googlemail.com wrote: +1 for Tommaso's solution. If not every algorithm needs a counter service, having an interface with different implementations (in-memory, zk, etc.) should reduce the side effects. 
On 15 July 2013 15:51, Tommaso Teofili tommaso.teof...@gmail.com wrote: what about introducing a proper API for counting vertices, something like an interface VertexCounter with 2-3 implementations: an InMemoryVertexCounter (basically the current one), a DistributedVertexCounter to implement the scenario where we use a separate BSP superstep to count them, and a ZKVertexCounter which handles vertex counts as per Chia-Hung's suggestion. Also, we may introduce something like a configuration variable to define whether all the vertices are needed or just the neighbors (and/or some other strategy). My 2 cents, Tommaso 2013/7/14 Chia-Hung Lin cli...@googlemail.com Just my personal viewpoint. For a small amount of global information, storing the state in ZooKeeper might be a reasonable solution. On 13 July 2013 21:28, andronat_asf andronat_...@hotmail.com wrote: Hello everyone, I'm working on HAMA-767 and I have some concerns about counters and scalability. Currently, every peer has a set of vertices and a variable that keeps the total number of vertices across all peers. In my case, I'm trying to add and remove vertices during the runtime of a job, which means that I have to update all those variables. My problem is that this is not efficient, because on every operation (adding or removing a vertex) I need to update all peers, so I need to send lots of messages to make those updates (see the GraphJobRunner#countGlobalVertexCount method), and I believe this is not correct and scalable. Another problem is that, even if I update all those variables (with the cost of sending lots of messages to every peer), those variables will only be updated in the next superstep. e.g.:

Peer 1:              Peer 2:
Vert_1               Vert_2
(Total_V = 2)        (Total_V = 2)
addVertex()
(Total_V = 3)        getNumberOfV() = 2
        --- Sync ---
                     getNumberOfV() = 3

Is there something like global counters or shared memory that can address this issue? P.S. 
I have a small feeling that we don't need to track the total number of vertices, because vertex-centric algorithms rarely need global totals; they only depend on neighbors (I might be wrong though). Thanks, Anastasis -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
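Edward's two-sync pattern (aggregate local deltas at a master peer, then broadcast the sum back) can be simulated in a self-contained toy. The Peer class and aggregate() method below are stand-ins for Hama's BSPPeer and GraphJobRunner#doAggregationUpdates, not their real APIs; the point is only the shape of the two supersteps.

```java
import java.util.ArrayList;
import java.util.List;

// A toy simulation (not Hama's BSPPeer API) of the two-sync pattern:
// superstep 1 sends every peer's local vertex-count delta to a master
// peer; superstep 2 broadcasts the summed global count back.
public class GlobalCountSketch {

  public static class Peer {
    public final List<Integer> inbox = new ArrayList<>();
    public int localDelta;   // vertices added (+) or removed (-) locally
    public int globalCount;  // last agreed-upon global vertex count

    public Peer(int globalCount, int localDelta) {
      this.globalCount = globalCount;
      this.localDelta = localDelta;
    }
  }

  /** Two-superstep aggregation over all peers; peers[0] acts as master. */
  public static void aggregate(Peer[] peers) {
    // Superstep 1: every peer sends its delta to the master, then syncs.
    for (Peer p : peers) {
      peers[0].inbox.add(p.localDelta);
    }
    // The master sums the deltas into the new global count.
    int total = peers[0].globalCount;
    for (int delta : peers[0].inbox) {
      total += delta;
    }
    peers[0].inbox.clear();
    // Superstep 2: the master broadcasts the new count, then all sync.
    for (Peer p : peers) {
      p.globalCount = total;
      p.localDelta = 0;
    }
  }

  public static void main(String[] args) {
    // Anastasis's example: two peers with two vertices in total,
    // and peer 1 adds a vertex during the superstep.
    Peer[] peers = { new Peer(2, 1), new Peer(2, 0) };
    aggregate(peers);
    // After the second sync, both peers agree on the new count.
    System.out.println(peers[0].globalCount + " " + peers[1].globalCount);
  }
}
```

This matches the diagram in the thread: peer 2 still sees the stale count until the sync, and only afterwards do all peers agree on 3.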
Re: Dynamic vertices and hama counters
+1 for Tommaso's solution. If not every algorithm needs a counter service, having an interface with different implementations (in-memory, zk, etc.) should reduce the side effects. On 15 July 2013 15:51, Tommaso Teofili tommaso.teof...@gmail.com wrote: what about introducing a proper API for counting vertices, something like an interface VertexCounter with 2-3 implementations: an InMemoryVertexCounter (basically the current one), a DistributedVertexCounter to implement the scenario where we use a separate BSP superstep to count them, and a ZKVertexCounter which handles vertex counts as per Chia-Hung's suggestion. Also, we may introduce something like a configuration variable to define whether all the vertices are needed or just the neighbors (and/or some other strategy). My 2 cents, Tommaso 2013/7/14 Chia-Hung Lin cli...@googlemail.com Just my personal viewpoint. For a small amount of global information, storing the state in ZooKeeper might be a reasonable solution. On 13 July 2013 21:28, andronat_asf andronat_...@hotmail.com wrote: Hello everyone, I'm working on HAMA-767 and I have some concerns about counters and scalability. Currently, every peer has a set of vertices and a variable that keeps the total number of vertices across all peers. In my case, I'm trying to add and remove vertices during the runtime of a job, which means that I have to update all those variables. My problem is that this is not efficient, because on every operation (adding or removing a vertex) I need to update all peers, so I need to send lots of messages to make those updates (see the GraphJobRunner#countGlobalVertexCount method), and I believe this is not correct and scalable. Another problem is that, even if I update all those variables (with the cost of sending lots of messages to every peer), those variables will only be updated in the next superstep. 
e.g.:

Peer 1:              Peer 2:
Vert_1               Vert_2
(Total_V = 2)        (Total_V = 2)
addVertex()
(Total_V = 3)        getNumberOfV() = 2
        --- Sync ---
                     getNumberOfV() = 3

Is there something like global counters or shared memory that can address this issue? P.S. I have a small feeling that we don't need to track the total number of vertices, because vertex-centric algorithms rarely need global totals; they only depend on neighbors (I might be wrong though). Thanks, Anastasis
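Tommaso's proposal can be sketched as a small interface. All class names below are taken from his mail rather than from existing Hama code, and only the in-memory variant is fleshed out; a DistributedVertexCounter (extra counting superstep) or ZKVertexCounter (state kept in ZooKeeper) would implement the same contract.

```java
// A sketch of the proposed VertexCounter API. The names come from the
// proposal, not from existing Hama classes; only the in-memory variant
// is implemented here. Algorithms that never need a global count could
// simply be wired with this cheap local implementation.
public interface VertexCounter {

  /** Record that a vertex was added on this peer. */
  void increment();

  /** Record that a vertex was removed on this peer. */
  void decrement();

  /** The most recently agreed-upon global vertex count. */
  long getGlobalCount();

  /** Basically the current behaviour: a plain local counter. */
  class InMemoryVertexCounter implements VertexCounter {
    private long count;

    public InMemoryVertexCounter(long initial) {
      this.count = initial;
    }

    public void increment() { count++; }
    public void decrement() { count--; }
    public long getGlobalCount() { return count; }
  }
}
```

Making the counter pluggable behind one interface also fits the configuration-variable idea: a job that only needs neighbor information can pick the in-memory counter and skip the aggregation cost entirely.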
Re: Display math to wiki
sciweavers [1] provides a tool for converting TeX equations to images. [1]. http://www.sciweavers.org/free-online-latex-equation-editor On 20 June 2013 20:28, Yexi Jiang yexiji...@gmail.com wrote: OK, I will use images instead. 2013/6/20 Edward J. Yoon edwardy...@apache.org You can post your request to Apache Infrastructure infrastruct...@apache.org. However, I would recommend just attaching/embedding images directly in the Wiki pages. MathML requires an enabled browser and fonts. On Thu, Jun 20, 2013 at 12:02 PM, Yexi Jiang yexiji...@gmail.com wrote: How do we tell them about this? 2013/6/19 Edward edw...@udanax.org We should contact the ASF Infra team. Sent from my iPhone. Jun 20, 2013 10:16 AM Yexi Jiang yexiji...@gmail.com wrote: I found one, http://moinmo.in/MathMlSupport. But it seems that it requires modifying the files wikiconfig.py and wikiutil.py. 2013/6/19 Edward J. Yoon edwardy...@apache.org I just found this: http://moinmo.in/FeatureRequests/MathExpression On Thu, Jun 20, 2013 at 9:32 AM, Yexi Jiang yexiji...@gmail.com wrote: Hi, Does anyone know how to input and display math equations on the wiki page? It would be better if the syntax were compatible with LaTeX. Regards, Yexi -- -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/ -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/ -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
[VOTE] GIT migration
Hi, This is a formal vote for/against migrating Apache Hama's repository from svn to git.
+1 : Migrate to Git
+0 : Abstain
-1 : Stay on SVN
Re: [VOTE] Release Hama 0.6.1
I don't have free nodes at hand, thus I only tested building from source. The build succeeds on my personal laptop, so I am ok with it. +1

[INFO] Reactor Summary:
[INFO] Apache Hama parent POM ............ SUCCESS [3.219s]
[INFO] core .............................. SUCCESS [3:23.415s]
[INFO] graph ............................. SUCCESS [39.063s]
[INFO] machine learning .................. SUCCESS [6.820s]
[INFO] examples .......................... SUCCESS [1:09.465s]
[INFO] yarn .............................. SUCCESS [3.524s]
[INFO] hama-dist ......................... SUCCESS [7.358s]
[INFO] BUILD SUCCESS
[INFO] Total time: 5:33.308s
[INFO] Finished at: Sun Mar 31 22:46:54 CST 2013
[INFO] Final Memory: 27M/309M

On 31 March 2013 12:22, Edward J. Yoon edwardy...@apache.org wrote: I've tested this RC on a 4-node cluster. Everything looks good to me. +1 On Fri, Mar 29, 2013 at 6:00 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso 2013/3/29 Edward J. Yoon edwardy...@apache.org Hello all, I've created a Hama 0.6.1-RC1. As we discussed yesterday, this RC fixes the input partitioning issue and adds a few random data generators. Please check whether this is stable enough for newbies to start Hama on their cluster, and vote! Hama 0.6.1 RC1: http://people.apache.org/~edwardyoon/dist/0.6.1-RC1/ Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.1-RC1/ Thanks. -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: Welcome Apurv Verma as new Apache Hama committer
Congratulations, Apurv! On 19 June 2012 23:05, Praveen Sripati praveensrip...@gmail.com wrote: Congrats Apurv - Praveen On Tue, Jun 19, 2012 at 8:14 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Dear all, please join me in welcoming Apurv Verma as a new committer in the Apache Hama project. He's given valuable contributions to the project, he's the first new committer joining Hama as a TLP, and we're happy he joined the team :-) Apurv, if you don't mind, it'd be nice if you could spend a few words presenting yourself (it's an old Lucene tradition which I think would be nice to bring here too). Welcome on board, Apurv. Kind regards, Tommaso