Re: Project healthy question

2019-02-26 Thread Edward J. Yoon
Obviously inactive :/ By the way, I personally see that many Apache
projects are becoming inactive, especially the big data related
ones.


On Mon, Feb 25, 2019 at 12:19 AM, Chia-Hung Lin wrote:

> As you may have noticed, activity has been low for some time. This
> raises an issue with respect to project health when preparing the
> board report. Any input or comments on this would be greatly
> appreciated.
>


Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!

2019-01-30 Thread Edward J. Yoon
Thanks! I just created a ticket for migration
https://issues.apache.org/jira/browse/INFRA-17788

On Thu, Jan 31, 2019 at 8:03 AM Anastasis Andronidis
 wrote:
>
> +1
>
> Anastasios
>
> > On 30 Jan 2019, at 23:03, Martin Illecker  wrote:
> >
> > +1
> >
> > On Wed, 30 Jan 2019 at 23:42, Chia-Hung Lin wrote:
> >
> >> +1
> >>
> >> On Wed, 30 Jan 2019 at 12:47, Júlio Pires  wrote:
> >>
> >>> I'm +1 too.
> >>>
> >>> On Wed, 30 Jan 2019 at 06:35, Edward J. Yoon <edwardy...@apache.org> wrote:
> >>>
> >>>> P.S. Please vote here. We need to reach a consensus.
> >>>>
> >>>> I'm +1.
> >>>>
> >>>> On Wed, Jan 30, 2019 at 5:33 PM Edward J. Yoon 
> >>>> wrote:
> >>>>>
> >>>>> Hi devs,
> >>>>> I propose we ask ASF Infra to move the Hama Git repo to GitBox as soon as
> >>>>> the release has been finalized / announced. Once they switch things over,
> >>>>> we can update the web site / documentation to reflect that.
> >>>>>
> >>>>> Does anyone see any problems with this approach?
> >>>>> If there are no objections, I'll create a JIRA ticket.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> On Thu, Jan 17, 2019 at 4:20 AM Chia-Hung Lin
> >>>>>  wrote:
> >>>>>>
> >>>>>> Hi Edward, thanks for help!
> >>>>>>
> >>>>>> On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon <edwardy...@apache.org> wrote:
> >>>>>>
> >>>>>>> I can check tomorrow!
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best Regards, Edward J. Yoon
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>>
> >>>
> >>
>


-- 
Best Regards, Edward J. Yoon


Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!

2019-01-30 Thread Edward J. Yoon
P.S. Please vote here. We need to reach a consensus.

I'm +1.

On Wed, Jan 30, 2019 at 5:33 PM Edward J. Yoon  wrote:
>
> Hi devs,
> I propose we ask ASF Infra to move the Hama Git repo to GitBox as soon as
> the release has been finalized / announced. Once they switch things over,
> we can update the web site / documentation to reflect that.
>
> Does anyone see any problems with this approach?
> If there are no objections, I'll create a JIRA ticket.
>
> Thanks.
>
> On Thu, Jan 17, 2019 at 4:20 AM Chia-Hung Lin
>  wrote:
> >
> > Hi Edward, thanks for help!
> >
> > On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon  wrote:
> >
> > >  I can check tomorrow!
> > >
>
>
>
> --
> Best Regards, Edward J. Yoon



-- 
Best Regards, Edward J. Yoon


Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!

2019-01-30 Thread Edward J. Yoon
Hi devs,
I propose we ask ASF Infra to move the Hama Git repo to GitBox as soon as
the release has been finalized / announced. Once they switch things over,
we can update the web site / documentation to reflect that.

Does anyone see any problems with this approach?
If there are no objections, I'll create a JIRA ticket.

Thanks.

On Thu, Jan 17, 2019 at 4:20 AM Chia-Hung Lin
 wrote:
>
> Hi Edward, thanks for help!
>
> On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon  wrote:
>
> >  I can check tomorrow!
> >



-- 
Best Regards, Edward J. Yoon


Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!

2019-01-15 Thread Edward J. Yoon
 I can check tomorrow!

On Tue, Jan 15, 2019 at 4:50 PM, Apache Infrastructure Team <infrastruct...@apache.org> wrote:

> Hello, hama folks.
> As stated earlier in 2018, and reiterated two weeks ago, all git
> repositories must be migrated from the git-wip-us.apache.org URL to
> gitbox.apache.org, as the old service is being decommissioned. Your
> project is receiving this email because you still have repositories on
> git-wip-us that need to be migrated.
>
> The following repositories on git-wip-us belong to your project:
>  - hama.git
>
>
> We are now entering the remaining three weeks of the mandated
> (coordinated) move stage of the roadmap, and you are asked to please
> coordinate migration with the Apache Infrastructure Team before February
> 7th. All repositories not migrated on February 7th will be mass migrated
> without warning, and we'd appreciate it if we could work together to
> avoid a big mess that day :-).
>
> As stated earlier, moving to gitbox means you will get full write access
> on GitHub as well, and be able to close/merge pull requests and much
> more. The move is mandatory for all Apache projects using git.
>
> To have your repositories moved, please follow these steps:
>
> - Ensure consensus on the move (a link to a lists.apache.org thread will
>   suffice for us as evidence).
> - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA
>
> Your migration should only take a few minutes. If you wish to migrate
> at a specific time of day or date, please do let us know in the ticket,
> otherwise we will migrate at the earliest convenient time.
>
> There will be redirects in place from git-wip to gitbox, so requests
> using the old remote origins should still work (however we encourage
> people to update their remotes once migration has completed).
>
> As always, we appreciate your understanding and patience as we move
> things around and work to provide better services and features for
> the Apache Family.
>
> Should you wish to contact us with feedback or questions, please do so
> at: us...@infra.apache.org.
>
>
> With regards,
> Apache Infrastructure
>
>


Re: [jira] [Commented] (HAMA-1002) Add junit dependency to commons to compile with Hadoop 2.8+

2017-12-28 Thread Edward J. Yoon
Can someone review and commit this patch? :)

On Dec 26, 2017 at 5:04 PM, "ASF GitHub Bot (JIRA)" wrote:

>
> [ https://issues.apache.org/jira/browse/HAMA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303626#comment-16303626 ]
>
> ASF GitHub Bot commented on HAMA-1002:
> --
>
> GitHub user youngwookim opened a pull request:
>
> https://github.com/apache/hama/pull/17
>
> HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+
>
>
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/youngwookim/hama HAMA-1002
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/hama/pull/17.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This closes #17
>
> 
> commit fe57a39cc5a576d11d9137fbae156a79d4d7a8a1
> Author: Youngwoo Kim 
> Date:   2017-12-26T06:07:12Z
>
> HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+
>
> 
>
>
> > Add junit dependency to commons to compile with Hadoop 2.8+
> > ---
> >
> > Key: HAMA-1002
> > URL: https://issues.apache.org/jira/browse/HAMA-1002
> > Project: Hama
> >  Issue Type: Bug
> >  Components: build
> >Affects Versions: 0.7.1
> >Reporter: YoungWoo Kim
> > Fix For: 0.7.2
> >
> > Attachments: HAMA-1002.0.patch
> >
> >
> > Compilation with Hadoop 2.8+ does not work because transitive dependencies for Hadoop have been changed:
> > {noformat}
> > $ mvn clean package -Phadoop2 -Dhadoop.version=2.8.1 -DskipTests
> > (snip)
> > [INFO] Apache Hama parent POM ............. SUCCESS [ 23.797 s]
> > [INFO] pipes .............................. SUCCESS [ 22.680 s]
> > [INFO] commons ............................ FAILURE [ 6.662 s]
> > [INFO] core ............................... SKIPPED
> > [INFO] graph .............................. SKIPPED
> > [INFO] machine learning ................... SKIPPED
> > [INFO] examples ........................... SKIPPED
> > [INFO] mesos .............................. SKIPPED
> > [INFO] yarn ............................... SKIPPED
> > [INFO] hama-dist .......................... SKIPPED
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD FAILURE
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 53.544 s
> > [INFO] Finished at: 2017-12-26T14:55:24+09:00
> > [INFO] Final Memory: 62M/568M
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:testCompile (default-testCompile) on project hama-commons: Compilation failure: Compilation failure:
> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[20,23] error: package org.junit does not exist
> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[20,0] error: static import only from classes and interfaces
> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[21,23] error: package org.junit does not exist
> > (snip)
> > {noformat}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>


Re: Call for contributors for Apache Hama

2017-11-15 Thread Edward J. Yoon
Just FYI,

I met Min and others yesterday. We'll continue to contribute.

https://www.facebook.com/edwardj.yoon/posts/10213928621072592

On Thu, Nov 16, 2017 at 12:00 AM, Tommaso Teofili
<tommaso.teof...@gmail.com> wrote:
> Hi all,
>
> Apache Hama is looking for more contributors to help keep the project
> alive, as many of us in the current Apache Hama PMC (unfortunately) no
> longer have enough cycles to work on the project.
> So this is a call to the whole Apache Hama community to step up if any
> of you are willing to contribute.
>
> I'm cc-ing JongYoon Lim and ByungSeok Min, who have already shown up as new
> contributors. Thanks guys, we're looking forward to hearing from you on the
> mailing lists (and patches welcome!).
>
> Regards,
> Tommaso



-- 
Best Regards, Edward J. Yoon


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2017-05-14 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009911#comment-16009911
 ] 

Edward J. Yoon commented on HAMA-983:
-

{code}
# create a new branch inside your directory 'current'
git checkout -b HAMA-983
# ... do some changes to the files ...
# commit changes to the branch
git commit -a -m '[HAMA-983] Hama runner for DataFlow'
# push the branch to the remote
git push origin HAMA-983
# then go to your GitHub HAMA page and open a Pull Request
{code}

Hi JongYoon, you can create a new branch as shown above.

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>    Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transformation, so 
> we'll need to implement a data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing a similarity join could be fun. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> the clear winner compared with Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using the Apache Beam APIs.





[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2017-04-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984007#comment-15984007
 ] 

Edward J. Yoon commented on HAMA-983:
-

Sorry for the late reply. 

{quote}could you create a branch called 'beam_support' on github?{quote} 

Sure. Alternatively, you can create the branch yourself, since you're a 
committer. Otherwise, I can do it this weekend.






[jira] [Commented] (HAMA-999) Wrong size of MemoryQueue

2017-04-10 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963871#comment-15963871
 ] 

Edward J. Yoon commented on HAMA-999:
-

Thanks for the report! I'll check what's wrong.

> Wrong size of MemoryQueue
> -
>
> Key: HAMA-999
> URL: https://issues.apache.org/jira/browse/HAMA-999
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Reporter: JongYoon Lim
>
> I found that the *SuperstepPiEstimator* example sometimes gives a wrong result when 
> calling *peer.getNumCurrentMessages()*. The cause is an incorrect *size()* 
> of *MemoryQueue*. When I printed the queue sizes from the example, 
> it sometimes reported: 
> {noformat}
> bundle size: 20, numOfMsg: 19, deque size: 0
> {noformat}
> I think *deque*, *bundles* and *numOfMsg* of *MemoryQueue* should be properly 
> synchronized to get the correct result. 
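
The synchronization concern in the report can be illustrated with a minimal Java sketch. This is not Hama's actual MemoryQueue: the class name ConsistentQueue and its fields are hypothetical, and the sketch only shows one possible way to keep a message counter and the underlying deque consistent, by guarding every access with the same lock.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not Hama's MemoryQueue implementation.
class ConsistentQueue<T> {
  // Both fields are guarded by the intrinsic lock of "this".
  private final Deque<T> deque = new ArrayDeque<>();
  private int numOfMsg = 0;

  // Adds a message and updates the counter in one critical section.
  public synchronized void add(T msg) {
    deque.add(msg);
    numOfMsg++;
  }

  // Removes the head (or returns null if empty), keeping the counter in step.
  public synchronized T poll() {
    T msg = deque.poll();
    if (msg != null) {
      numOfMsg--;
    }
    return msg;
  }

  // Reads the counter under the same lock used by add() and poll().
  public synchronized int size() {
    return numOfMsg;
  }
}
```

Because all three methods synchronize on the same object, a caller of size() should never see a counter that disagrees with the deque contents. The trade-off is coarse-grained locking, which may or may not suit the real queue.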





RE: Proposal for an Apache Hama sub-project

2017-02-27 Thread Edward J. Yoon
Thanks for your proposal.

I of course think Apache Hama can be used for scheduling sync and async
communication/computation networks with various topologies and resource
allocation. However, I'm not sure whether this approach also fits a
modern microservice architecture. In my opinion, this can be discussed and
developed in the Hama community as a sub-project until it's mature enough (CC'ing
general@i.a.o. I'll be happy to read more feedback from the ASF incubator
community).

P.S. It seems you used the incubation proposal template. There's no need
to add me as an initial committer (I don't have much time to actively
contribute to your project). Also, I recently left Samsung Electronics and
joined a $200 billion O2O e-commerce company as CTO.

-Original Message-
From: Sachin Ghai [mailto:sachin.g...@impetus.co.in]
Sent: Monday, February 27, 2017 5:16 PM
To: dev@hama.apache.org
Subject: Proposal for an Apache Hama sub-project

Hama Community,

I would like to propose a sub-project for Apache Hama and initiate
discussion around the proposal. The proposed sub-project named 'Scalar' is a
scalable orchestration, training and serving system for machine learning and
deep learning. Scalar would leverage Apache Hama to automate the distributed
training, model deployment and prediction serving.

More details about the proposal are listed below as per Apache project
proposal template:
Abstract
Scalar is a general purpose framework for simplifying massive scale big data
analytics and deep learning modelling, deployment, serving with high
performance.
Proposal
It is a goal of Scalar to provide an abstraction framework which allows user
to easily scale the functions of training a model, deploying a model and
serving the prediction from underlying machine learning or deep learning
framework. It is also the characteristic of its execution framework to
orchestrate heterogeneous workload graphs utilizing Apache Hama, Apache
Hadoop, Apache Spark and TensorFlow resources.
Background
The initial Scalar code was developed in 2016 and has been successfully beta
tested for one of the largest insurance organizations in a client specific
PoC. The motivation behind this work is to build a framework that provides
abstraction on heterogeneous data science frameworks and helps users
leverage them in the most performant way.
Rationale
There is a sudden deluge of machine learning and deep learning frameworks in
the industry. As an application developer, it becomes a hard choice to
switch from one framework to another without rewriting the application.
Also, there is additional plumbing to be done to retrieve the prediction
results for each model in different frameworks. We aim to provide an
abstraction framework which can be used to seamlessly train and deploy the
model at scale on multiple frameworks like TensorFlow, Apache Horn or Caffe.
The abstraction further provides a unified layer for serving the prediction
in the most performant, scalable and efficient way for a multi-tenant
deployment. The key performance metrics will be reduction in training time,
lower error rate and lower latency time for serving models.
Scalar consists of a core engine which can be used to create flows described
in terms of state, sequences and algorithms. The engine invokes execution
context of Apache Hama to train and deploy models on target framework.
Apache Hama is used for a variety of functions including parameter tuning
and scheduling computations on a distributed cluster. A data object layer
provides access to data from heterogeneous sources like HDFS, local, S3 etc.
A REST API layer is utilized for serving the prediction functions to client
applications. A caching layer in the middle acts as a latency improver for
various functions.
Initial Goals
Some current goals include:

  *   Build community.
  *   Provide general purpose API for machine learning and deep learning
training, deployment and serving.
  *   Serve the predictions with low latency.
  *   Run massive workloads via Apache Hama on TensorFlow, Apache Spark and
Caffe.
  *   Provide CPU and GPU support on-premise or on cloud to run the
algorithms.
Current Status
Meritocracy
The core developers understand what it means to have a process based on
meritocracy. We will provide continuous efforts to build an environment that
supports this, encouraging community members to contribute.
Community
A small community has formed within the Apache Hama project community and
at companies such as an enterprise services and product company and an artificial
intelligence startup. There is a lot of interest in data science serving
systems and Artificial intelligence simplification systems. By bringing
Scalar into Apache, we believe that the community will grow even bigger.
Core Developers
Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni,
Nikunj Limbaseeya, Mayur Choubey
Known Risks
Orphaned Products
Apache Hama is already a core open source component being utilized at
Samsung

[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-12-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730744#comment-15730744
 ] 

Edward J. Yoon commented on HAMA-983:
-

cool, let me check.






[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-12-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730450#comment-15730450
 ] 

Edward J. Yoon commented on HAMA-983:
-

Here's my skeleton code, with an example that counts words. You should 
implement the HamaPipelineRunner: just translate the pipeline and execute it as a batch job. I think 
you can see how to translate the transforms in Flink's code: 
https://github.com/dataArtisans/flink-dataflow/blob/aad5d936abd41240f3e15d294ea181fb9cca05e0/runner/src/main/java/com/dataartisans/flink/dataflow/translation/FlinkBatchTransformTranslators.java#L410

{code}
public class WordCountTest {

  static final String[] WORDS_ARRAY = new String[] { "hi there", "hi",
      "hi sue bob", "hi sue", "", "bob hi" };

  static final List<String> WORDS = Arrays.asList(WORDS_ARRAY);

  static final String[] COUNTS_ARRAY = new String[] { "hi: 5", "there: 1",
      "sue: 2", "bob: 2" };

  /**
   * Example test that tests a PTransform by using an in-memory input and
   * inspecting the output.
   */
  @Test
  @Category(RunnableOnService.class)
  public void testCountWords() throws Exception {
    HamaOptions options = PipelineOptionsFactory.as(HamaOptions.class);
    options.setRunner(HamaPipelineRunner.class);
    Pipeline p = Pipeline.create(options);

    PCollection<String> input = p.apply(Create.of(WORDS).withCoder(
        StringUtf8Coder.of()));

    PCollection<String> output = input
        .apply(new WordCount())
        .apply(MapElements.via(new FormatAsTextFn()));
    // .apply(TextIO.Write.to("/tmp/result"));

    PAssert.that(output).containsInAnyOrder(COUNTS_ARRAY);
    p.run().waitUntilFinish();
  }

  public static class WordCount extends
      PTransform<PCollection<String>, PCollection<KV<String, Long>>> {

    private static final long serialVersionUID = 1L;

    @Override
    public PCollection<KV<String, Long>> apply(PCollection<String> lines) {

      // Convert lines of text into individual words.
      PCollection<String> words = lines.apply(ParDo.of(new DoFn<String, String>() {
        private static final long serialVersionUID = 1L;
        private final Aggregator<Long, Long> emptyLines =
            createAggregator("emptyLines", new Sum.SumLongFn());

        @ProcessElement
        public void processElement(ProcessContext c) {
          if (c.element().trim().isEmpty()) {
            emptyLines.addValue(1L);
          }

          // Split the line into words.
          String[] words = c.element().split("[^a-zA-Z']+");

          // Output each word encountered into the output PCollection.
          for (String word : words) {
            if (!word.isEmpty()) {
              c.output(word);
            }
          }
        }
      }));

      // Count the number of times each word occurs.
      PCollection<KV<String, Long>> wordCounts = words.apply(Count
          .<String> perElement());

      return wordCounts;
    }
  }

  // TODO
  public static class HamaPipelineRunner extends
      PipelineRunner<HamaPipelineResult> {

    public static HamaPipelineRunner fromOptions(PipelineOptions x) {
      return new HamaPipelineRunner();
    }

    @Override
    public <Output extends POutput, Input extends PInput> Output apply(
        PTransform<Input, Output> transform, Input input) {
      return super.apply(transform, input);
    }

    @Override
    public HamaPipelineResult run(Pipeline pipeline) {
      // TODO Auto-generated method stub
      System.out.println("Executing pipeline using HamaPipelineRunner.");

      // TODO you need to translate pipeline to Hama program
      // and execute pipeline
      // return the result
      return null;
    }

  }

  public class HamaPipelineResult implements PipelineResult {

    @Override
    public State getState() {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State cancel() throws IOException {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State waitUntilFinish(Duration duration) {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State waitUntilFinish() {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public <T> AggregatorValues<T> getAggregatorValues(
        Aggregator<?, T> aggregator) throws AggregatorRetrievalException {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public MetricResults metrics() {
      // TODO Auto-generated method stub
      return null;
    }

  }

  public static interface HamaOptions extends PipelineOptions {

  }

}
{code}

> Hama runner for DataFlow
> --------
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Pro

Dockerize Hama

2016-11-30 Thread Edward J. Yoon
Hi devs,

Is anyone interested in dockerizing the Hama cluster?

-- 
Best Regards, Edward J. Yoon


[jira] [Created] (HAMA-997) Docker-compose for Hama Cluster

2016-11-13 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-997:
---

 Summary: Docker-compose for Hama Cluster
 Key: HAMA-997
 URL: https://issues.apache.org/jira/browse/HAMA-997
 Project: Hama
  Issue Type: Task
  Components: build , documentation 
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


The current Dockerfile doesn't work correctly. Each service (e.g., the master and the 
groom servers) should have its own Dockerfile and be launched using docker-compose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-09-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502033#comment-15502033
 ] 

Edward J. Yoon commented on HAMA-983:
-

>> once PoC is done

Great. If you need any help, feel free to let me know :-)

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>    Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So 
> we'll need to implement some data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join can be funny. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> clearly winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-09-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502017#comment-15502017
 ] 

Edward J. Yoon commented on HAMA-983:
-

Why don't we contribute this feature to the Apache Beam directly? 
https://github.com/apache/incubator-beam/tree/master/runners

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>    Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So 
> we'll need to implement some data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join can be funny. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> clearly winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-09-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501973#comment-15501973
 ] 

Edward J. Yoon commented on HAMA-983:
-

https://cloud.google.com/dataflow/examples/wordcount-example

This page describes the Beam concepts well. The flow is as follows:

{code}
Creating the Pipeline
Applying transforms to the Pipeline
Reading input (in this example: reading text files)
Applying ParDo transforms
Applying SDK-provided transforms (in this example: Count)
Writing output (in this example: writing to Google Cloud Storage)
Running the Pipeline
{code}

Once we have created the Hama pipeline runner, we should be able to run the program like below:

{code}
  public static void main(String[] args) {
    // Create a pipeline parameterized by command-line flags.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.Read.from("gs://..."))   // Read input.
     .apply(new CountWords())               // Do some processing.
     .apply(TextIO.Write.to("gs://..."));   // Write output.

    // Run the pipeline.
    p.run();
  }
{code}

For I/O operations, you can refer to 
https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/io/hadoop/HadoopIO.java
 (instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat you should 
use 
https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/bsp/FileInputFormat.java)

{quote}BSP for dataflow could be similar to SuperstepBSP{quote}

I think so. GroupByKey seems to be a built-in transform that groups records by key. 
We should implement it using a superstep.
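
As a rough illustration of that idea, here is a minimal, framework-free sketch of GroupByKey done in one superstep. It does not use the Hama or Beam APIs: the class GroupByKeySuperstepSketch, the Record type, and the groupByKey method are hypothetical names, and in-memory lists stand in for the peers and their message queues. The assumption is that each record is sent to the peer that owns its key (chosen by hash), and that grouping happens after the barrier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByKeySuperstepSketch {

  /** A key/value record, standing in for a KV<String, Long> element (hypothetical type). */
  static final class Record {
    final String key;
    final long value;

    Record(String key, long value) {
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Simulates one superstep. Element i of localInputs is the local input of peer i.
   * Returns, for each peer, the groups that peer owns after the barrier.
   */
  static List<Map<String, List<Long>>> groupByKey(List<List<Record>> localInputs) {
    int numPeers = localInputs.size();

    // Before the barrier: every peer sends each record to the peer that owns its key.
    List<List<Record>> inbox = new ArrayList<>();
    for (int i = 0; i < numPeers; i++) {
      inbox.add(new ArrayList<>());
    }
    for (List<Record> local : localInputs) {
      for (Record r : local) {
        int owner = Math.floorMod(r.key.hashCode(), numPeers);
        inbox.get(owner).add(r);
      }
    }

    // In a real BSP job the barrier synchronization would happen here.

    // After the barrier: each peer groups the records it received by key.
    List<Map<String, List<Long>>> grouped = new ArrayList<>();
    for (List<Record> received : inbox) {
      Map<String, List<Long>> groups = new TreeMap<>();
      for (Record r : received) {
        groups.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
      }
      grouped.add(groups);
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<List<Record>> inputs = new ArrayList<>();
    inputs.add(Arrays.asList(new Record("hama", 1), new Record("beam", 1)));
    inputs.add(Arrays.asList(new Record("hama", 1), new Record("bsp", 1)));

    List<Map<String, List<Long>>> result = groupByKey(inputs);
    for (int peer = 0; peer < result.size(); peer++) {
      System.out.println("peer " + peer + " -> " + result.get(peer));
    }
  }
}
```

Every key ends up on exactly one peer, so the values for that key can be handed to the next transform without further communication. Which peer owns which key depends on the hash function, so the sketch makes no claim about the exact placement.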





> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So 
> we'll need to implement some data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join can be funny. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> clearly winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: New Apache Hama committer: JongYoon Lim

2016-09-07 Thread Edward J. Yoon
Congratz and enjoy!

--
Best Regards, Edward J. Yoon


-Original Message-
From: JongYoon Lim [mailto:seedeng...@gmail.com]
Sent: Thursday, September 08, 2016 8:29 AM
To: dev@hama.apache.org
Subject: Re: New Apache Hama committer: JongYoon Lim

Thank you for the warm welcome :)

Currently, I'm working on an orchard robot project at the University of Auckland and
preparing for a health care system.
I'm also trying to use Hama as a core component for analyzing health data.
I'm not an expert in parallel computing, but I'm very interested in it.

It's an honor for me to join the Hama project!

Best Regards,
JongYoon


2016-09-08 10:45 GMT+12:00 Behroz Sikander <behro...@gmail.com>:

> Welcome :)
>
> On Wed, Sep 7, 2016 at 10:06 PM, Anastasis Andronidis <
> andronat_...@hotmail.com> wrote:
>
> > Welcome JongYoon!
> >
> > Best,
> > Anastasios
> > Andronidis
> >
> > > On 7 Sep 2016, at 21:04, Tommaso Teofili <tommaso.teof...@gmail.com>
> > wrote:
> > >
> > > Welcome onboard JongYoon !
> > >
> > > Il giorno lun 5 set 2016 alle ore 00:54 Edward J. Yoon <
> > > edward.y...@samsung.com> ha scritto:
> > >
> > >> On behalf of the Apache Hama PMC, I am very pleased to announce that
> > >> JongYoon Lim has been elected to be committer on the Apache Hama
> project
> > >> recognizing his sustained contributions to the project.
> > >>
> > >> Welcome JongYoon Lim!
> > >>
> > >> --
> > >> Best Regards, Edward J. Yoon
> > >>
> > >>
> > >>
> > >>
> > >>
> >
> >
>




New Apache Hama committer: JongYoon Lim

2016-09-04 Thread Edward J. Yoon
On behalf of the Apache Hama PMC, I am very pleased to announce that
JongYoon Lim has been elected to be committer on the Apache Hama project
recognizing his sustained contributions to the project.

Welcome JongYoon Lim!

--
Best Regards, Edward J. Yoon






[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-08-31 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454239#comment-15454239
 ] 

Edward J. Yoon commented on HAMA-983:
-

Just FYI, Apache Beam's basic example is word count. I guess the batch mode can 
be similar to org.apache.hama.examples.PiEstimator: (n - 1) tasks parse and 
count the words, and 1 task aggregates the word counts and emits the final 
result. I'm not sure about the streaming mode, so you'll need to check how it 
handles I/O.
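
The (n - 1) workers plus 1 aggregator layout described above could look roughly like the sketch below. It is plain Java with no Hama or Beam dependency: MasterWorkerWordCountSketch, countSplit, and aggregate are hypothetical names, and a list of maps stands in for the messages that the workers would send to the aggregator task between two supersteps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MasterWorkerWordCountSketch {

  /** Worker side: count the words of one input split. */
  static Map<String, Long> countSplit(List<String> lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      // Same split pattern as the word count example: anything that is not a letter or apostrophe.
      for (String word : line.split("[^a-zA-Z']+")) {
        if (!word.isEmpty()) {
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  /** Aggregator side: merge the partial counts received from all workers. */
  static Map<String, Long> aggregate(List<Map<String, Long>> partials) {
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> partial : partials) {
      partial.forEach((word, n) -> total.merge(word, n, Long::sum));
    }
    return total;
  }

  public static void main(String[] args) {
    // Two splits stand in for the input of (n - 1) = 2 worker tasks.
    List<List<String>> splits = Arrays.asList(
        Arrays.asList("hama runs bsp", "bsp is simple"),
        Arrays.asList("beam runs on hama"));

    // Superstep 1: each worker counts locally and "sends" its partial result.
    List<Map<String, Long>> messages = new ArrayList<>();
    for (List<String> split : splits) {
      messages.add(countSplit(split));
    }

    // Superstep 2: the single aggregator task merges the partial results.
    // Prints {beam=1, bsp=2, hama=2, is=1, on=1, runs=2, simple=1}
    System.out.println(aggregate(messages));
  }
}
```

A single aggregator keeps the sketch close to the PiEstimator layout, but it becomes a bottleneck for large vocabularies; partitioning the words across several aggregating tasks by key would be the obvious next step.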

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>    Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So 
> we'll need to implement some data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join can be funny. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> clearly winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-08-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451100#comment-15451100
 ] 

Edward J. Yoon commented on HAMA-983:
-

Hi, I haven't looked at Dataflow (Apache Beam) closely, but:

>> Do you mean that each superstep can be executed in data pipeline as a 
>> pcollection? 

I guess so, or a single job could be executed, depending on the case.

If you're interested in working on this, you can refer to 
https://github.com/dataArtisans/flink-dataflow/blob/master/runner/src/main/java/com/dataartisans/flink/dataflow/FlinkPipelineRunner.java

Also, before we do this, I guess HAMA-940 and the data processing BSP may need to 
come first. Please feel free to share your opinion and contribute patches. :-)

If you have any questions, let me know.

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So 
> we'll need to implement some data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join can be funny. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> clearly winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-991) Add math classes for float16/float32

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438496#comment-15438496
 ] 

Edward J. Yoon commented on HAMA-991:
-

NOTE: float16 is not implemented yet.

> Add math classes for float16/float32
> 
>
> Key: HAMA-991
> URL: https://issues.apache.org/jira/browse/HAMA-991
> Project: Hama
>  Issue Type: New Feature
>  Components: math
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
>    Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> Implement Float32Writable, Vector, and Matrix etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HAMA-988) Allow to add additional no-input tasks as number user want

2016-08-25 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-988.
-
Resolution: Fixed

solved

> Allow to add additional no-input tasks as number user want
> --
>
> Key: HAMA-988
> URL: https://issues.apache.org/jira/browse/HAMA-988
> Project: Hama
>  Issue Type: Improvement
>  Components: bsp core
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
>    Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> BSP framework basically launches the tasks as number of splits. And, 
> force-setting the number of tasks is also possible by setting 
> "hama.force.set.bsp.tasks" to true .
> By the way, there's no way to add more specific tasks to the number of 
> splits. For example, if input has 5 splits, I want to launch 6 (1 more 
> no-input task to be acted as a master) tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-996) Delete meaningless parameter

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438170#comment-15438170
 ] 

Edward J. Yoon commented on HAMA-996:
-

I think TaskInProgress.getRecoveryTask() is related to task recovery. If the ft 
(fault tolerance) service is enabled, the framework checkpoints task status 
periodically. When tasks fail or crash, the framework automatically recovers 
them from the previous checkpoint. That seems to be the role of 
GroomServer.startRecoveryTask() and AsyncRcvdMsgCheckpointImpl.restartTask().

If TaskInProgress.getRecoveryTask() is useless code, we can remove it or add 
the @Deprecated tag with some comments.
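
For the second option, a small sketch of what such a deprecation could look like. DeprecationSketch and getRecoveryTaskPlaceholder are hypothetical stand-ins, not the real TaskInProgress code, and the Javadoc wording is only an assumption about what the note might say.

```java
public class DeprecationSketch {

  /**
   * Hypothetical stand-in for a method that appears to be unused.
   *
   * @deprecated kept only for compatibility; candidate for removal in a later release.
   */
  @Deprecated
  public static String getRecoveryTaskPlaceholder() {
    return "legacy";
  }

  public static void main(String[] args) throws Exception {
    // @Deprecated is retained at runtime, so it can be checked via reflection.
    boolean deprecated = DeprecationSketch.class
        .getMethod("getRecoveryTaskPlaceholder")
        .isAnnotationPresent(Deprecated.class);
    System.out.println("deprecated = " + deprecated);
  }
}
```

The annotation makes the compiler warn at every remaining call site, which is a cheap way to confirm that nothing still depends on the method before it is deleted.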



> Delete meaningless parameter
> 
>
> Key: HAMA-996
> URL: https://issues.apache.org/jira/browse/HAMA-996
> Project: Hama
>  Issue Type: Improvement
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-996.patch
>
>
> It seems that *taskid* param from *getGroomToSchedule()* of *TaskInProgress* 
> is not essential for this function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-799) Add a new BSP API that uses multiple threads

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438151#comment-15438151
 ] 

Edward J. Yoon commented on HAMA-799:
-

Hi,

I originally thought that we could add something like 
https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html
 and the goal was to support an easy-to-use multithreading API within BSP. But 
our case may differ slightly.

In the MapReduce case, the map(K, V) function processes the K, V of each line of 
the chunks of data sequentially (as you might already know). The 
MultithreadedMapper processes lines concurrently and generates intermediate files. 

The BSP model is more flexible. We can implement a MapReduce framework on the BSP 
model like below:

{code}
bsp(BSPPeer peer) {
  while (peer.readNext(key, value)) {
    map(key, value); // calls user-defined map function.
  }
  ...
}
{code}

Then, the MultithreadedMapper is just like below:

{code}
bsp(BSPPeer peer) {
  while (peer.readNext(key, value)) {
    // executes map function concurrently.
    executor.execute(new MultithreadedMapper(key, value));
  }
  ...
}
{code}

After the while loop, the above two approaches will produce the same result but 
with different performance.

The BSP model is slightly different. The threads need to share the incoming 
and outgoing queues. Otherwise, it's the same as increasing the number of 
bsp tasks (which is meaningless). So, multithreading should be used only for 
parallelizing some sequential computation part, not the whole bsp() function. 
For example, 

{code}
bsp() {
   ...
   for(int i = 0; i < 1000; i++) {
  ... // this part can be multi-threaded.
   }
   ...
}
{code}
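
The loop parallelization above can be sketched in plain Java as follows. This is only an illustration under assumptions: ParallelLoopSketch and computeInParallel are hypothetical names, the loop body (squaring the index) is a placeholder for the real computation, and a ConcurrentLinkedQueue stands in for the outgoing message queue that all threads of one bsp task would share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoopSketch {

  /** Runs the loop body for i in [0, n) on a thread pool; all threads share one outgoing queue. */
  static Queue<Long> computeInParallel(int n, int threads) throws Exception {
    // Thread-safe stand-in for the shared outgoing message queue.
    Queue<Long> outgoing = new ConcurrentLinkedQueue<>();
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        final long x = i;
        // Placeholder loop body: compute a value and enqueue it as a "message".
        pending.add(executor.submit(() -> {
          outgoing.add(x * x);
        }));
      }
      // Wait for every iteration before leaving the parallel section,
      // so the rest of bsp() sees a complete queue.
      for (Future<?> f : pending) {
        f.get();
      }
    } finally {
      executor.shutdown();
    }
    return outgoing;
  }

  public static void main(String[] args) throws Exception {
    Queue<Long> out = computeInParallel(1000, 4);
    long sum = 0;
    for (long v : out) {
      sum += v;
    }
    System.out.println(out.size() + " messages, sum = " + sum);
  }
}
```

Waiting on the futures, rather than on a fixed timeout, means a failed iteration surfaces as an exception from f.get() instead of being silently dropped. The order of messages in the queue is not deterministic, so anything downstream should not rely on it.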

In GraphJobRunner, I used multithreading like below:

{code}
  private void doSuperstep(GraphJobMessage currentMessage,
      BSPPeer<Writable, Writable, Writable, Writable, GraphJobMessage> peer)
      throws IOException {
    this.errorCount.set(0);
    long startTime = System.currentTimeMillis();

    this.changedVertexCnt = 0;
    vertices.startSuperstep();

    ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors
        .newCachedThreadPool();
    executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
    executor.setRejectedExecutionHandler(retryHandler);

    long loopStartTime = System.currentTimeMillis();
    while (currentMessage != null) {
      executor.execute(new ComputeRunnable(currentMessage));

      currentMessage = peer.getCurrentMessage();
    }
    LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
        + " looping: " + (System.currentTimeMillis() - loopStartTime) + " ms");

    executor.shutdown();
    try {
      executor.awaitTermination(60, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      throw new IOException(e);
    }

    if (errorCount.get() > 0) {
      throw new IOException("there were " + errorCount
          + " exceptions during compute vertices.");
    }

    Iterator it = vertices.iterator();
    while (it.hasNext()) {
      Vertex<V, E, M> vertex = (Vertex<V, E, M>) it.next();
      if (!vertex.isHalted() && !vertex.isComputed()) {
        vertex.compute(Collections.<M> emptyList());
        vertices.finishVertexComputation(vertex);
      }
    }

    getAggregationRunner().sendAggregatorValues(peer,
        vertices.getActiveVerticesNum(), this.changedVertexCnt);
    this.iteration++;

    LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
        + " computing vertices: " + (System.currentTimeMillis() - startTime)
        + " ms");

    startTime = System.currentTimeMillis();
    finishSuperstep();
    LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
        + " synchronizing: " + (System.currentTimeMillis() - startTime) + " ms");
  }
{code}

If there's a more elegant way to use multithreading in the bsp() function, we can do 
it. Otherwise, we should close this issue.

> Add a new BSP API that uses multiple threads
> 
>
> Key: HAMA-799
> URL: https://issues.apache.org/jira/browse/HAMA-799
>     Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
>
> Add a new (additional) BSP API that uses multiple threads, called 
> MultithreadedBSP. This could help in speeding up the highly CPU-intensive 
> task.
> And, I personally would like to re-design the GraphJobRunner based on this 
> MultithreadedBSP. Because computing vertex 1 at a time is a reason of slow 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-996) Delete meaningless parameter

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436419#comment-15436419
 ] 

Edward J. Yoon commented on HAMA-996:
-

It looks like the getGroomToSchedule() method is useless.

> Delete meaningless parameter
> 
>
> Key: HAMA-996
> URL: https://issues.apache.org/jira/browse/HAMA-996
> Project: Hama
>  Issue Type: Improvement
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-996.patch
>
>
> It seems that *taskid* param from *getGroomToSchedule()* of *TaskInProgress* 
> is not essential for this function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-994) Support GPU for math operations

2016-08-24 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436238#comment-15436238
 ] 

Edward J. Yoon commented on HAMA-994:
-

Status: I checked the license of aparapi, and it's not suitable for an Apache 
project - https://github.com/aparapi/aparapi/issues/37 . If AMD's official reply 
is the same, I'll look into other options.

> Support GPU for math operations
> ---
>
> Key: HAMA-994
> URL: https://issues.apache.org/jira/browse/HAMA-994
> Project: Hama
>  Issue Type: New Feature
>Affects Versions: 0.7.1
>        Reporter: Edward J. Yoon
>    Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> Support GPU for matrix/vector operations using aparapi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAMA-994) Support GPU for math operations

2016-08-09 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-994:
---

 Summary: Support GPU for math operations
 Key: HAMA-994
 URL: https://issues.apache.org/jira/browse/HAMA-994
 Project: Hama
  Issue Type: New Feature
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


Support GPU for matrix/vector operations using aparapi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-993) HAMA Cluster is not running pi example

2016-07-29 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398974#comment-15398974
 ] 

Edward J. Yoon commented on HAMA-993:
-

Hi,

Can you please provide your error logs?

> HAMA Cluster is not running pi example
> --
>
> Key: HAMA-993
> URL: https://issues.apache.org/jira/browse/HAMA-993
> Project: Hama
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 0.7.1
>Reporter: Jatinder Goyal
>
> I have setup hama cluster of 9 nodes. I have used all the recommended 
> settings given on the site, but when I try to run pi example on hama it gets 
> stuck there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-21 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341557#comment-15341557
 ] 

Edward J. Yoon edited comment on HAMA-990 at 6/21/16 10:59 AM:
---

Generally looks good! As you already planned, it'd be nice if you could add more 
functions that dump the output and plot 2D charts (gnuplot or the Google Chart 
API?). Why don't you create a simple benchmark-tool project on GitHub? That would 
make it easier to review and share the code. 



was (Author: udanax):
Generally looks good! As you already planned, it'd be nice if you could add more 
functions that dump the output and plot 2D charts. Why don't you create a 
simple benchmark-tool project on GitHub? That would make it easier to review 
and share the code. 


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: Benchmark_script.sh, ver1.1_benchmark_script.sh
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-21 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341557#comment-15341557
 ] 

Edward J. Yoon commented on HAMA-990:
-

Generally looks good! As you already planned, it'd be nice if you could add more 
functions that dump the output and plot 2D charts. Why don't you create a 
simple benchmark-tool project on GitHub? That would make it easier to review 
and share the code. 


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: Benchmark_script.sh, ver1.1_benchmark_script.sh
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-14 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331107#comment-15331107
 ] 

Edward J. Yoon commented on HAMA-990:
-

Sorry for the late review; I'm on a business trip until the 21st :/ By next week, 
it'd be nice if you could write some documentation and share the test results with me.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: Benchmark_script.sh, ver1.1_benchmark_script.sh
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Updated] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-09 Thread Edward J. Yoon
Great, I'll take a look tomorrow!

On Wed, Jun 8, 2016 at 10:16 PM, Behroz Sikander (JIRA) <j...@apache.org> wrote:
>
>  [ 
> https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
>
> Behroz Sikander updated HAMA-990:
> -
> Attachment: ver1.1_benchmark_script.sh
>
> In the attachment you can find the updated version of script and it can now 
> work with Spark and Flink clusters.
>
> The script requires the following
> 1- MRQL_INSTALL_FOLDER -> The path where MRQL should be downloaded
> 2- HDFS_ADDRESS -> HDFS URL that clients use to communicate
> 3- SPARK_MASTER -> Spark master URL or if Yarn is used then yarn-client can 
> be passed
> 4- MRQL_NODES -> Total workers on which the PageRank algorithm needs to be 
> executed.
>
> It would be great if I can get a feedback on the script, so that I can 
> improve it. Further, currently, the output is shown on console directly. How 
> do you envision the final output of the script ?
>
>> GSoC'16: Apache Hama benchmark against Spark and Flink
>> --
>>
>> Key: HAMA-990
>> URL: https://issues.apache.org/jira/browse/HAMA-990
>> Project: Hama
>>  Issue Type: Documentation
>>Reporter: Behroz Sikander
>>Priority: Minor
>> Attachments: Benchmark_script.sh, ver1.1_benchmark_script.sh
>>
>>
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)



-- 
Best Regards, Edward J. Yoon


[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311757#comment-15311757
 ] 

Edward J. Yoon commented on HAMA-992:
-

Hi, if possible, please attach your BSP Python code here.

And, I mean, you should update BinaryProtocol.py and copy it to HDFS again 
(instead of changing the Hama project).


> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement apache hama on Raspberry pi model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> hama streaming over hadoop on a single namenode but I am facing a bit of a 
> difficulty in streaming my python code. I have downloaded the hama streaming 
> repository from :-
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my python code, the job fails.   
> I am trying to run the code with the following command:-  
> hama pipes -streaming true -bspTasks 1 -interpreter python -output 
> /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs 
> python.py   
> Below is the log file for your reference. I hope you can find time to help me 
> in this minor project:-
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer 
> address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is 
> deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to 
> Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent 
> call last):"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad 
> command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at 
> org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at 
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: 
> java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> jav

[jira] [Issue Comment Deleted] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-992:

Comment: was deleted

(was: +1. Thanks for your opinion and action.)

> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement apache hama on Raspberry pi model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> hama streaming over hadoop on a single namenode but I am facing a bit of a 
> difficulty in streaming my python code. I have downloaded the hama streaming 
> repository from :-
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my python code, the job fails.   
> I am trying to run the code with the following command:-  
> hama pipes -streaming true -bspTasks 1 -interpreter python -output 
> /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs 
> python.py   
> Below is the log file for your reference. I hope you can find time to help me 
> in this minor project:-
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer 
> address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is 
> deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to 
> Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent 
> call last):"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad 
> command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at 
> org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at 
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: 
> java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at 
> java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
>   at java.io.OutputS
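
The log above shows `Integer.parseInt`, called from `StreamingUplinkReaderThread.readCommand`, failing on the string "Traceback (most recent call last):". That suggests the Python process printed an error traceback where a numeric command code was expected. Below is a minimal sketch of that failure mode. `read_command` is a hypothetical stand-in for such a reader, not Hama's actual code, and merging stderr into stdout is an assumption about how the child process is wired.

```python
import subprocess
import sys


def read_command(line):
    """Hypothetical stand-in for a reader that expects a numeric command code."""
    try:
        return int(line.strip())
    except ValueError:
        # Analogous to the NumberFormatException in the log: not a number.
        return None


def first_line_of_output(script_source):
    """Run a Python snippet in a child process and return the first line it
    emits, with stderr merged into stdout (an assumption, see above)."""
    proc = subprocess.run(
        [sys.executable, "-c", script_source],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = proc.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    good = first_line_of_output("print(3)")
    bad = first_line_of_output("import no_such_module_for_demo")
    print(read_command(good))  # a numeric line parses as a command code
    print(read_command(bad))   # a traceback line does not
```

Running the user script directly with the same interpreter, outside Hama, is one way to see the full traceback that is cut off in the log.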

[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310141#comment-15310141
 ] 

Edward J. Yoon commented on HAMA-992:
-

+1. Thanks for your opinion and action.

> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement apache hama on Raspberry pi model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> hama streaming over hadoop on a single namenode but I am facing a bit of a 
> difficulty in streaming my python code. I have downloaded the hama streaming 
> repository from :-
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my python code, the job fails.   
> I am trying to run the code with the following command:-  
> hama pipes -streaming true -bspTasks 1 -interpreter python -output 
> /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs 
> python.py   
> Below is the log file for your reference. I hope you can find time to help me 
> in this minor project:-
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer 
> address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is 
> deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to 
> Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent 
> call last):"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad 
> command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at 
> org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at 
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: 
> java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at 
> java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
>   at java.io.OutputS

[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310140#comment-15310140
 ] 

Edward J. Yoon commented on HAMA-992:
-

+1. Thanks for your opinion and action.

> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement apache hama on Raspberry pi model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> hama streaming over hadoop on a single namenode but I am facing a bit of a 
> difficulty in streaming my python code. I have downloaded the hama streaming 
> repository from :-
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my python code, the job fails.   
> I am trying to run the code with the following command:-  
> hama pipes -streaming true -bspTasks 1 -interpreter python -output 
> /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs 
> python.py   
> Below is the log file for your reference. I hope you can find time to help me 
> in this minor project:-
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer 
> address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is 
> deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to 
> Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent 
> call last):"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad 
> command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at 
> org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at 
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: 
> java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at 
> java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
>   at java.io.OutputS

[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310142#comment-15310142
 ] 

Edward J. Yoon commented on HAMA-992:
-

+1. Thanks for your opinion and action.

> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement apache hama on Raspberry pi model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> hama streaming over hadoop on a single namenode but I am facing a bit of a 
> difficulty in streaming my python code. I have downloaded the hama streaming 
> repository from :-
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my python code, the job fails.   
> I am trying to run the code with the following command:-  
> hama pipes -streaming true -bspTasks 1 -interpreter python -output 
> /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs 
> python.py   
> Below is the log file for your reference. I hope you can find time to help me 
> in this minor project:-
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer 
> address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is 
> deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to 
> Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent 
> call last):"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad 
> command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at 
> org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at 
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: 
> java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at 
> java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
>   at java.io.OutputS

[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310101#comment-15310101
 ] 

Edward J. Yoon commented on HAMA-992:
-

Thomas, I'm fine either way, but adding it to the Hama repo would be a good idea if we 
can put some effort into improving Python support and attracting Python users.

> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement apache hama on Raspberry pi model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> hama streaming over hadoop on a single namenode but I am facing a bit of a 
> difficulty in streaming my python code. I have downloaded the hama streaming 
> repository from :-
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my python code, the job fails.   
> I am trying to run the code with the following command:-  
> hama pipes -streaming true -bspTasks 1 -interpreter python -output 
> /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs 
> python.py   
> Below is the log file for your reference. I hope you can find time to help me 
> in this minor project:-
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer 
> address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is 
> deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to 
> Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent 
> call last):"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad 
> command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at 
> org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at 
> org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at 
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: 
> java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at 
> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at 

[jira] [Commented] (HAMA-991) Add math classes for float16/float32

2016-05-24 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299374#comment-15299374
 ] 

Edward J. Yoon commented on HAMA-991:
-

I'll push these 32-bit float classes first. Thanks.

> Add math classes for float16/float32
> 
>
> Key: HAMA-991
> URL: https://issues.apache.org/jira/browse/HAMA-991
> Project: Hama
>  Issue Type: New Feature
>  Components: math
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
>    Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> Implement Float32Writable, Vector, and Matrix etc. 
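
As a rough illustration of what a 32-bit float writable trades off, the sketch below encodes a value as 4 big-endian IEEE 754 bytes, the layout produced by Java's `DataOutput.writeFloat`. Whether Hama's `Float32Writable` uses exactly this layout is an assumption, and the helper names are hypothetical.

```python
import struct


def encode_float32(value):
    """Pack a Python float into 4 big-endian IEEE 754 single-precision bytes."""
    return struct.pack(">f", value)


def decode_float32(payload):
    """Unpack 4 big-endian bytes back into a Python float."""
    return struct.unpack(">f", payload)[0]


if __name__ == "__main__":
    payload = encode_float32(0.1)
    print(len(payload))             # half the size of a 64-bit double
    print(decode_float32(payload))  # close to 0.1, but rounded to float32
```

The saving in bytes comes at the cost of precision: values such as 0.1 do not survive the round trip exactly.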



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292552#comment-15292552
 ] 

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 2:02 AM:
--

Yes.

We assume that there is an existing HAMA/FLINK/SPARK cluster, and that your project 
provides a shell script that automatically produces benchmark results. For example,
 {code}% ${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | 
others.. ]{code}

If MRQL is good for us and works well, we can leverage it.
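
The `run benchmarks [all | kmeans | pagerank | others..]` command line suggested above could be dispatched roughly as follows. This is only a sketch: the thread proposes a shell script, Python is used here for illustration, and every name other than `kmeans` and `pagerank` is hypothetical.

```python
import argparse

# Only kmeans and pagerank are named in the thread; others would be added here.
BENCHMARKS = ("kmeans", "pagerank")


def select_benchmarks(target):
    """Expand 'all' to every known benchmark, else return the single target."""
    if target == "all":
        return list(BENCHMARKS)
    if target not in BENCHMARKS:
        raise ValueError("unknown benchmark: %s" % target)
    return [target]


def parse_args(argv):
    """Parse 'benchmarks <target>' in the style of the proposed command line."""
    parser = argparse.ArgumentParser(prog="run")
    sub = parser.add_subparsers(dest="command")
    bench = sub.add_parser("benchmarks")
    bench.add_argument("target", choices=("all",) + BENCHMARKS)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["benchmarks", "all"])
    print(select_benchmarks(args.target))
```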


was (Author: udanax):
Yes.

We assume that there is an existing HAMA/FLINK/SPARK cluster, and that your project 
provides a shell script that automatically produces benchmark results. For example, 
${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ]

If MRQL is good for us and works well, we can leverage it.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292552#comment-15292552
 ] 

Edward J. Yoon commented on HAMA-990:
-

Yes.

We assume that there is an existing HAMA/FLINK/SPARK cluster, and that your project 
provides a shell script that automatically produces benchmark results. For example, 
${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ]

If MRQL is good for us and works well, we can leverage it.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292530#comment-15292530
 ] 

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:39 AM:
--

The Hadoop + HAMA + Flink + Spark cluster boot scripts are already available on both 
the Amazon and Google clouds:

https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama
https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama

So, if we use MRQL, a shell script (one that generates some input data, schedules the 
jobs, and collects performance results) will be enough.
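
The three steps named above (generate input data, run the jobs, collect performance results) can be sketched as below. This is an illustration under assumptions: the thread proposes a shell script, the function names are hypothetical, and each job is a placeholder callable where a real harness would submit work to the Hama, Flink, or Spark cluster.

```python
import csv
import io
import random
import time


def generate_points(n, dims, seed=0):
    """Generate n random points with the given dimensionality as input data."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dims)] for _ in range(n)]


def time_job(name, job, data):
    """Run one job callable on the data and record its wall-clock time."""
    start = time.perf_counter()
    job(data)
    return {"job": name, "seconds": time.perf_counter() - start}


def results_to_csv(rows):
    """Collect the timing rows into CSV text for later comparison."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["job", "seconds"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    points = generate_points(1000, 2)
    # Placeholder job: a real run would launch the cluster job instead.
    rows = [time_job("sum-placeholder", lambda d: sum(map(sum, d)), points)]
    print(results_to_csv(rows))
```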


was (Author: udanax):
The Hadoop + HAMA + Flink + Spark cluster boot scripts are already available on both 
the Amazon and Google clouds:

https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama
https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama

So, if we use MRQL, a shell script will be enough.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292530#comment-15292530
 ] 

Edward J. Yoon commented on HAMA-990:
-

The Hadoop + HAMA + Flink + Spark cluster boot scripts are already available on both 
the Amazon and Google clouds:

https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama
https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama

So, if we use MRQL, a shell script will be enough.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292508#comment-15292508
 ] 

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:23 AM:
--

{code}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page 
Rank and Query Processing whereas Spark is faster in Word Count. We can 
reproduce these results in our cluster and then can calculate the results for 
Hama. Once we have all the results we can compare all the systems.
{code}

I think this is a good idea. With this, we may be able to derive insight from 
the results (this should be our goal). I have heard that Flink uses its own 
serialization techniques and shows good performance but is unstable. Just FYI, 
MRQL can also be used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of 
only a few high-end machines equipped with GPUs, so it is not a good fit for a 
large-scale distributed computing benchmark. If you can write some scripts that 
make it possible to auto-produce benchmark results on clouds such as Amazon or 
Google Cloud, I can help.



was (Author: udanax):
{qoute}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page 
Rank and Query Processing whereas Spark is faster in Word Count. We can 
reproduce these results in our cluster and then can calculate the results for 
Hama. Once we have all the results we can compare all the systems.
{qoute}

I think this is a good idea. With this, we may be able to derive insight from 
the results (this should be our goal). I have heard that Flink uses its own 
serialization techniques and shows good performance but is unstable. Just FYI, 
MRQL can also be used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of 
only a few high-end machines equipped with GPUs, so it is not a good fit for a 
large-scale distributed computing benchmark. If you can write some scripts that 
make it possible to auto-produce benchmark results on clouds such as Amazon or 
Google Cloud, I can help.


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292508#comment-15292508
 ] 

Edward J. Yoon commented on HAMA-990:
-

{qoute}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page 
Rank and Query Processing whereas Spark is faster in Word Count. We can 
reproduce these results in our cluster and then can calculate the results for 
Hama. Once we have all the results we can compare all the systems.
{qoute}

I think this is a good idea. With this, we may be able to derive insight from 
the results (this should be our goal). I have heard that Flink uses its own 
serialization techniques and shows good performance but is unstable. Just FYI, 
MRQL can also be used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of 
only a few high-end machines equipped with GPUs, so it is not a good fit for a 
large-scale distributed computing benchmark. If you can write some scripts that 
make it possible to auto-produce benchmark results on clouds such as Amazon or 
Google Cloud, I can help.


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288809#comment-15288809
 ] 

Edward J. Yoon commented on HAMA-990:
-

How's your work going, and what is the main goal? I personally recommend that 
you don't spend much time on other trivial bug fixes.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285684#comment-15285684
 ] 

Edward J. Yoon commented on HAMA-941:
-

Quick comment from Greg Malewicz -- "There are many clustering algorithms. 
Perhaps it's better to start with why you need to group items, and then look at 
papers for an algorithm that has the desired grouping properties."

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.
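
The termination rule requested above (stop as soon as nothing changes, with the iteration cap kept as a safety net) can be sketched in a framework-independent way. This is only an illustration, not Hama's SemiClustering code: the class `ConvergenceLoop`, the method `iterateUntilStable`, and the use of `equals()` to detect "no cluster changes" are all assumptions made for the example.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper: runs a step function until the state stops changing. */
public class ConvergenceLoop {

  /**
   * Applies {@code step} repeatedly, starting from {@code initial}.
   * Stops when a step leaves the state unchanged (by equals()) or when
   * {@code maxIterations} steps have been executed, whichever comes first.
   *
   * @return the number of steps that were actually executed
   */
  public static <S> int iterateUntilStable(S initial, UnaryOperator<S> step,
      int maxIterations) {
    S current = initial;
    int iterations = 0;
    while (iterations < maxIterations) {
      S next = step.apply(current);
      iterations++;
      if (next.equals(current)) {
        break; // no change in this step: converged, stop early
      }
      current = next;
    }
    return iterations;
  }
}
```

In a BSP job the "state" would be the set of semi-clusters, and detecting that no vertex changed its clusters would presumably need a global check across tasks rather than a local `equals()`.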





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285259#comment-15285259
 ] 

Edward J. Yoon commented on HAMA-941:
-

Sure, I'll check. Greg, the original author, also sits near my seat. :-)




-- 
Best Regards, Edward J. Yoon


> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>        Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.





[jira] [Created] (HAMA-991) Add math classes for float16/float32

2016-05-10 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-991:
---

 Summary: Add math classes for float16/float32
 Key: HAMA-991
 URL: https://issues.apache.org/jira/browse/HAMA-991
 Project: Hama
  Issue Type: New Feature
  Components: math
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


Implement Float32Writable, Vector, and Matrix etc. 





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-06 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273903#comment-15273903
 ] 

Edward J. Yoon commented on HAMA-941:
-

Sorry for the late review; it's the Korean holidays and I'll be back next week. 
Can you please try to find the bug in the implementation? :-)

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>        Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-940) Add StreamInputFormat

2016-05-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266013#comment-15266013
 ] 

Edward J. Yoon commented on HAMA-940:
-

If we can hide these implementations and simplify the APIs for processing 
stream data, I think this way is better.

> Add StreamInputFormat
> -
>
> Key: HAMA-940
> URL: https://issues.apache.org/jira/browse/HAMA-940
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>        Reporter: Edward J. Yoon
>
> Add StreamInputFormat that reads newly appended records from previous 
> superstep. 
> I roughly guess it will be possible using reopen() method and file offset.





[jira] [Commented] (HAMA-940) Add StreamInputFormat

2016-05-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266012#comment-15266012
 ] 

Edward J. Yoon commented on HAMA-940:
-

As I mentioned in the description, we can simply check whether there are newly 
appended records in the input file, keeping the last read offset. 

To implement this, first of all, you should look at the InputFormat interface. 
The tricky issue is how we implement the getSplits() method and handle multiple 
tasks. 

At the moment, my simple idea is that one BSP task acts as a "stream input 
queue", without implementing StreamInputFormat or changing the framework core. 
For example, we set the file path in the job configuration. The master task acts 
as follows: 

{code}
if (isMaster(peer.me)) {
  while (true) {
    peer.reopen();      // reopen the input file
    peer.skip(offset);  // jump to the last read offset
    if (peer.readNext()) {
      // do the load balancing here
      sendTo("send a newly appended record to free slave tasks");
    } else {
      Thread.sleep();   // wait before polling again
    }
  }
}
{code}
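
The "reopen, skip to the last offset, read what is new" idea can be tried outside Hama with plain java.io. The sketch below is not the BSPPeer API: the class `AppendedRecordReader` and its method are hypothetical, records are assumed to be newline-terminated ASCII lines, and a half-written last line would be returned as if it were complete.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical poller that returns only the records appended since the last call. */
public class AppendedRecordReader {
  private final String path;
  private long offset = 0L; // file position just past the last record returned

  public AppendedRecordReader(String path) {
    this.path = path;
  }

  /** Reopens the file, jumps to the last offset, and reads any newly appended lines. */
  public List<String> readNewRecords() throws IOException {
    List<String> records = new ArrayList<>();
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      file.seek(offset); // jump to the last read offset
      String line;
      while ((line = file.readLine()) != null) {
        records.add(line);
        offset = file.getFilePointer(); // remember progress for the next poll
      }
    }
    return records;
  }
}
```

A master task could call `readNewRecords()` in its polling loop, forward each returned record to a free worker, and sleep for a short interval when the list comes back empty.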



> Add StreamInputFormat
> -
>
> Key: HAMA-940
> URL: https://issues.apache.org/jira/browse/HAMA-940
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> Add StreamInputFormat that reads newly appended records from previous 
> superstep. 
> I roughly guess it will be possible using reopen() method and file offset.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265625#comment-15265625
 ] 

Edward J. Yoon commented on HAMA-941:
-

I just wrapped the patch copied from the clipboard in \{code\} ... \{code\} tags.

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>        Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265533#comment-15265533
 ] 

Edward J. Yoon commented on HAMA-941:
-

P.S. The initial code can be found at HAMA-594, and I changed a few things 
because it didn't work correctly.

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>        Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265532#comment-15265532
 ] 

Edward J. Yoon commented on HAMA-941:
-

First of all, it looks like the boundary score factor is always 0.0. This is a 
user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1), the 
score should be 1.0. Please apply my patch and test again. Do you see more bugs? 

{code}
diff --git 
a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java 
b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- 
a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ 
b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
 candidates.add(msg);
 
 if (!msg.contains(this.getVertexID())
-&& msg.size() == semiClusterMaximumVertexCount) {
+&& msg.size() < semiClusterMaximumVertexCount) {
   SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
   msgNew.addVertex(this);
   msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
* @return the value to calcualte the Score of a semi-cluster.
*/
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-int vC = 0, eC = 0;
+// TODO fB is the bounday score factor. This should be configurable by user
+// the default is 0.5
+double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+int vC = 0;
 vC = message.size();
 for (Vertex<Text, DoubleWritable, SemiClusterMessage> v : message
 .getVertexList()) {
   List<Edge<Text, DoubleWritable>> eL = v.getEdges();
   for (Edge<Text, DoubleWritable> e : eL) {
-eC++;
 if (message.contains(e.getDestinationVertexID())
 && e.getValue() != null) {
   iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
 }
   }
 }
+
 if (vC > 1)
-  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+else
+  sC = 1.0;
+
 return sC;
   }
{code}
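
For readers who want to check the scoring rule in the patch by hand, here is a standalone version of the same arithmetic. It mirrors the patched formula (inner weight minus the boundary factor times boundary weight, divided by the number of possible vertex pairs, with 1.0 for clusters of at most one vertex). The class and parameter names are made up for the example and are not part of SemiClusteringVertex.

```java
/** Hypothetical standalone version of the semi-cluster score from the patch above. */
public class SemiClusterScore {

  /**
   * @param innerWeight    sum of edge weights inside the cluster (iC)
   * @param boundaryWeight sum of edge weights leaving the cluster (bC)
   * @param boundaryFactor user-defined boundary score factor (fB)
   * @param vertexCount    number of vertices in the cluster (vC)
   */
  public static double score(double innerWeight, double boundaryWeight,
      double boundaryFactor, int vertexCount) {
    if (vertexCount <= 1) {
      return 1.0; // a single-vertex cluster gets the fixed score 1.0
    }
    // number of vertex pairs: vC * (vC - 1) / 2
    double possiblePairs = vertexCount * (vertexCount - 1) / 2.0;
    return (innerWeight - boundaryFactor * boundaryWeight) / possiblePairs;
  }
}
```

For example, with iC = 3.0, bC = 2.0, fB = 0.5 and vC = 3, the numerator is 3.0 - 1.0 = 2.0 and there are 3 possible pairs, so the score is 2/3.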

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.





[jira] [Resolved] (HAMA-989) Build fails on non-Linux systems

2016-04-28 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-989.
-
Resolution: Fixed
  Assignee: Behroz Sikander

Fixed.

> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
>Assignee: Behroz Sikander
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Comment Edited] (HAMA-989) Build fails on non-Linux systems

2016-04-27 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261553#comment-15261553
 ] 

Edward J. Yoon edited comment on HAMA-989 at 4/28/16 5:14 AM:
--

When you write a commit log, you should follow the format: HAMA-989: your 
commit log
Then Apache infra and GitHub will be linked automatically by issue ID.

Also, you have to squash your changes into one commit before opening a pull 
request. For example, you can use the rebase command: git rebase -i HEAD~3.

Thanks.



was (Author: udanax):
When you write a commit log, you should follow the format below: HAMA-989: commitlog
Then Apache infra and GitHub will be linked automatically by issue ID.

Also, you have to squash your changes into one commit before opening a pull 
request. For example, you can use the rebase command: git rebase -i HEAD~3.

Thanks.


> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Commented] (HAMA-989) Build fails on non-Linux systems

2016-04-27 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261553#comment-15261553
 ] 

Edward J. Yoon commented on HAMA-989:
-

When you write a commit log, you should follow the format below: HAMA-989: commitlog
Then Apache infra and GitHub will be linked automatically by issue ID.

Also, you have to squash your changes into one commit before opening a pull 
request. For example, you can use the rebase command: git rebase -i HEAD~3.

Thanks.


> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Commented] (HAMA-989) Build fails on non-Linux systems

2016-04-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257343#comment-15257343
 ] 

Edward J. Yoon commented on HAMA-989:
-

We can either catch and ignore the exceptions or use SystemUtils.

{code}
diff --git 
a/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
 
b/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
index f4f89b9..b7bc9c8 100644
--- 
a/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
+++ 
b/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
@@ -23,6 +23,7 @@
 
 import junit.framework.TestCase;
 
+import org.apache.commons.lang.SystemUtils;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.NullWritable;
@@ -45,10 +46,14 @@
   public static volatile int increment = 1;
 
   public void testMemoryMessaging() throws Exception {
-HamaConfiguration conf = new HamaConfiguration();
-conf.setClass(MessageManager.RECEIVE_QUEUE_TYPE_CLASS, MemoryQueue.class,
-MessageQueue.class);
-messagingInternal(conf);
+if (SystemUtils.IS_OS_LINUX) {
+  HamaConfiguration conf = new HamaConfiguration();
+  conf.setClass(MessageManager.RECEIVE_QUEUE_TYPE_CLASS, MemoryQueue.class,
+  MessageQueue.class);
+  messagingInternal(conf);
+} else {
+  // we skip this test bc AsyncRPC is currently support only linux
+}
   }
 
   private static void messagingInternal(HamaConfiguration conf)
{code}

WDYT?
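
If pulling in commons-lang just for this check is undesirable, the same condition can be derived from the standard `os.name` system property. The sketch below is only an illustration; `OsCheck` and its methods are hypothetical names, and the prefix match on "linux" is an assumption about how the property is reported.

```java
import java.util.Locale;

/** Hypothetical helper for skipping Linux-only tests on other platforms. */
public class OsCheck {

  /** Returns true when the given os.name value looks like Linux. */
  public static boolean isLinux(String osName) {
    return osName != null
        && osName.toLowerCase(Locale.ROOT).startsWith("linux");
  }

  /** Checks the operating system of the running JVM. */
  public static boolean isLinux() {
    return isLinux(System.getProperty("os.name"));
  }

  public static void main(String[] args) {
    if (isLinux()) {
      System.out.println("running epoll-based tests");
    } else {
      System.out.println("skipping: epoll transport is Linux-only");
    }
  }
}
```

A test method could return early when `OsCheck.isLinux()` is false, which has the same effect as the `SystemUtils.IS_OS_LINUX` guard in the patch.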

> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Updated] (HAMA-989) Build fails on non-Linux systems

2016-04-25 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-989:

Summary: Build fails on non-Linux systems  (was: Build fails on non-Linux 
OS)

> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>    Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Created] (HAMA-989) Build fails on non-Linux OS

2016-04-25 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-989:
---

 Summary: Build fails on non-Linux OS
 Key: HAMA-989
 URL: https://issues.apache.org/jira/browse/HAMA-989
 Project: Hama
  Issue Type: Bug
  Components: bsp core, build 
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
 Fix For: 0.7.2


http://markmail.org/message/ipgc5fjs57xdmtr2





Re: io.netty.channel.epoll not supported on Mac ?

2016-04-25 Thread Edward J. Yoon
If excluding specific unit tests by operating system is difficult,
I think we can do something like
http://www.codeaffine.com/2013/11/18/a-junit-rule-to-conditionally-ignore-tests/


On Mon, Apr 25, 2016 at 4:43 PM, Behroz Sikander <behro...@gmail.com> wrote:
> Yea. It is blocking the build. What do you suggest ?
>
> Maybe we should add an exclude rule in pom.xml for the above testcases and
> fix the problem later ?
>
> Regards,
> Behroz
>
> On Mon, Apr 25, 2016 at 2:48 AM, Edward J. Yoon <edward.y...@samsung.com>
> wrote:
>
>> https://twitter.com/normanmaurer/status/724391936136646657
>>
>> Yes, that transport seems to work only on Linux. If it blocks the build, I
>> think we have to do something.
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>>
>> -Original Message-
>> From: Behroz Sikander [mailto:behro...@gmail.com]
>> Sent: Monday, April 25, 2016 8:29 AM
>> To: dev@hama.apache.org
>> Subject: io.netty.channel.epoll not supported on Mac ?
>>
>> Hi,
>> I was trying to configure Apache Hama's development environment on MAC
>> using the following command.
>> mvn clean install -Phadoop2 -Dhadoop.version=2.7.0
>>
>> The following testcases seems to fail
>> org.apache.hama.bsp.message.TestHamaAsyncMessageManager
>> org.apache.hama.ipc.TestAsyncRPC
>> org.apache.hama.ipc.TestAsyncIPC
>>
>> The errors are very similar and it seems that io.netty.channel.epoll is
>> only supported on Linux.
>>
>>
>>
>>
>> ---
>> Test set: org.apache.hama.bsp.message.TestHamaAsyncMessageManager
>> ---
>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.029 sec <<< FAILURE!
>> testMemoryMessaging(org.apache.hama.bsp.message.TestHamaAsyncMessageManager) Time elapsed: 0.007 sec  <<< ERROR!
>> java.lang.ExceptionInInitializerError
>> at io.netty.channel.epoll.EpollEventLoop.(EpollEventLoop.java:71)
>> at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:71)
>> at io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)
>> at io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
>> at io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:56)
>> at io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:48)
>> at io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:41)
>> at org.apache.hama.ipc.AsyncServer.(AsyncServer.java:84)
>> at org.apache.hama.ipc.AsyncRPC$NioServer.(AsyncRPC.java:722)
>> at org.apache.hama.ipc.AsyncRPC.getServer(AsyncRPC.java:676)
>> at org.apache.hama.ipc.AsyncRPC.getServer(AsyncRPC.java:653)
>> at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.startServer(HamaAsyncMessageManagerImpl.java:97)
>> at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.startRPCServer(HamaAsyncMessageManagerImpl.java:88)
>> at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.init(HamaAsyncMessageManagerImpl.java:69)
>> at org.apache.hama.bsp.message.TestHamaAsyncMessageManager.messagingInternal(TestHamaAsyncMessageManager.java:72)
>> at org.apache.hama.bsp.message.TestHamaAsyncMessageManager.testMemoryMessaging(TestHamaAsyncMessageManager.java:51)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at junit.framework.TestCase.runTest(TestCase.java:168)
>> at junit.framework.TestCase.runBare(TestCase.java:134)
>> at junit.framework.TestResult$1.protect(TestResult.java:110)
>> at junit.framework.TestResult.runProtected(TestResult.java:128)
>> at junit.framework.TestResult.run(TestResult.java:113)
>> at junit.framework.TestCase.run(TestCase.java:124)
>> at junit.framework.TestSuite.runTest(TestSuite.java:232)
>> at junit.framework.TestSuite.run(TestSuite.java:227)
>> at org.junit.internal.runners.JUnit38

0.7.2-SNAPSHOT is now available on repository.apache.org

2016-04-24 Thread Edward J. Yoon
Just FYI, I've just deployed snapshot artifacts for testing HAMA-988 and
related things on
http://repository.apache.org/snapshots/org/apache/hama/hama-core/0.7.2-SNAPSHOT/

--
Best Regards, Edward J. Yoon






RE: io.netty.channel.epoll not supported on Mac ?

2016-04-24 Thread Edward J. Yoon
https://twitter.com/normanmaurer/status/724391936136646657

Yes, that transport seems to work only on Linux. If it blocks the build, I think 
we have to do something.

--
Best Regards, Edward J. Yoon


-Original Message-
From: Behroz Sikander [mailto:behro...@gmail.com]
Sent: Monday, April 25, 2016 8:29 AM
To: dev@hama.apache.org
Subject: io.netty.channel.epoll not supported on Mac ?

Hi,
I was trying to configure Apache Hama's development environment on MAC
using the following command.
mvn clean install -Phadoop2 -Dhadoop.version=2.7.0

The following testcases seems to fail
org.apache.hama.bsp.message.TestHamaAsyncMessageManager
org.apache.hama.ipc.TestAsyncRPC
org.apache.hama.ipc.TestAsyncIPC

The errors are very similar and it seems that io.netty.channel.epoll is
only supported on Linux.



---
Test set: org.apache.hama.bsp.message.TestHamaAsyncMessageManager
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.029 sec <<< FAILURE!
testMemoryMessaging(org.apache.hama.bsp.message.TestHamaAsyncMessageManager) Time elapsed: 0.007 sec  <<< ERROR!
java.lang.ExceptionInInitializerError
at io.netty.channel.epoll.EpollEventLoop.(EpollEventLoop.java:71)
at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:71)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)
at io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
at io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:56)
at io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:48)
at io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:41)
at org.apache.hama.ipc.AsyncServer.(AsyncServer.java:84)
at org.apache.hama.ipc.AsyncRPC$NioServer.(AsyncRPC.java:722)
at org.apache.hama.ipc.AsyncRPC.getServer(AsyncRPC.java:676)
at org.apache.hama.ipc.AsyncRPC.getServer(AsyncRPC.java:653)
at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.startServer(HamaAsyncMessageManagerImpl.java:97)
at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.startRPCServer(HamaAsyncMessageManagerImpl.java:88)
at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.init(HamaAsyncMessageManagerImpl.java:69)
at org.apache.hama.bsp.message.TestHamaAsyncMessageManager.messagingInternal(TestHamaAsyncMessageManager.java:72)
at org.apache.hama.bsp.message.TestHamaAsyncMessageManager.testMemoryMessaging(TestHamaAsyncMessageManager.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:24)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
at org.junit.runner.JUnitCore.run(JUnitCore.java:127)
at org.apache.maven.surefire.junitcore.JUnitCoreTestSet.runJunitCore(JUnitCoreTestSet.java:208)
at org.apache.maven.surefire.junitcore.JUnitCoreTestSet.execute(JUnitCoreTestSet.java:95)
at org.apache.maven.surefire.junitcore.JUnitCoreTestSet.execute(JUnitCoreTestSet.java:82)
at org.apache.maven.surefire.junitcore.JUnitCoreDirectoryTestSuite.execute(JUnitCoreDirectoryTestSuite.java:84)
at org.apache.maven.surefire.Surefire.run(Surefire.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAc

[jira] [Created] (HAMA-988) Allow to add additional no-input tasks as number user want

2016-04-21 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-988:
---

 Summary: Allow to add additional no-input tasks as number user want
 Key: HAMA-988
 URL: https://issues.apache.org/jira/browse/HAMA-988
 Project: Hama
  Issue Type: Improvement
  Components: bsp core
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


The BSP framework basically launches as many tasks as there are splits. 
Force-setting the number of tasks is also possible by setting 
"hama.force.set.bsp.tasks" to true.

However, there's no way to add extra tasks beyond the number of splits. 
For example, if the input has 5 splits, I want to launch 6 tasks (one additional 
no-input task that acts as a master). 
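
The requested behaviour amounts to simple bookkeeping, sketched below. This is not the Hama scheduler: `TaskPlan`, `totalTasks`, and `hasInput` are hypothetical names, and placing the no-input tasks after the split-backed ones is an assumption made for the example.

```java
/** Hypothetical bookkeeping for "splits plus extra no-input tasks". */
public class TaskPlan {

  /** Total tasks to launch: one per input split plus the extra no-input tasks. */
  public static int totalTasks(int numSplits, int additionalNoInputTasks) {
    if (numSplits < 0 || additionalNoInputTasks < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    return numSplits + additionalNoInputTasks;
  }

  /** Tasks 0..numSplits-1 read a split; tasks after that have no input. */
  public static boolean hasInput(int taskIndex, int numSplits) {
    return taskIndex >= 0 && taskIndex < numSplits;
  }
}
```

With 5 splits and 1 additional task, `totalTasks(5, 1)` gives 6, and task index 5 is the no-input task that could act as the master.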






RE: about the hama scheduler

2016-04-06 Thread Edward J. Yoon
http://people.apache.org/~tjungblut/downloads/hamadocs/

--
Best Regards, Edward J. Yoon


-Original Message-
From: hanif s [mailto:sad143...@yahoo.com.INVALID]
Sent: Thursday, April 07, 2016 11:48 AM
To: Edward J. Yoon; Dev
Subject: Re: about the hama scheduler

Good morning. Is there any site or information available about the 
internals of the Hama framework, like the one we have for Hadoop?
Hadoop Internals (ercoppa.github.io): this project contains several diagrams 
describing Apache Hadoop internals (2.3.0 or later).


 Kind Regards,
무하마드 하니프
(Muhammad Hanif)

Computer and Software Engineering Department
Hanyang University, Seoul , South Korea

Cell Phone No: +82-10-29690410


  From: Edward J. Yoon <edward.y...@samsung.com>
 To: dev@hama.apache.org; 'hanif s' <sad143...@yahoo.com>
 Sent: Wednesday, 6 April 2016, 13:13
 Subject: RE: about the hama scheduler

Hi,

I'd recommend that you research network-aware task scheduling (network
locality). This would be very meaningful.

If you are thinking about something like the fair scheduler of Hadoop, I would
say that you should focus more on microservice architecture, e.g.,
dockerization, or Hama on Mesos or YARN.

Thanks.

--
Best Regards, Edward J. Yoon

-Original Message-
From: hanif s [mailto:sad143...@yahoo.com.INVALID]
Sent: Wednesday, April 06, 2016 11:59 AM
To: hama-...@incubator.apache.org
Subject: about the hama scheduler

Hello Guys, I want to work on the improvement of the hama scheduling
mechanism. Please share the list of the research papers and other information
about the hama scheduler.
Kind Regards,
무하마드 하니프
(Muhammad Hanif)

Computer and Software Engineering Department
Hanyang University, Seoul , South Korea

Cell Phone No: +82-10-29690410









RE: about the hama scheduler

2016-04-05 Thread Edward J. Yoon
Hi,

I'd recommend researching network-aware task scheduling (network
locality). That would be very meaningful work.

If you are thinking of something like Hadoop's fair scheduler, I'd say you
should focus more on microservice architecture, e.g., dockerization or Hama on
Mesos or YARN.

Thanks.

--
Best Regards, Edward J. Yoon

-Original Message-
From: hanif s [mailto:sad143...@yahoo.com.INVALID]
Sent: Wednesday, April 06, 2016 11:59 AM
To: hama-...@incubator.apache.org
Subject: about the hama scheduler

Hello Guys, I want to work on the improvement of the hama scheduling 
mechanism. Please share the list of the research papers and other information 
about the hama scheduler. Kind Regards,
무하마드 하니프
(Muhammad Hanif)

Computer and Software Engineering Department
Hanyang University, Seoul , South Korea

Cell Phone No: +82-10-29690410




Re: About concurrency loading problem in GraphJobRunner.java in HamaV0.7.1

2016-04-01 Thread Edward J. Yoon
Hi,

Please subscribe to the mailing list first. Otherwise, your mail won't
be delivered to subscribers.

On Thu, Mar 24, 2016 at 11:11 PM,
<dev-reject-1458828684.20517.ddaablpdbjjphejck...@hama.apache.org>
wrote:
>
>
> -- Forwarded message --
> From: "刘强" <edward2...@qq.com>
> To: dev <dev@hama.apache.org>
> Cc:
> Date: Thu, 24 Mar 2016 22:10:55 +0800
> Subject: About concurrency loading problem in GraphJobRunner.java in 
> HamaV0.7.1
>
> Dear developers,
>
> I've got problem:
> 16/03/24 20:30:53 INFO graph.GraphJobRunner: Total time spent for loading vertices: 3447 ms
> 16/03/24 20:30:53 INFO graph.GraphJobRunner: Total time spent for broadcasting global vertex count: 789 ms
> 16/03/24 20:30:54 INFO graph.GraphJobRunner: Total time spent for initial superstep: 787 ms
> 16/03/24 20:30:55 INFO graph.GraphJobRunner: Total time spent for broadcasting aggregation values: 108 ms
> 16/03/24 20:30:55 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.lang.NullPointerException
>     at org.apache.hama.util.UnsafeByteArrayInputStream.<init>(UnsafeByteArrayInputStream.java:63)
>     at org.apache.hama.util.WritableUtils.unsafeDeserialize(WritableUtils.java:63)
>     at org.apache.hama.graph.MapVerticesInfo.get(MapVerticesInfo.java:101)
>     at org.apache.hama.graph.GraphJobRunner$ComputeRunnable.<init>(GraphJobRunner.java:367)
>     at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:277)
>     at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:187)
>     at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:171)
>     at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
>
> I added some print statements for the vertices, both when loading local
> vertices and when transferring vertices in the partition phase (just printing
> vertex.toString()). With those in place, the job runs OK.
>
> I found that the exception above occurs because, while processing messages
> in a superstep, some vertices are missing and cannot be retrieved from the
> ConcurrentHashMap used in MapVerticesInfo.java.
>
> This may be because the loadVertices() function has a concurrency
> problem. Too many threads are started, and sometimes the load function
> may miss some vertices. I suspect the problem is related to the ConcurrentHashMap.
>
> Can you tell me what causes this? I'm not quite sure.
>
> Thanks very much!
> Yours , QiangLiu !
>



-- 
Best Regards, Edward J. Yoon


RE: HamaV0.7.1 bug Report

2016-03-23 Thread Edward J. Yoon
Hi,

Hama 0.7.1 and the default PageRank example work fine for me. Can you provide
more details about your test?


--
Best Regards, Edward J. Yoon

-Original Message-
From: 刘强 [mailto:edward2...@qq.com]
Sent: Wednesday, March 23, 2016 9:25 PM
To: dev
Subject: HamaV0.7.1 bug Report

I got a strange problem during the setup function of GraphJobRunner.java.


16/03/23 20:22:11 INFO graph.GraphJobRunner: 63083 vertices are loaded into c02b06:61002
16/03/23 20:22:11 INFO graph.GraphJobRunner: Total time spent for loading vertices: 4490 ms
16/03/23 20:22:11 INFO graph.GraphJobRunner: Total time spent for broadcasting global vertex count: 208 ms
16/03/23 20:22:12 INFO graph.GraphJobRunner: Total time spent for initial superstep: 734 ms
16/03/23 20:22:12 INFO graph.GraphJobRunner: Total time spent for broadcasting aggregation values: 109 ms
16/03/23 20:22:13 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.lang.NullPointerException
    at org.apache.hama.util.UnsafeByteArrayInputStream.<init>(UnsafeByteArrayInputStream.java:63)
    at org.apache.hama.util.WritableUtils.unsafeDeserialize(WritableUtils.java:63)
    at org.apache.hama.graph.MapVerticesInfo.get(MapVerticesInfo.java:101)
    at org.apache.hama.graph.GraphJobRunner$ComputeRunnable.<init>(GraphJobRunner.java:362)
    at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:273)
    at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:185)
    at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:171)
    at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
    at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)


This happens when running the PageRank example.
Serial execution works, but it breaks down when run in parallel.
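
A symptom like the one reported above (fine when run serially, missing entries when run in parallel) can appear in general when a shared map is read before all loader threads have finished. Whether that is what happens inside GraphJobRunner is not established in this thread; the sketch below is plain Java with hypothetical names and only shows the usual pattern of waiting for every loader before the first read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Hama code: load entries in parallel, read only after all loaders finish.
final class ParallelLoadSketch {

    /** Loads ids 0..count-1 into a concurrent map from several threads and returns the map. */
    static ConcurrentHashMap<Integer, byte[]> load(int count, int threads) {
        ConcurrentHashMap<Integer, byte[]> vertices = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < count; i++) {
            final int id = i;
            // put() on ConcurrentHashMap is thread-safe; no extra locking is needed here.
            pool.execute(() -> vertices.put(id, new byte[] {(byte) id}));
        }
        pool.shutdown();
        try {
            // Block until every loader task has completed before anyone reads the map.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("loading did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return vertices;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, byte[]> vertices = load(10_000, 8);
        System.out.println("loaded " + vertices.size() + " vertices");
    }
}
```

If the wait is skipped, a lookup can return null for an id whose loader task has not run yet, and deserializing that null would fail in a way similar to the trace above.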




[jira] [Resolved] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-03-21 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-984.
-
Resolution: Fixed

I just committed this, Thanks Cazen!

Please use the TRUNK version in your environment, and feel free to report any
problems. Thanks.

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>Assignee: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystems by default.
> So an IOException (No FileSystem for scheme) occurs when trying to access S3
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but a fix would help Hama users on AWS S3,
> because S3 worked with previous versions (including 1.x) without manual
> setup. Of course, we could also just document the required changes, without
> modifying any source code.





[jira] [Updated] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-984:

Assignee: Cazen Lee

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>Assignee: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystems by default.
> So an IOException (No FileSystem for scheme) occurs when trying to access S3
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but a fix would help Hama users on AWS S3,
> because S3 worked with previous versions (including 1.x) without manual
> setup. Of course, we could also just document the required changes, without
> modifying any source code.





[jira] [Updated] (HAMA-986) Hashcode calculation

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-986:

Fix Version/s: 0.7.2

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: JongYoon Lim
>Priority: Trivial
> Fix For: 0.7.2
>
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 
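
For readers unfamiliar with this kind of bug: leaving a field out of hashCode() does not break the equals/hashCode contract, but objects that differ only in that field then collide in hash-based collections. The sketch below uses a hypothetical class; the real fields of AsyncClient are not shown in this thread and are not assumed here.

```java
import java.util.Objects;

// Hypothetical class for illustration only; it does not mirror AsyncClient's real fields.
final class EndpointKey {
    private final String host;
    private final int port;
    private final String protocol;

    EndpointKey(String host, int port, String protocol) {
        this.host = host;
        this.port = port;
        this.protocol = protocol;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof EndpointKey)) return false;
        EndpointKey that = (EndpointKey) other;
        return port == that.port
            && Objects.equals(host, that.host)
            && Objects.equals(protocol, that.protocol);
    }

    @Override
    public int hashCode() {
        // Every field compared in equals() also feeds the hash, so none is "missing".
        return Objects.hash(host, port, protocol);
    }
}
```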





[jira] [Resolved] (HAMA-986) Hashcode calculation

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-986.
-
Resolution: Fixed
  Assignee: JongYoon Lim

I just committed this! Thanks JongYoon.

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: JongYoon Lim
>Assignee: JongYoon Lim
>Priority: Trivial
> Fix For: 0.7.2
>
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





[jira] [Updated] (HAMA-986) Hashcode calculation

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-986:

Affects Version/s: 0.7.1

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: JongYoon Lim
>Priority: Trivial
> Fix For: 0.7.2
>
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





[jira] [Resolved] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-982.
-
   Resolution: Fixed
Fix Version/s: (was: 0.7.2)
   0.7.1

Fixed.

> Vertex.read/writeState() method throws NullPointerException
> ---
>
> Key: HAMA-982
> URL: https://issues.apache.org/jira/browse/HAMA-982
> Project: Hama
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.7.0
>    Reporter: Edward J. Yoon
>    Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> It occurs at partitioning and initial supersteps.
> >  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[jira] [Updated] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-982:

Fix Version/s: (was: 0.7.1)
   0.7.2

> Vertex.read/writeState() method throws NullPointerException
> ---
>
> Key: HAMA-982
> URL: https://issues.apache.org/jira/browse/HAMA-982
> Project: Hama
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.7.0
>    Reporter: Edward J. Yoon
>    Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> It occurs at partitioning and initial supersteps.
> >  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[jira] [Commented] (HAMA-986) Hashcode calculation

2016-03-13 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192751#comment-15192751
 ] 

Edward J. Yoon commented on HAMA-986:
-

Thanks for your contribution! Since we're currently in the release process, I can
commit it in a few days.

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





Re: [RESULT][VOTE] Apache Hama 0.7.1 release (RC2)

2016-03-12 Thread Edward J. Yoon
There was a small problem publishing artifacts to the Maven repository,
so I've changed the pom.xml build file slightly (added the
maven-scm-provider-gitexe dependency). I'll continue the release
process because there are no code changes.

Thanks!

On Thu, Mar 3, 2016 at 6:50 PM, Edward J. Yoon <edward.y...@samsung.com> wrote:
> Thanks, this vote passes with 4 binding +1s, 1 non-binding +1.
>
> --
> Best Regards, Edward J. Yoon
>
> -Original Message-
> From: Martin Illecker [mailto:millec...@apache.org]
> Sent: Wednesday, March 02, 2016 8:46 PM
> To: dev@hama.apache.org
> Subject: Re: [VOTE] Apache Hama 0.7.1 release (RC2)
>
> +1
>
> 2016-03-02 2:42 GMT+01:00 Behroz Sikander <behro...@gmail.com>:
>
>> +1
>>
>> On Wed, Mar 2, 2016 at 2:36 AM, Dongjin Lee <dongjin.lee...@gmail.com>
>> wrote:
>>
>> > +1.
>> > Thanks,Dongjin
>> >
>> > On Tue, Mar 1, 2016 at 3:53 PM -0800, "Anastasis Andronidis" <
>> > andronat_...@hotmail.com> wrote:
>> >
>> > +1
>> >
>> > Cheers,
>> > Anastasios
>> >
>> > > On 29 Feb, 2016, at 14:05, Edward J. Yoon  wrote:
>> > >
>> > > Hi guys,
>> > >
>> > > This is a reminder that, if you are a PMC member, please vote for the
>> > > new Hama release.
>> > >
>> > > Thanks!
>> > >
>> > > On Sat, Feb 27, 2016 at 10:36 PM, Edward J. Yoon  wrote:
>> > >> I'm +1, this works well on my cluster.
>> > >>
>> > >> On Mon, Feb 15, 2016 at 8:35 AM, Edward J. Yoon  wrote:
>> > >>> Hi all,
>> > >>>
>> > >>> I just created a 2nd release candidate for the Apache Hama 0.7.1 release. This
>> > >>> RC fixes a newly reported bug in the graph module. It was compiled with Java 7.
>> > >>>
>> > >>> RC2 is available at:
>> > >>> http://people.apache.org/~edwardyoon/dist/0.7.1-RC2/
>> > >>>
>> > >>> Tags:
>> > >>> https://github.com/apache/hama/tree/0.7.1-RC2
>> > >>>
>> > >>> Please try it on your environment, run the tests, verify checksum files,
>> > >>> etc. and vote.
>> > >>>
>> > >>> Thanks~
>> > >>>
>> > >>> --
>> > >>> Best Regards, Edward J. Yoon
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Best Regards, Edward J. Yoon
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards, Edward J. Yoon
>> >
>>
>
>



-- 
Best Regards, Edward J. Yoon


[jira] [Created] (HAMA-985) Update git scm provider dependency

2016-03-06 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-985:
---

 Summary: Update git scm provider dependency
 Key: HAMA-985
 URL: https://issues.apache.org/jira/browse/HAMA-985
 Project: Hama
  Issue Type: Bug
  Components: build 
Reporter: Edward J. Yoon


Symptom: mvn release:prepare or perform not committing changes to pom.xml.





[jira] [Commented] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-03-02 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177081#comment-15177081
 ] 

Edward J. Yoon commented on HAMA-984:
-

Hi, the hadoop-auth package is used for security/auth in the o.a.h.ipc package and
the YARN module.

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystems by default.
> So an IOException (No FileSystem for scheme) occurs when trying to access S3
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but a fix would help Hama users on AWS S3,
> because S3 worked with previous versions (including 1.x) without manual
> setup. Of course, we could also just document the required changes, without
> modifying any source code.





Re: Fault Tolerance in Hama

2016-02-29 Thread Edward J. Yoon
Internally, the framework checkpoints the messages transferred among
BSP tasks during the BSP synchronization period.

If a user wants to checkpoint anything else, the user should use the
HDFS APIs directly.

On Mon, Feb 29, 2016 at 11:15 PM, Behroz Sikander <behro...@gmail.com> wrote:
> Ok. So, Hama does support FT but it is not thoroughly tested.
>
> Btw, how can a user checkpoint or Hama does that internally ? Is there any
> method exposed using BSPPeer ?
>
> Regards,
> Behroz
>
> On Mon, Feb 29, 2016 at 2:03 PM, Edward J. Yoon <edwardy...@apache.org>
> wrote:
>
>> If I remember correctly, the framework first changes the job status to
>> "recovering", and then simply restarts all the tasks from the
>> last checkpoint. It works well, but I have only tested simple jobs (no
>> input/output) on my cluster (see also HAMA-973).
>>
>> To write a fully fault-tolerant application from the user side, every piece
>> of state in the BSP program needs to be written to disk. So, some people
>> discussed and introduced a new Superstep API that provides a more abstract
>> interface, like Pregel.
>>
>>
>> On Mon, Feb 29, 2016 at 8:09 PM, Behroz Sikander <behro...@gmail.com>
>> wrote:
>> > Hi,
>> > Just a quick question, is Hama fault tolerant ? What happens if a Hama
>> > tasks fails ?
>> >
>> > Regards,
>> > Behroz
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>



-- 
Best Regards, Edward J. Yoon


Re: [VOTE] Apache Hama 0.7.1 release (RC2)

2016-02-29 Thread Edward J. Yoon
Hi guys,

This is a reminder that, if you are a PMC member, please vote for the
new Hama release.

Thanks!

On Sat, Feb 27, 2016 at 10:36 PM, Edward J. Yoon <edwardy...@apache.org> wrote:
> I'm +1, this works well on my cluster.
>
> On Mon, Feb 15, 2016 at 8:35 AM, Edward J. Yoon <edward.y...@samsung.com> 
> wrote:
>> Hi all,
>>
>> I just created a 2nd release candidate for the Apache Hama 0.7.1 release. This
>> RC fixes a newly reported bug in the graph module. It was compiled with Java 7.
>>
>> RC2 is available at:
>> http://people.apache.org/~edwardyoon/dist/0.7.1-RC2/
>>
>> Tags:
>> https://github.com/apache/hama/tree/0.7.1-RC2
>>
>> Please try it on your environment, run the tests, verify checksum files,
>> etc. and vote.
>>
>> Thanks~
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon



-- 
Best Regards, Edward J. Yoon


Re: Fault Tolerance in Hama

2016-02-29 Thread Edward J. Yoon
If I remember correctly, the framework first changes the job status to
"recovering", and then simply restarts all the tasks from the
last checkpoint. It works well, but I have only tested simple jobs (no
input/output) on my cluster (see also HAMA-973).

To write a fully fault-tolerant application from the user side, every piece
of state in the BSP program needs to be written to disk. So, some people
discussed and introduced a new Superstep API that provides a more abstract
interface, like Pregel.
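
The restart-from-last-checkpoint behaviour described above can be sketched in a few lines. This is a toy simulation, not Hama's recovery code: the names are hypothetical, and the "checkpoint" is an in-memory copy standing in for state that a real job would write to disk or HDFS.

```java
// Hypothetical simulation, not Hama code: checkpoint every few supersteps and resume after a failure.
final class CheckpointRestartSketch {

    /**
     * Runs supersteps 0..totalSteps-1, each adding its index to a running sum.
     * State is checkpointed every `interval` supersteps. If failAt >= 0, one failure
     * is simulated just before that superstep and execution rolls back to the last checkpoint.
     */
    static long run(int totalSteps, int interval, int failAt) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        long sum = 0;          // the application state
        int step = 0;          // next superstep to execute
        long savedSum = 0;     // last checkpoint: saved state ...
        int savedStep = 0;     // ... and the superstep to resume from
        boolean failed = false;
        while (step < totalSteps) {
            if (!failed && step == failAt) {
                failed = true;
                sum = savedSum;   // roll back to the checkpointed state
                step = savedStep; // and redo the supersteps since then
                continue;
            }
            sum += step;          // the "work" of one superstep
            step++;
            if (step % interval == 0) {
                savedSum = sum;   // in-memory stand-in for writing state to disk
                savedStep = step;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("without failure: " + run(10, 3, -1));
        System.out.println("failure before superstep 7: " + run(10, 3, 7));
    }
}
```

Both runs end with the same result because everything the supersteps depend on is part of the checkpointed state; any state kept outside the checkpoint would be lost on rollback, which is the point made above about writing every piece of state to disk.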


On Mon, Feb 29, 2016 at 8:09 PM, Behroz Sikander <behro...@gmail.com> wrote:
> Hi,
> Just a quick question: is Hama fault tolerant? What happens if a Hama
> task fails?
>
> Regards,
> Behroz



-- 
Best Regards, Edward J. Yoon


[jira] [Commented] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-02-27 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170592#comment-15170592
 ] 

Edward J. Yoon commented on HAMA-984:
-

Thanks for the pull request. I can check it next week.

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystems by default.
> So an IOException (No FileSystem for scheme) occurs when trying to access S3
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but a fix would help Hama users on AWS S3,
> because S3 worked with previous versions (including 1.x) without manual
> setup. Of course, we could also just document the required changes, without
> modifying any source code.





Re: [VOTE] Apache Hama 0.7 release (RC2)

2016-02-27 Thread Edward J. Yoon
I'm +1, this works well on my cluster.

On Mon, Feb 15, 2016 at 8:35 AM, Edward J. Yoon <edward.y...@samsung.com> wrote:
> Hi all,
>
> I just created a 2nd release candidate for the Apache Hama 0.7.1 release. This
> RC fixes a newly reported bug in the graph module. It was compiled with Java 7.
>
> RC2 is available at:
> http://people.apache.org/~edwardyoon/dist/0.7.1-RC2/
>
> Tags:
> https://github.com/apache/hama/tree/0.7.1-RC2
>
> Please try it on your environment, run the tests, verify checksum files,
> etc. and vote.
>
> Thanks~
>
> --
> Best Regards, Edward J. Yoon
>
>
>



-- 
Best Regards, Edward J. Yoon


Re: Does Hama supportting AWS S3 schema?

2016-02-25 Thread Edward J. Yoon
Yes, it sounds great. To support S3 as input or output for Hama BSP
jobs, I guess you'll need to implement our own S3 in/out formatters.

The rest, e.g., hadoop-version, auth, URI, etc., should be
configuration or build option issues.

Please feel free to open a JIRA ticket, and enjoy contributing your patches!

Thanks.


On Thu, Feb 25, 2016 at 5:22 PM, Cazen Lee <cazen@gmail.com> wrote:
> Additional information:
> NativeS3FileSystem is not on the default classpath of Hadoop 2.6 and later.
> I think it will work again with hadoop-aws in the pom.
>
> --
> cazen@gmail.com
> cazen@samsung.com
> http://www.cazen.co.kr
>
> Good day this is Cazen
>
> Are there any plans to support S3?
>
> I think it would be very helpful if we supported it, because S3 is very widely used
> and easy to use.
>
> If that sounds good, could I create a Jira issue for further discussion?
>
> Any advice is welcome.
>
> --
> cazen@gmail.com
> cazen....@samsung.com
> http://www.cazen.co.kr



-- 
Best Regards, Edward J. Yoon


RE: About DiskQueue in Hama

2016-02-24 Thread Edward J. Yoon
Yes, that feature was removed because it didn't help much with either
performance or memory consumption. You can refer to the old discussion thread:
http://markmail.org/message/wld3j7dp67wrl5vk

We may need to implement it again in a more efficient way.

--
Best Regards, Edward J. Yoon


-Original Message-
From: 刘强 [mailto:edward2...@qq.com]
Sent: Wednesday, February 24, 2016 3:43 PM
To: dev; commits
Subject: About DiskQueue in Hama

Dear administrators,
 I'm writing with a question about Hama version 0.7. I found the DiskQueue
implementation in Hama version 0.6.4, but it seems to have
disappeared in version 0.7. Does this mean that the disk-based
implementation in Hama has been dropped?
Thank you very much!

Yours, Qiangliu




[jira] [Updated] (HAMA-983) Hama runner for DataFlow

2016-02-16 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-983:

Labels: gsoc2016  (was: )

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>    Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transformation, so
> we'll need to implement a data processing runner like
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing a similarity join could be fun. According to
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is
> the clear winner over Apache Hadoop and Apache Spark.
> Since a similarity join consists of transformation, aggregation, and partition
> computations, I think it's possible to implement it using the Apache Beam APIs.





[jira] [Created] (HAMA-983) Hama runner for DataFlow

2016-02-14 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-983:
---

 Summary: Hama runner for DataFlow
 Key: HAMA-983
 URL: https://issues.apache.org/jira/browse/HAMA-983
 Project: Hama
  Issue Type: Bug
Reporter: Edward J. Yoon


As you already know, Apache Beam provides a unified programming model for both
batch and streaming inputs.

The APIs are generally associated with data filtering and transformation, so
we'll need to implement a data processing runner like
https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java

Also, implementing a similarity join could be fun. According to
http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is
the clear winner over Apache Hadoop and Apache Spark.

Since a similarity join consists of transformation, aggregation, and partition
computations, I think it's possible to implement it using the Apache Beam APIs.





[VOTE] Apache Hama 0.7 release (RC2)

2016-02-14 Thread Edward J. Yoon
Hi all,

I just created a 2nd release candidate for the Apache Hama 0.7.1 release. This
RC fixes a newly reported bug in the graph module. It was compiled with Java 7.

RC2 is available at:
http://people.apache.org/~edwardyoon/dist/0.7.1-RC2/

Tags:
https://github.com/apache/hama/tree/0.7.1-RC2

Please try it on your environment, run the tests, verify checksum files,
etc. and vote.

Thanks~

--
Best Regards, Edward J. Yoon





Re: [ANNOUNCE] Behroz Sikander as new commiter and PMC member

2016-02-02 Thread Edward J. Yoon
By the way, @Anastasis,

Please subscribe to the private list by sending mail to
private-subscr...@hama.apache.org if you're not on that list yet.
You're one of the PMC members.

On Thu, Jan 28, 2016 at 9:15 AM, Anastasis Andronidis
<andronat_...@hotmail.com> wrote:
> Congratulations Behroz!
>
> Cheers,
> Anastasios
> Andronidis
>
>> On 27 Jan, 2016, at 22:36, Edward J. Yoon <edward.y...@samsung.com> wrote:
>>
>> The Project Management Committee (PMC) for Apache Hama has asked Behroz
>> Sikander to become a committer and PMC member, and we are pleased to
>> announce that he has accepted.
>>
>> Please join me in congratulating Behroz! :)
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>>
>>
>



-- 
Best Regards, Edward J. Yoon


Re: RE: RE: RE: RE: Do Hama support member member variable?

2016-01-28 Thread Edward J. Yoon
Wow, you found a bug. Thanks ;)

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)

When the framework assigns vertices to the proper machine in the initial
phase, vertex objects are transferred in serialized form. At this step,
user-defined code won't work correctly. I'll fix it soon.

Anyway, you should be able to manage an array of TextPair objects like below:

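// Every element must be non-null before writeState() runs.
private TextPair[] test = new TextPair[1];

public void readState(DataInput in) throws IOException {
  int size = in.readInt();
  test = new TextPair[size];
  for (int i = 0; i < size; i++) {
    // A new array holds only nulls, so create each element before reading into it
    // (this assumes TextPair has a no-argument constructor).
    test[i] = new TextPair();
    test[i].readFields(in);
  }
}

public void writeState(DataOutput out) throws IOException {
  // Write the length first so that readState() knows how many elements follow.
  out.writeInt(test.length);
  for (int i = 0; i < test.length; i++) {
    test[i].write(out);
  }
}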
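
The same length-prefixed pattern can be tried outside Hama with a self-contained sketch. The Pair class below is a hypothetical stand-in for the TextPair class from this thread (whose source is not shown), and the round trip goes through an in-memory byte array rather than the framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: length-prefixed serialization of an object array, no Hama or Hadoop needed.
final class PairArrayStateSketch {

    /** Stand-in for the user's TextPair; only the write/readFields shape matters here. */
    static final class Pair {
        String first = "";
        String second = "";

        Pair() {}

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }

        void write(DataOutput out) throws IOException {
            out.writeUTF(first);
            out.writeUTF(second);
        }

        void readFields(DataInput in) throws IOException {
            first = in.readUTF();
            second = in.readUTF();
        }
    }

    static void writeState(DataOutput out, Pair[] pairs) throws IOException {
        out.writeInt(pairs.length);          // length prefix
        for (Pair p : pairs) {
            p.write(out);
        }
    }

    static Pair[] readState(DataInput in) throws IOException {
        int size = in.readInt();
        Pair[] pairs = new Pair[size];
        for (int i = 0; i < size; i++) {
            pairs[i] = new Pair();           // allocate first; a fresh array holds only nulls
            pairs[i].readFields(in);
        }
        return pairs;
    }

    /** Serializes to a byte array and back, wrapping I/O errors as unchecked. */
    static Pair[] roundTrip(Pair[] pairs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeState(new DataOutputStream(bytes), pairs);
            return readState(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Pair[] copy = roundTrip(new Pair[] {new Pair("a", "b"), new Pair("c", "d")});
        System.out.println(copy.length + " pairs, last = " + copy[1].first + "," + copy[1].second);
    }
}
```

Writing the length before the elements is what lets the reader size the array, and allocating each element before calling readFields() avoids a NullPointerException on the freshly created array.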

Thanks.


On Thu, Jan 28, 2016 at 6:45 PM, 步青云 <mailliup...@qq.com> wrote:
> Thanks for your reply. You have helped me a lot.
> I have tried both methods, but some problems are still bothering me.
> When I use the first method, with Hadoop's built-in writable classes, I get
> a NullPointerException, meaning that parents is null, even though I have
> initialized parents. The code looks like this.
> static ArrayWritable parents= new ArrayWritable(TextPair.class);
>
>
> public void writeState(DataOutput out) throws IOException {
> out.writeBoolean(match);
> parents.write(out);
> }
>
>
> public void readState(DataInput in) throws IOException {
>match = in.readBoolean();
>parents.readFields(in);
> }
>
>
>
> And the error message is as follow:
> Exception in thread "pool-6-thread-2" java.lang.RuntimeException: 
> java.lang.NullPointerException
> at 
> org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:562)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.io.ArrayWritable.write(ArrayWritable.java:103)
> at ProbMatch$ProbMatchVertex.writeState(ProbMatch.java:154)
> at org.apache.hama.graph.Vertex.write(Vertex.java:311)
> at 
> org.apache.hama.util.WritableUtils.unsafeSerialize(WritableUtils.java:55)
> at org.apache.hama.graph.MapVerticesInfo.put(MapVerticesInfo.java:64)
> at 
> org.apache.hama.graph.GraphJobRunner.addVertex(GraphJobRunner.java:577)
> at 
> org.apache.hama.graph.GraphJobRunner.access$300(GraphJobRunner.java:64)
> at 
> org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)
>
>
>
> When I try the second method, I don't know how to read and write an
> object array. I can use the out.writeInt() method to write an int, but how do I
> write an object? I'm sorry that I'm not good at
> Java. Here is some of the code I'm trying to write.
>
>
> static TextPair[] arr = new TextPair[1];
>
>
> public void writeState(DataOutput out) throws IOException {
> out.writeBoolean(match);
> out.writeInt(arr.length);
> for(int i = 0; i < arr.length; i++) {
>   out.writeInt(arr[i].toString());  // Is this right?
> }}
>
>
> public void readState(DataInput in) throws IOException {
>match = in.readBoolean();
>int length = in.readInt();
>for(int i=0;i<length;i++){
>   arr[i] = in.read// How can I read the textpair here?
>}
> }
>
>
>
> I'm very grateful to your help. Thanks again.
> Best Regards, Ping Liu.
>
>
>
>
> -- Original --
> From:  "Edward J. Yoon";<edward.y...@samsung.com>;
> Date:  Thu, Jan 28, 2016 11:51 AM
> To:  "user"<u...@hama.apache.org>;
>
> Subject:  RE: RE: RE: RE: Do Hama support member member variable?
>
>
>
> Hi,
>
> You can use Hadoop's built-in writable classes or your own custom Writable.
>
> static ArrayWritable arr = new ArrayWritable(DoubleWritable.class);
>
>   public void writeState(DataOutput out) throws IOException {
> arr.write(out);
>   }
>
> Or,
>
> static int[] arr2 = new int[3];
>
>   public void writeState(DataOutput out) throws IOException {
> out.writeInt(arr2.length);
> for(int i = 0; i < arr2.length; i++) {
>   out.writeInt(arr2[i]);
> }
>   }
>
>   public void readState(DataInput in) throws IOException {
> int size = in.readInt();
> for(int i = 0; i < size; i++) {
>   arr2[i] = in.readInt();
> }
>   }
>
> --
> Best Regards, Edward J. Yoon
>

[jira] [Updated] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-01-28 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-982:

Description: 
It occurs at partitioning and initial supersteps.

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)

  was:
It occurs when partitioning and initial supersteps.

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)


> Vertex.read/writeState() method throws NullPointerException
> ---
>
> Key: HAMA-982
> URL: https://issues.apache.org/jira/browse/HAMA-982
> Project: Hama
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> It occurs at partitioning and initial supersteps.
> >  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-01-28 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-982:
---

 Summary: Vertex.read/writeState() method throws NullPointerException
 Key: HAMA-982
 URL: https://issues.apache.org/jira/browse/HAMA-982
 Project: Hama
  Issue Type: Bug
  Components: graph
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.1


It occurs when partitioning and initial supersteps.

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[CANCEL][VOTE] Apache Hama 0.7.1 release (RC1)

2016-01-28 Thread Edward J. Yoon
A new bug has been found. I'll roll back and rebuild after fixing the reported bug.

On Mon, Jan 25, 2016 at 8:55 AM, Edward J. Yoon <edward.y...@samsung.com> wrote:
> Hi all,
>
> I just created the first release candidate for the Apache Hama 0.7.1 release.
> This release fixes Hama streaming and YARN bugs, and includes a new feature,
> round-robin task scheduling.
>
> The RC1 is available at:
> http://people.apache.org/~edwardyoon/dist/0.7.1-RC1/
>
> Tags:
> https://github.com/apache/hama/tree/0.7.1-RC1
>
> Please try it, run the tests, verify checksum files, etc. and vote.
>
> Thanks!
>
> --
> Best Regards, Edward J. Yoon
>
>
>



-- 
Best Regards, Edward J. Yoon


[VOTE] Apache Hama 0.7.1 release (RC1)

2016-01-24 Thread Edward J. Yoon
Hi all,

I just created the first release candidate for the Apache Hama 0.7.1 release.
This release fixes Hama streaming and YARN bugs, and includes a new feature,
round-robin task scheduling.

The RC1 is available at:
http://people.apache.org/~edwardyoon/dist/0.7.1-RC1/

Tags:
https://github.com/apache/hama/tree/0.7.1-RC1

Please try it, run the tests, verify checksum files, etc. and vote.

Thanks!

--
Best Regards, Edward J. Yoon





Re: Question regarding Hama synchronization behavior and GSOC

2016-01-19 Thread Edward J. Yoon
One idea is a BSP-based decision tree classification project.

> it seems that if the outgoing queue is large on slaves then they will take
> more time.

An asynchronous message-sending mechanism could reduce that time. I
think this could also be a GSoC project. :-)



On Tue, Jan 19, 2016 at 5:24 PM, Behroz Sikander <behro...@gmail.com> wrote:
> Hi,
>
> *> Q1: Is Hama going to participate in GSOC 2016 ? *
> *Sure, why not?*
>
> --> Great. I am willing to participate in this GSoC. Do we already have some
> potential projects? JIRA does not seem to have any.
>
> *>> Q2: In the image below, I see an interesting behavior of Hama but I am
> not sure why the behavior is like this.*
> *Can you tell us what version you used? I roughly guess master task can
> receive incoming message bundles concurrently if number of tasks is large.*
> --> I am using 0.7.0.
> OK, but can a slave send messages to the master concurrently if its queue
> is large? It seems that if the outgoing queue on the slaves is large, they
> will take more time.
>
> Regards,
> Behroz
>
> On Tue, Jan 19, 2016 at 1:59 AM, Edward J. Yoon <edward.y...@samsung.com>
> wrote:
>
>> > Q1: Is Hama going to participate in GSOC 2016 ?
>>
>> Sure, why not?
>>
>> > Q2: In the image below, I see an interesting behavior of Hama but I am
>> not
>> sure why the behavior is like this.
>>
>> Can you tell us what version you used?
>>
>> I roughly guess master task can receive incoming message bundles
>> concurrently
>> if number of tasks is large.
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>> -Original Message-
>> From: Behroz Sikander [mailto:bsikan...@apache.org]
>> Sent: Tuesday, January 19, 2016 12:28 AM
>> To: dev@hama.apache.org
>> Subject: Question regarding Hama synchronization behavior and GSOC
>>
>> Hi,
>> I have 2 questions regarding Hama.
>>
>> Q1: Is Hama going to participate in GSOC 2016 ?
>>
>> Q2: In the image below, I see an interesting behavior of Hama but I am not
>> sure why the behavior is like this.
>>
>> http://imgur.com/cVsfL1x
>>
>> The x-axis shows the total amount of data to process. The y-axis shows the
>> time in minutes, aggregated over 200 iterations. Each line in the plot
>> represents a different number of Hama tasks (peers) used to process the
>> data. Overall, the plot shows the *total time that the master task waits
>> for slave tasks to synchronize* (over *200 iterations*, in *minutes*).
>>
>> Note:
>> 1) Total time the master waits for slaves in *1 iteration* =
>> (time of slave processing) + *(time of synchronization)*
>> The plot shows only the *time in synchronization*, aggregated over *200
>> iterations*. I am using this plot to study the time Hama spends in
>> synchronization.
>>
>> 2) The total data is divided equally among all the tasks. For example, if I
>> use 10 tasks to process 10K data, then each task will get 1000. If I
>> use 20 tasks to process 10K, then each will get 500.
>>
>> Now in the plot for example, blue line represents 10 tasks. If I process
>> 10,000 files in 200 iterations the master waits for almost 3 minutes for
>> slaves to synchronize.
>>
>> Now if you look closely, when I *increase* the *number of tasks* used to
>> process the data, the *time* the master spends waiting for *slaves to
>> synchronize* starts to *decrease*. For example, look at the points at
>> 50K data: for 30 tasks the master waits ~10 minutes, for 40 tasks it waits
>> only ~6 minutes, and for 50 tasks it waits ~4 minutes.
>>
>> Q: My question is how to interpret this information.
>> The answer I came up with is that the *outgoing message queue* of each task
>> is smaller when I use more tasks and bigger when I use fewer tasks. For
>> example, if a task has to send 1000 messages to the master, its outgoing
>> queue will be bigger and will take more time to send than that of a task
>> with 500 outgoing messages. So, is my interpretation correct, or is
>> something else going on here? Any insight would be helpful.
>>
>> Regards,
>> Behroz
>>
>>
>>
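
The two notes in the quoted message reduce to simple arithmetic, sketched
below. The class and method names are hypothetical and nothing here is Hama
API; the sketch only restates the even split from note 2 and rearranges the
per-iteration formula from note 1 to isolate the synchronization time.

```java
// Hypothetical helper; illustrates the arithmetic in the quoted notes only.
final class SyncEstimate {
  // Note 2: items each task receives when the data is split evenly.
  static int itemsPerTask(int totalItems, int numTasks) {
    return totalItems / numTasks;
  }

  // Note 1 rearranged: synchronization time = total wait - slave processing.
  static double syncMinutes(double totalWaitMinutes, double processingMinutes) {
    return totalWaitMinutes - processingMinutes;
  }
}
```

With the figures from the message, itemsPerTask(10000, 10) gives 1000 and
itemsPerTask(10000, 20) gives 500, which matches the poster's reading that
each task has fewer messages to send as the number of tasks grows.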



-- 
Best Regards, Edward J. Yoon

