Wiki migration
The old wiki has been migrated to Confluence[1]. A quick check suggests it looks OK, but please let me know if anything is missing or broken. [1]. https://cwiki.apache.org/HAMA
Kanban
I can't get JIRA's kanban board working, so we are using [1] temporarily. [1]. https://trello.com/b/KRPf8xqS
Re: [DISCUSS] Roadmap
I like the first idea. I was thinking of it as well; when refactoring, I am trying to further split the master and groom by assigning roles. Would you mind sharing a bit more detail on both items? On Fri, 15 Mar 2019 at 09:24, ByungSeok Min wrote: > > Hello everyone. > > How about the items below? > 1. Distributed processing using blockchain > 2. Add TensorFlow. > > Have a nice day^^ > > On Sunday, 3 March 2019, Chia-Hung Lin wrote: > > > In addition to what's been working on right now. What other tasks you would > > like it to be added? Please share your thought. > >
[DISCUSS] Roadmap
In addition to what is currently being worked on, what other tasks would you like to see added? Please share your thoughts.
Re: Project healthy question
Sounds good! Although I am slowly refactoring some parts in my spare time, it's nice to have other tasks that we can work on. Would you please start another thread for discussion, or I will start a new one later on? Thanks! On Thu, 28 Feb 2019 at 09:28, ByungSeok Min wrote: > I keep watching Hama. > > Although I had things I wanted to develop, I have not been able to > participate actively. > I am going to work on AI/Big Data in my company from March. > I will try to do again. > > How about setting up a roadmap together? > I think it would be nice to start with the 2019 roadmap first > > On Tue, 26 Feb 2019 at 22:25, Edward J. Yoon wrote: > > > Obviously inactive :/ By the way, I personally seeing that many of apache > > projects are going into inactive state. especially big data related > > projects. > > > > > > On Mon, 25 Feb 2019 at 00:19, Chia-Hung Lin wrote: > > > > > As you may notice that the activity is low for a period of time. This > > > raises an issue when doing board project with respect to the project > > > healthy. So it's greatly appreciated if anyone has inputs or comments > > > regarding to this. > > > > > >
Re: Project healthy question
This is off topic, but [1] may be an interesting discussion to read. It looks like cloud providers have had an impact on big data software; I am not sure whether this in turn has affected developers' contributions to related projects. In addition, I personally find it difficult to strike a balance, even when trying to glean from the fractional time available. [1]. https://news.ycombinator.com/item?id=18869755 On Tue, 26 Feb 2019 at 13:25, Edward J. Yoon wrote: > Obviously inactive :/ By the way, I personally seeing that many of apache > projects are going into inactive state. especially big data related > projects. > > > On Mon, 25 Feb 2019 at 00:19, Chia-Hung Lin wrote: > > > As you may notice that the activity is low for a period of time. This > > raises an issue when doing board project with respect to the project > > healthy. So it's greatly appreciated if anyone has inputs or comments > > regarding to this. > > >
Project healthy question
As you may have noticed, activity has been low for some time. This raises an issue with respect to project health when reporting to the board, so any input or comments on this would be greatly appreciated.
Apache Hama git repos migration
According to [1], the Apache Hama git repository has been migrated to gitbox [2]. [1]. https://issues.apache.org/jira/browse/INFRA-17788 [2]. https://gitbox.apache.org/repos/asf/hama.git
Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!
+1 On Wed, 30 Jan 2019 at 12:47, Júlio Pires wrote: > I'm +1 too. > > On Wed, 30 Jan 2019 at 06:35, Edward J. Yoon > wrote: > > > P.S., Please vote on here. We need to make a consensus. > > > > I'm +1. > > > > On Wed, Jan 30, 2019 at 5:33 PM Edward J. Yoon > > wrote: > > > > > > Hi devs, > > > I propose we ask ASF Infra to move the Hama Git repo to GitBox as soon > as > > > the release has been finalized / announced. Once they switch things > over, > > > we can update the web site / documentation to reflect that. > > > > > > Does anyone see any problems with this approach? > > > If there's no objections, I'll create a jira ticket. > > > > > > Thanks. > > > > > > On Thu, Jan 17, 2019 at 4:20 AM Chia-Hung Lin > > > wrote: > > > > > > > > Hi Edward, thanks for help! > > > > > > > > On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon > > wrote: > > > > > > > > > I can check tomorrow! > > > > > > > > > > On Tue, 15 Jan 2019 at 16:50, Apache Infrastructure Team < > > > > > infrastruct...@apache.org> wrote: > > > > > > > > > > > Hello, hama folks. > > > > > > As stated earlier in 2018, and reiterated two weeks ago, all git > > > > > > repositories must be migrated from the git-wip-us.apache.org URL > > to > > > > > > gitbox.apache.org, as the old service is being decommissioned. > > Your > > > > > > project is receiving this email because you still have > > repositories on > > > > > > git-wip-us that needs to be migrated. > > > > > > > > > > > > The following repositories on git-wip-us belong to your project: > > > > > > - hama.git > > > > > > > > > > > > > > > > > > We are now entering the remaining three weeks of the mandated > > > > > > (coordinated) move stage of the roadmap, and you are asked to > > please > > > > > > coordinate migration with the Apache Infrastructure Team before > > February > > > > > > 7th. 
All repositories not migrated on February 7th will be mass > > migrated > > > > > > without warning, and we'd appreciate it if we could work together > > to > > > > > > avoid a big mess that day :-). > > > > > > > > > > > > As stated earlier, moving to gitbox means you will get full write > > access > > > > > > on GitHub as well, and be able to close/merge pull requests and > > much > > > > > > more. The move is mandatory for all Apache projects using git. > > > > > > > > > > > > To have your repositories moved, please follow these steps: > > > > > > > > > > > > - Ensure consensus on the move (a link to a lists.apache.org > > thread will > > > > > > suffice for us as evidence). > > > > > > - Create a JIRA ticket at > > https://issues.apache.org/jira/browse/INFRA > > > > > > > > > > > > Your migration should only take a few minutes. If you wish to > > migrate > > > > > > at a specific time of day or date, please do let us know in the > > ticket, > > > > > > otherwise we will migrate at the earliest convenient time. > > > > > > > > > > > > There will be redirects in place from git-wip to gitbox, so > > requests > > > > > > using the old remote origins should still work (however we > > encourage > > > > > > people to update their remotes once migration has completed). > > > > > > > > > > > > As always, we appreciate your understanding and patience as we > move > > > > > > things around and work to provide better services and features > for > > > > > > the Apache Family. > > > > > > > > > > > > Should you wish to contact us with feedback or questions, please > > do so > > > > > > at: us...@infra.apache.org. > > > > > > > > > > > > > > > > > > With regards, > > > > > > Apache Infrastructure > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Best Regards, Edward J. Yoon > > > > > > > > -- > > Best Regards, Edward J. Yoon > > >
Re: [NOTICE] Mandatory migration of git repos to gitbox.apache.org - three weeks left!
Hi Edward, thanks for the help! On Tue, 15 Jan 2019 at 13:10, Edward J. Yoon wrote: > I can check tomorrow! > > On Tue, 15 Jan 2019 at 16:50, Apache Infrastructure Team < > infrastruct...@apache.org> wrote: > > > Hello, hama folks. > > As stated earlier in 2018, and reiterated two weeks ago, all git > > repositories must be migrated from the git-wip-us.apache.org URL to > > gitbox.apache.org, as the old service is being decommissioned. Your > > project is receiving this email because you still have repositories on > > git-wip-us that needs to be migrated. > > > > The following repositories on git-wip-us belong to your project: > > - hama.git > > > > > > We are now entering the remaining three weeks of the mandated > > (coordinated) move stage of the roadmap, and you are asked to please > > coordinate migration with the Apache Infrastructure Team before February > > 7th. All repositories not migrated on February 7th will be mass migrated > > without warning, and we'd appreciate it if we could work together to > > avoid a big mess that day :-). > > > > As stated earlier, moving to gitbox means you will get full write access > > on GitHub as well, and be able to close/merge pull requests and much > > more. The move is mandatory for all Apache projects using git. > > > > To have your repositories moved, please follow these steps: > > > > - Ensure consensus on the move (a link to a lists.apache.org thread will > > suffice for us as evidence). > > - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA > > > > Your migration should only take a few minutes. If you wish to migrate > > at a specific time of day or date, please do let us know in the ticket, > > otherwise we will migrate at the earliest convenient time. > > > > There will be redirects in place from git-wip to gitbox, so requests > > using the old remote origins should still work (however we encourage > > people to update their remotes once migration has completed). 
> > > > As always, we appreciate your understanding and patience as we move > > things around and work to provide better services and features for > > the Apache Family. > > > > Should you wish to contact us with feedback or questions, please do so > > at: us...@infra.apache.org. > > > > > > With regards, > > Apache Infrastructure > > > > >
Re: [jira] [Commented] (HAMA-1002) Add junit dependency to commons to compile with Hadoop 2.8+
When applying the patch, Jenkins reports a failure integrating it. Checking the test result[1], it looks like other tests are waiting for the master to be up and running. Does anyone know whether there is a way to re-run the build? [1]. https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/731/testReport/ On 30 December 2017 at 07:39, Hudson (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/HAMA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306672#comment-16306672 > ] > > Hudson commented on HAMA-1002: > -- > > FAILURE: Integrated in Jenkins build Hama-Nightly-for-Hadoop-2.x #731 (See > [https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/731/]) > HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+ > (ywkim: rev fe57a39cc5a576d11d9137fbae156a79d4d7a8a1) > * (edit) commons/pom.xml > > >> Add junit dependency to commons to compile with Hadoop 2.8+ >> --- >> >> Key: HAMA-1002 >> URL: https://issues.apache.org/jira/browse/HAMA-1002 >> Project: Hama >> Issue Type: Bug >> Components: build >>Affects Versions: 0.7.1 >>Reporter: YoungWoo Kim >> Fix For: 0.7.2 >> >> Attachments: HAMA-1002.0.patch >> >> >> Compilation with Hadoop 2.8+ does not work because transitive dependencies >> for Hadoop have been changed: >> {noformat} >> $ mvn clean package -Phadoop2 -Dhadoop.version=2.8.1 -DskipTests >> (snip) >> [INFO] Apache Hama parent POM . SUCCESS [ 23.797 >> s] >> [INFO] pipes .. SUCCESS [ 22.680 >> s] >> [INFO] commons FAILURE [ 6.662 >> s] >> [INFO] core ... SKIPPED >> [INFO] graph .. SKIPPED >> [INFO] machine learning ... SKIPPED >> [INFO] examples ... SKIPPED >> [INFO] mesos .. SKIPPED >> [INFO] yarn ... SKIPPED >> [INFO] hama-dist .. 
SKIPPED >> [INFO] >> >> [INFO] BUILD FAILURE >> [INFO] >> >> [INFO] Total time: 53.544 s >> [INFO] Finished at: 2017-12-26T14:55:24+09:00 >> [INFO] Final Memory: 62M/568M >> [INFO] >> >> [ERROR] Failed to execute goal >> org.apache.maven.plugins:maven-compiler-plugin:2.3.2:testCompile >> (default-testCompile) on project hama-commons: Compilation failure: >> Compilation failure: >> [ERROR] >> /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[20,23] >> error: package org.junit does not exist >> [ERROR] >> /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[20,0] >> error: static import only from classes and interfaces >> [ERROR] >> /Users/ywkim/workspace/hama/commons/src/test/java/org/apache/hama/commons/math/TestDenseDoubleVector.java:[21,23] >> error: package org.junit does not exist >> (snip) >> {noformat} > > > > -- > This message was sent by Atlassian JIRA > (v6.4.14#64029)
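The fix discussed in this thread amounts to declaring junit explicitly in commons/pom.xml, since Hadoop 2.8+ no longer provides it transitively. A sketch of the dependency block is below; the version shown is an assumption and should be aligned with whatever version the parent POM manages.

```xml
<!-- In commons/pom.xml: declare junit directly instead of relying on a
     transitive dependency from Hadoop. Version 4.12 is an assumption;
     match it to the version managed by the parent POM. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.12</version>
  <scope>test</scope>
</dependency>
```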
Re: [jira] [Created] (HAMA-1002) Add junit dependency to commons to compile with Hadoop 2.8+
I'll look into this, but progress may be slow, so feel free to pick it up if it appears to be stalled. On Thursday, 28 December 2017, Edward J. Yoon wrote: > Can someone review and commit this patch? :) > > On 26 Dec 2017 at 17:04, "ASF GitHub Bot (JIRA)" wrote: > >> >> [ https://issues.apache.org/jira/browse/HAMA-1002?page= >> com.atlassian.jira.plugin.system.issuetabpanels:comment- >> tabpanel=16303626#comment-16303626 ] >> >> ASF GitHub Bot commented on HAMA-1002: >> -- >> >> GitHub user youngwookim opened a pull request: >> >> https://github.com/apache/hama/pull/17 >> >> HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+ >> >> >> >> You can merge this pull request into a Git repository by running: >> >> $ git pull https://github.com/youngwookim/hama HAMA-1002 >> >> Alternatively you can review and apply these changes as the patch at: >> >> https://github.com/apache/hama/pull/17.patch >> >> To close this pull request, make a commit to your master/trunk branch >> with (at least) the following in the commit message: >> >> This closes #17 >> >> >> commit fe57a39cc5a576d11d9137fbae156a79d4d7a8a1 >> Author: Youngwoo Kim >> Date: 2017-12-26T06:07:12Z >> >> HAMA-1002: Add junit dependency to commons to compile with Hadoop 2.8+ >> >> >> >> >> > Add junit dependency to commons to compile with Hadoop 2.8+ >> > --- >> > >> > Key: HAMA-1002 >> > URL: https://issues.apache.org/jira/browse/HAMA-1002 >> > Project: Hama >> > Issue Type: Bug >> > Components: build >> >Affects Versions: 0.7.1 >> >Reporter: YoungWoo Kim >> > Fix For: 0.7.2 >> > >> > Attachments: HAMA-1002.0.patch >> > >> > >> > Compilation with Hadoop 2.8+ does not work because transitive >> dependencies for Hadoop have been changed: >> > {noformat} >> > $ mvn clean package -Phadoop2 -Dhadoop.version=2.8.1 -DskipTests >> > (snip) >> > [INFO] Apache Hama parent POM . SUCCESS [ >> 23.797 s] >> > [INFO] pipes .. 
SUCCESS [ >> 22.680 s] >> > [INFO] commons FAILURE [ >> 6.662 s] >> > [INFO] core ... SKIPPED >> > [INFO] graph .. SKIPPED >> > [INFO] machine learning ... SKIPPED >> > [INFO] examples ... SKIPPED >> > [INFO] mesos .. SKIPPED >> > [INFO] yarn ... SKIPPED >> > [INFO] hama-dist .. SKIPPED >> > [INFO] >> >> > [INFO] BUILD FAILURE >> > [INFO] >> >> > [INFO] Total time: 53.544 s >> > [INFO] Finished at: 2017-12-26T14:55:24+09:00 >> > [INFO] Final Memory: 62M/568M >> > [INFO] >> >> > [ERROR] Failed to execute goal org.apache.maven.plugins: >> maven-compiler-plugin:2.3.2:testCompile (default-testCompile) on project >> hama-commons: Compilation failure: Compilation failure: >> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/ >> apache/hama/commons/math/TestDenseDoubleVector.java:[20,23] error: >> package org.junit does not exist >> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/ >> apache/hama/commons/math/TestDenseDoubleVector.java:[20,0] error: static >> import only from classes and interfaces >> > [ERROR] /Users/ywkim/workspace/hama/commons/src/test/java/org/ >> apache/hama/commons/math/TestDenseDoubleVector.java:[21,23] error: >> package org.junit does not exist >> > (snip) >> > {noformat} >> >> >> >> -- >> This message was sent by Atlassian JIRA >> (v6.4.14#64029) >> >
[DISCUSS] Roadmap
As there are still many areas where Hama could be useful and improved, this might be a good time to open a discussion on these issues. Some thoughts I have in mind: - Refactor the core package for finer granularity. * Separate io into its own package. * Decouple the BSP interface. - Monitoring subsystem[1] These could be done incrementally so as to reduce the impact of intrusive, blocking changes. Some issues might also be missing here, so please feel free to comment. Thanks, and happy new year. [1]. https://issues.apache.org/jira/browse/HAMA-1001
[ANNOUNCE] Hama New PMC Chair - Edward J. Yoon
On behalf of the Apache Hama PMC, I'm pleased to announce that the Apache Board has approved the nomination of Edward J. Yoon as Hama's new PMC Chair! Congratulations!
Re: [DISCUSS] Hama releases for each hadoop version
I am +1 on this if there are no additional issues. On 12 August 2015 at 09:23, Edward J. Yoon edwardy...@apache.org wrote: Any objections/thoughts? On Wed, Jul 22, 2015 at 7:48 PM, Edward J. Yoon edwardy...@apache.org wrote: Hi, Like http://www.apache.org/dist/spark/spark-1.3.1/, should we create a release tarball for each hadoop version? Otherwise, users always need to manually replace the hadoop jar and some dependency files in the ${HAMA_HOME}/lib folder. Of course, the src distribution doesn't matter. -- Best Regards, Edward J. Yoon -- Best Regards, Edward J. Yoon
Re: 0.7.1 release plan
+1. Sorry, I missed this in my mailbox. On 6 August 2015 at 08:57, Edward J. Yoon edwardy...@apache.org wrote: Hey all, As you already might know, file IO bugs in the YARN module have been reported. I would like to fix this issue and cut a 0.7.1 release ASAP. The rest looks good to me. WDYT? -- Best Regards, Edward J. Yoon
Re: [DISCUSSION] Spinoff ANN package
+1 That looks interesting. I would like to participate in this project. On 5 August 2015 at 11:52, Edward J. Yoon edwardy...@apache.org wrote: Guys, I plan to submit a 'DNN platform on top of Apache Hama' proposal as below. I know Hama community is somewhat small, but the main reason is that this domain-specific project is not fit for Apache Hama community. Recruiting volunteers is also hard problem. I expect this will become a very nice use-case of Apache Hama. If you have any suggestions or other opinions, Please let me know. Also, if you want to participate in this project, Pls feel free to add your name here. Thanks! -- == Abstract == (tentatively named Horn [hɔ:n], korean meaning of Horn is a Spirit) is a neuron-centric programming APIs and execution framework for large-scale deep learning, built on top of Apache Hama. == Proposal == It is a goal of the Horn to provide a neuron-centric programming APIs which allows user to easily define the characteristic of artificial neural network model and its structure, and its execution framework that leverages the heterogeneous resources on Hama and Hadoop YARN cluster. == Background == The initial ANN code was developed at Apache Hama project by a committer, Yexi Jiang (Facebook) in 2013. The motivation behind this work is to build a framework that provides more intuitive programming APIs like Google's MapReduce or Pregel and supports applications needing large model with huge memory consumptions in distributed way. == Rationale == While many of deep learning open source softwares are still data or model parallel only, we aim to support both data and model parallelism and also fault-tolerant system design. The basic idea of data and model parallelism is use of the remote parameter server to parallelize model creation and distribute training across machines, and the BSP framework of Apache Hama for performing asynchronous mini-batches. 
Within single BSP job, each task group works asynchronously using region barrier synchronization instead of global barrier synchronization, and trains large-scale neural network model using assigned data sets in BSP paradigm. This architecture is inspired by Google's DistBelief (Jeff Dean et al, 2012). == Initial Goals == Some current goals include: * builds new community * provides more intuitive programming APIs * needs both data and model parallelism support * must run natively on both Hama and Hadoop2 * needs also GPUs and InfiniBand support == Current Status == === Meritocracy === The core developers understand what it means to have a process based on meritocracy. We will provide continuous efforts to build an environment that supports this, encouraging community members to contribute. === Community === A small community has formed within the Apache Hama project and some companies such as instant messenger service company and mobile manufacturing company. And many people are interested in the large-scale deep learning platform itself. By bringing Horn into Apache, we believe that the community will grow even bigger. === Core Developers === Edward J. Yoon, Thomas Jungblut, and Dongjin Lee == Known Risks == === Orphaned Products === Apache Hama is already a core open source component at Samsung Electronics, and Horn also will be used by Samsung Electronics, and so there is no direct risk for this project to be orphaned. === Inexperience with Open Source === Some are very new and the others have experience using and/or working on Apache open source projects. === Homogeneous Developers === The initial committers are from different organizations such as, Microsoft, Samsung Electronics, and Line Plus. === Reliance on Salaried Developers === Other developers will also start working on the project in their spare time. 
=== Relationships with Other Apache Products === * Horn is based on Apache Hama * Apache Zookeeper is used for distributed locking service * Natively run on Apache Hadoop and Mesos * Horn can be somewhat overlapped with Singa podling. === An Excessive Fascination with the Apache Brand === Horn itself will hopefully have benefits from Apache, in terms of attracting a community and establishing a solid group of developers, but also the relation with Apache Hama, a general-purpose BSP computing engine. These are the main reasons for us to send this proposal. == Documentation == Initial plan about Horn can be found at http://blog.udanax.org/2015/06/googles-distbelief-clone-project-on.html == Initial Source == The initial source code has been release as part of Apache Hama project developed under Apache Software Foundation. The source code is currently hosted at https://svn.apache.org/repos/asf/hama/trunk/ml/src/main/java/org/apache/hama/ml/ann/ == Cryptography == Not applicable. == Required Resources == Mailing Lists * horn-private * horn-dev Subversion Directory *
Re: [VOTE] Apache Hama 0.7 release (RC3)
When compiling the source checked out from http://svn.apache.org/repos/asf/hama/tags/0.7.0-RC3/, the following warnings are thrown and the build (mvn clean install) fails. It looks like an internal DTD problem. Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. Compiler warnings: WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.' Warning: org.apache.xerces.parsers.SAXParser: Feature 'http://javax.xml.XMLConstants/feature/secure-processing' is not recognized. Warning: org.apache.xerces.parsers.SAXParser: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized. Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized. [INFO] Rat check: Summary of files. Unapproved: 4, unknown: 4, generated: 0, approved: 567. On 12 June 2015 at 18:26, Martin Illecker millec...@apache.org wrote: +1 On 12 Jun 2015 at 10:53, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 Kindly, Anastasios On 12 Jun, 2015, at 09:31, Edward J. Yoon edwardy...@apache.org wrote: We need more vote. Pls check this RC and vote! thanks. On Wed, Jun 10, 2015 at 9:55 PM, ByungSeok Min byeongseok@gmail.com wrote: +1 Best Regards, Byoungseok Min On Wednesday, 10 June 2015, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 Cheers, Anastasios Andronidis On 10 Jun, 2015, at 10:43, Minho Kim minwise@samsung.com wrote: +1 Best regards, Minho Kim -Original Message- From: Edward J. Yoon [mailto:edward.y...@samsung.com] Sent: Wednesday, June 10, 2015 8:29 AM To: dev@hama.apache.org Subject: RE: [VOTE] Apache Hama 0.7 release (RC3) +1 Everything looks good. -- Best Regards, Edward J. Yoon -Original Message- From: Edward J. Yoon [mailto:edwardy...@apache.org] Sent: Tuesday, June 09, 2015 7:39 PM To: dev@hama.apache.org Subject: [VOTE] Apache Hama 0.7 release (RC3) Hey all, I just created a 3rd release candidate for the Apache Hama 0.7 release using Java 7. This RC fixes a bug in the yarn module. The RC3 is available at: http://people.apache.org/~edwardyoon/dist/0.7.0-RC3/ Tags: http://svn.apache.org/repos/asf/hama/tags/0.7.0-RC3/ Please try it in your environment, run the tests, verify the checksum files, etc. and vote. Thanks! -- Best Regards, Edward J. Yoon -- Best Regards, Edward J. Yoon
Re: [VOTE] Move Hama 0.7 to Java 7+ only
+1 On 4 June 2015 at 21:24, Behroz Sikander behro...@gmail.com wrote: +1 On Thu, Jun 4, 2015 at 3:23 PM, Minho Kim eorien...@gmail.com wrote: +1 Best Regards, Minho Kim On 4 Jun 2015 at 20:26, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 On 4 Jun, 2015, at 12:38, Martin Illecker millec...@apache.org wrote: +1 On 4 Jun 2015 at 12:11, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso On 4 Jun 2015 at 11:55, Edward J. Yoon edwardy...@apache.org wrote: Hello all, I knew that Java 7 isn't fully backwards compatible with Java 6, but I haven't experienced any issues at all so far, because our code doesn't use any features from Java 7. However, I just noticed that the latest Hadoop supports Java 7+ only, and we also need to move to Java 7 for supporting YARN. The classic Hama cluster mode is outside the area of influence. Should we move to Java 7? Thanks! [ ] +1, We are ending support for Java 6 and move to Java 7. [ ] -1, Keep support Java 6, because ... -- Best Regards, Edward J. Yoon
Re: Bug in Netty-based RPC
Have you checked limits.conf? From the message, it looks like the number of files opened at the underlying system exceeds its default limit. On 28 April 2015 at 08:08, Edward J. Yoon edwardy...@apache.org wrote: I tried to run a BSP job using netty-based RPC instead of message bundles, but I received "too many open files". -- attempt_201504280858_0001_17_0: 15/04/28 08:28:17 INFO ipc.AsyncClient: AsyncClient startup attempt_201504280858_0001_17_0: 15/04/28 08:28:21 ERROR bsp.BSPTask: Error running bsp setup and bsp function. attempt_201504280858_0001_17_0: java.lang.IllegalStateException: failed to create a child event loop attempt_201504280858_0001_17_0: at io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:68) attempt_201504280858_0001_17_0: at io.netty.channel.MultithreadEventLoopGroup.init(MultithreadEventLoopGroup.java:49) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:61) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:52) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:44) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:36) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncClient$Connection.init(AsyncClient.java:189) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncClient.getConnection(AsyncClient.java:989) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncClient.call(AsyncClient.java:838) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC$Invoker.invoke(AsyncRPC.java:261) attempt_201504280858_0001_17_0: at com.sun.proxy.$Proxy14.getProtocolVersion(Unknown Source) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.checkVersion(AsyncRPC.java:524) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:509) attempt_201504280858_0001_17_0: 
at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:477) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:435) attempt_201504280858_0001_17_0: at org.apache.hama.ipc.AsyncRPC.getProxy(AsyncRPC.java:545) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.getBSPPeerConnection(HamaAsyncMessageManagerImpl.java:155) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.message.HamaAsyncMessageManagerImpl.transfer(HamaAsyncMessageManagerImpl.java:203) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPPeerImpl.sendDirectly(BSPPeerImpl.java:382) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPPeerImpl.send(BSPPeerImpl.java:364) attempt_201504280858_0001_17_0: at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:467) attempt_201504280858_0001_17_0: at org.apache.hama.graph.GraphJobRunner.setup(GraphJobRunner.java:128) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) attempt_201504280858_0001_17_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) attempt_201504280858_0001_17_0: Caused by: io.netty.channel.ChannelException: failed to open a new selector attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoop.init(NioEventLoop.java:120) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87) attempt_201504280858_0001_17_0: at io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:64) attempt_201504280858_0001_17_0: ... 
24 more attempt_201504280858_0001_17_0: Caused by: java.io.IOException: Too many open files attempt_201504280858_0001_17_0: at sun.nio.ch.IOUtil.makePipe(Native Method) attempt_201504280858_0001_17_0: at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:65) attempt_201504280858_0001_17_0: at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36) attempt_201504280858_0001_17_0: at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126) attempt_201504280858_0001_17_0: ... 27 more attempt_201504280858_0001_17_0: 15/04/28 08:28:21 INFO ipc.AsyncServer: AsyncServer gracefully shutdown -- Best Regards, Edward J. Yoon
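The "Too many open files" cause above means the JVM ran out of file descriptors, which is what the limits.conf suggestion addresses. As a quick diagnostic, a JVM on Unix-like systems can report its own descriptor usage via the com.sun.management extension; this is a sketch, and the bean may be unavailable on some platforms, in which case it returns null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdLimitCheck {
    /** Returns {open, max} file descriptor counts, or null when the
     *  platform-specific bean is unavailable (e.g. on Windows). */
    public static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(),
                                unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] c = fdCounts();
        if (c == null) {
            System.out.println("fd counts unavailable on this platform");
        } else {
            System.out.println("open=" + c[0] + " max=" + c[1]);
        }
    }
}
```

Raising the per-process limit is typically done with `ulimit -n` or `nofile` entries in /etc/security/limits.conf for the user running the grooms.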
Re: [DISCUSS] Make Hadoop2 profile default
Since HAMA-848 is fixed, +1 if there are no additional issues. On 16 March 2015 at 09:49, Edward J. Yoon edward.y...@samsung.com wrote: Hi all, It seems time to switch our project to a Hadoop2 base. My suggestion is that we keep the two hadoop1 and hadoop2 profiles but make the hadoop2 profile the default. WDYT? -- Best Regards, Edward J. Yoon
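Keeping both profiles while making hadoop2 the default is usually just a matter of profile activation in the parent POM. A hypothetical sketch (profile contents elided; the actual Hama POM layout may differ):

```xml
<profiles>
  <!-- hadoop2 becomes the default build profile. Note that activeByDefault
       is switched off whenever another profile is activated explicitly. -->
  <profile>
    <id>hadoop2</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
  </profile>
  <!-- hadoop1 remains available via: mvn clean install -Phadoop1 -->
  <profile>
    <id>hadoop1</id>
  </profile>
</profiles>
```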
Offline for holidays
I am going to be offline for 1~2 weeks for the traditional holidays. Thanks
Re: Legal question to include code like com.sun.tools.javac.Main
Thanks for the information; it's very clear. One last question: Scala is licensed under [1], so I assume it's legal to use its nsc (compiler package) code, because its license is not listed in [2] and does not conflict with Apache's policy. Am I correct in that assumption? Thanks again for your kind help! [1]. http://www.scala-lang.org/license.html [2]. http://www.apache.org/legal/resolved.html#category-x On 27 January 2015 at 18:43, Mark Thomas ma...@apache.org wrote: On 26/01/2015 09:45, Chia-Hung Lin wrote: Hi, I have a naive question regarding to include the methods like Main.compile in com.sun package in the project source code. For instance, in our project like hama if there is a source file that makes use of com.sun.tools.javac.Main to runtime compile java sources into classes. Is it legal to release or include that source with project? Legally, yes that is fine. You can reference internal JVM vendor classes if you wish. What you may not do is include tools.jar in your distribution since the license for that JAR is not compatible with distribution under the ALv2. Mark - To unsubscribe, e-mail: legal-discuss-unsubscr...@apache.org For additional commands, e-mail: legal-discuss-h...@apache.org
Legal question to include code like com.sun.tools.javac.Main
Hi, I have a naive question about including methods like Main.compile from the com.sun package in project source code. For instance, if a source file in a project like Hama uses com.sun.tools.javac.Main to compile Java sources into classes at runtime, is it legal to release or include that source with the project? Thanks
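[Editor's note] On the technical (not legal) side, the same runtime-compilation capability is available through the supported javax.tools API shipped with every JDK, which avoids referencing com.sun internals entirely. A minimal sketch; the RuntimeCompiler class name and the temp-file layout are illustrative, not from this thread:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RuntimeCompiler {
    /**
     * Writes the given source to className + ".java" in a temp directory and
     * compiles it. Returns true on success, mirroring the zero exit code that
     * com.sun.tools.javac.Main.compile reports.
     */
    public static boolean compile(String className, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return false; // running on a JRE without the compiler module
        }
        Path dir = Files.createTempDirectory("rtc");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        // run() returns 0 when compilation succeeds
        return compiler.run(null, null, null, file.toString()) == 0;
    }
}
```

Unlike com.sun.tools.javac.Main, this entry point is part of the public JDK API, so the tools.jar distribution concern Mark raises does not arise.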
Git security issue
If you are using a git repository, you may need to update your git client due to a security vulnerability. https://github.com/blog/1938-vulnerability-announced-update-your-git-clients http://article.gmane.org/gmane.linux.kernel/1853266
Re: MapWritable is really bad
Should we roll out our own implementation? Switching to Kryo looks like it would get rid of this issue, but if we are going to have a pluggable serialization framework (are we?), that might help those who need it. On 24 September 2014 17:19, Edward J. Yoon edwardy...@apache.org wrote: Interesting .. On Fri, Sep 19, 2014 at 11:52 PM, Andronidis Anastasios andronat_...@hotmail.com wrote: Hello, I remember a discussion rose upon performance issues on messages and that kryo serializer helped a lot. Please read this: http://www.chrisstucchio.com/blog/2011/mapwritable_sometimes_a_performance_hog.html From a custom test I did, I was sending some messages with MapWritable as a container, Text as key and ArrayWritable (with integers inside) as a value. Hama was reporting 20MB of traffic. When I wrote my own Map (that implements Writable interface) I reduced the amount from 20MB to 1.5MB.. Cheers, Anastasios -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
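[Editor's note] The overhead Anastasios describes comes from MapWritable writing class metadata alongside every entry. The pattern of his fix can be sketched in a self-contained way; hedged: the class below mirrors Hadoop's Writable contract (write/readFields over DataOutput/DataInput) without the Hadoop dependency, and the TextIntsMap name is invented for illustration:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A map of String -> int[] that serializes only its raw data, unlike
 * MapWritable, which also records class information for every key and value.
 */
public class TextIntsMap {
    public final Map<String, int[]> entries = new LinkedHashMap<>();

    public void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, int[]> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().length);
            for (int v : e.getValue()) out.writeInt(v);
        }
    }

    public void readFields(DataInput in) throws IOException {
        entries.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            int[] vals = new int[in.readInt()];
            for (int j = 0; j < vals.length; j++) vals[j] = in.readInt();
            entries.put(key, vals);
        }
    }
}
```

Because the key and value types are fixed at compile time, nothing but the payload bytes crosses the wire, which is where the 20MB-to-1.5MB reduction comes from.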
Re: [DISCUSS/VOTE] Refactor of message queue .
Is the incoming bundles manager equal to localQueueForNextIteration/localQueue in the 0.6.4 version? My understanding of localQueueForNextIteration/localQueue is that they serve as input for the coming superstep; for instance, in the N-th superstep the incoming messages are obtained from localQueue (AbstractMessageManager.getCurrentMessage). In this case it looks safe to save messages to local disk, because once a node fails, the steps to recover from the previous superstep should be unchanged: the previously saved messages are simply placed into the localQueue again. If not, then we probably need to think about this issue. On 29 August 2014 08:09, Edward J. Yoon edwardy...@apache.org wrote: First of all, Our main problem is that current system requires a lot of memory space, especially graph module. As you already might know, the main memory consumer is the message queue. To solve this problem, we considered the use of local disk space e.g., DiskQueue and SpillingQueue. However, those queues are basically not able to bundle and group the messages by destination server, in memory-efficient way. So, I don't think this approach is right way. My solution for saving the memory usage and the performance degradation, is storing serializable message objects as a byte array in queue. In graph case, 3X ~ 6X memory efficiency is expected than before (GraphJobMessage consists of destination vertex ID and message value multi-objects). In 0.6.4, Outgoing queue is replaced with outgoing bundles manager, and it showed nice memory improvement. Now I wanna start refactoring of incoming queue. My plan is that adding incoming bundles manager. Bundles can also simply be written to local disk if when memory space is not enough. So, incoming bundles manager can be performed a similar role of DiskQueue and SpillingQueue in the future. If you have any other opinion, Please let me know. If there are no objections, I'll do it. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
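[Editor's note] Edward's idea of keeping messages serialized in the queue, rather than as live objects, can be sketched as follows. This is only an illustration of the approach, not the actual incoming bundles manager; the ByteBundle name and the length-prefixed layout are assumptions:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;

/**
 * Accumulates length-prefixed serialized messages in one growing byte buffer,
 * so the queue holds a single byte[] instead of N deserialized message objects.
 */
public class ByteBundle {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buf);
    private int count = 0;

    public void add(byte[] serializedMessage) throws IOException {
        out.writeInt(serializedMessage.length); // length prefix
        out.write(serializedMessage);           // raw serialized payload
        count++;
    }

    /** Iterates the bundled messages, materializing one byte[] at a time. */
    public Iterator<byte[]> iterator() {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        int total = count;
        return new Iterator<byte[]>() {
            int read = 0;
            public boolean hasNext() { return read < total; }
            public byte[] next() {
                try {
                    byte[] msg = new byte[in.readInt()];
                    in.readFully(msg);
                    read++;
                    return msg;
                } catch (IOException e) { throw new UncheckedIOException(e); }
            }
        };
    }
}
```

Because the bundle is a single contiguous byte array, spilling it to local disk (and reading it back for recovery, as discussed above) is a plain file write rather than per-message serialization.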
Re: Remove Spilling Queue and rewrite checkpoint/recovery
I will make the code stable first before merging it back. On 18 August 2014 17:40, Edward J. Yoon edwardy...@apache.org wrote: Do you have any plan for merging them? This is side opinion. If we want to use Git, now I'm +1. On Sat, Aug 16, 2014 at 12:00 AM, Chia-Hung Lin cli...@googlemail.com wrote: Code right now is at https://github.com/chlin501/hama.git Maven and jdk are required to build the project Command to have a clean build: mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true To test a specific test case: mvn -DskipTests=false -Dtest=TestCaseName test On 15 August 2014 18:21, Suraj Menon menonsur...@gmail.com wrote: Hi Edward, sorry to enter the discussion so late. Bundling and Unbundling of message queue is not Spilling queue's responsibility, it was ended up there to be compatible with the existent implementation of BSP Peer communication. Remember Spilling Queue implementation was done to immediately remove some OutOfMemory issues on sender side first. Spilling Queue gives you a byte array (ByteBuffer) with a batch of serialized messages. This is effectively bundling the messages in byte array (hence the ByteArrayMessageBundle) and sending them for processing. The SpilledDataProcessor's are implemented as a pipeline of processing done using inheritance, something like what we may use trait for in Scala. So if we have a SpilledDataProcessor that sends this bundled message via RPC to the peer, there is no need to write them to file and read them back. As I previously mentioned this was done to be compatible with the existent implementation of peer.send. Also, the async checkpoint recovery code was written before spilling queue. Today we can remove the single message write and do this in before peer sync phase to just write the whole file to HDFS. I would say performance numbers and maintainability comes first and if you think removing spilling queue is a solution go for it. 
As far as async checkpointing is to be considered, that was a first proof of concept we did and it is high time we move forward from there. Chiahung, do you have some instruction on where and how I can build the scala version of your code? I am really finding it hard to dedicate time for Hama these days. - Suraj On Tue, Aug 12, 2014 at 7:15 AM, Edward J. Yoon edwardy...@apache.org wrote: ChiaHung, Yes, I'm thinking similar things. On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin cli...@googlemail.com wrote: I am currently working on this part based on the superstep api, similar to the Superstep.java in the trunk. The checkpointer[1] saves bundle message instead of single message. Not very sure if this is what you are looking for? [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala On 12 August 2014 15:04, Edward J. Yoon edwardy...@apache.org wrote: I think that transferring single messages at a time is not a wise way. Bundle is used to avoid network overheads and contentions. So, if we use Bundle, each processor always sends/receives an bundles. BSPMessageBundle is Writable (and Iterable). And it manages the serialized message as a byte array. If we write an bundles when checkpointing or using Disk-queue, it'll be more simple and faster. In Spilling Queue case, it always requires the process of unbundling and putting messages into queue. On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: -1, can't we first discuss? Also it'd be helpful to be more specific on the problems. Tommaso 2014-08-12 4:25 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, I'll delete Spilling queue, and rewrite checkpoint/recovery implementation (checkpointing bundles is better than checkpointing all messages). Current implementation is quite mess :/ there are huge deserialization/serialization overheads.. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. 
-- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Remove Spilling Queue and rewrite checkpoint/recovery
Code right now is at https://github.com/chlin501/hama.git Maven and jdk are required to build the project Command to have a clean build: mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true To test a specific test case: mvn -DskipTests=false -Dtest=TestCaseName test On 15 August 2014 18:21, Suraj Menon menonsur...@gmail.com wrote: Hi Edward, sorry to enter the discussion so late. Bundling and Unbundling of message queue is not Spilling queue's responsibility, it was ended up there to be compatible with the existent implementation of BSP Peer communication. Remember Spilling Queue implementation was done to immediately remove some OutOfMemory issues on sender side first. Spilling Queue gives you a byte array (ByteBuffer) with a batch of serialized messages. This is effectively bundling the messages in byte array (hence the ByteArrayMessageBundle) and sending them for processing. The SpilledDataProcessor's are implemented as a pipeline of processing done using inheritance, something like what we may use trait for in Scala. So if we have a SpilledDataProcessor that sends this bundled message via RPC to the peer, there is no need to write them to file and read them back. As I previously mentioned this was done to be compatible with the existent implementation of peer.send. Also, the async checkpoint recovery code was written before spilling queue. Today we can remove the single message write and do this in before peer sync phase to just write the whole file to HDFS. I would say performance numbers and maintainability comes first and if you think removing spilling queue is a solution go for it. As far as async checkpointing is to be considered, that was a first proof of concept we did and it is high time we move forward from there. Chiahung, do you have some instruction on where and how I can build the scala version of your code? I am really finding it hard to dedicate time for Hama these days. - Suraj On Tue, Aug 12, 2014 at 7:15 AM, Edward J. 
Yoon edwardy...@apache.org wrote: ChiaHung, Yes, I'm thinking similar things. On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin cli...@googlemail.com wrote: I am currently working on this part based on the superstep api, similar to the Superstep.java in the trunk. The checkpointer[1] saves bundle message instead of single message. Not very sure if this is what you are looking for? [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala On 12 August 2014 15:04, Edward J. Yoon edwardy...@apache.org wrote: I think that transferring single messages at a time is not a wise way. Bundle is used to avoid network overheads and contentions. So, if we use Bundle, each processor always sends/receives an bundles. BSPMessageBundle is Writable (and Iterable). And it manages the serialized message as a byte array. If we write an bundles when checkpointing or using Disk-queue, it'll be more simple and faster. In Spilling Queue case, it always requires the process of unbundling and putting messages into queue. On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: -1, can't we first discuss? Also it'd be helpful to be more specific on the problems. Tommaso 2014-08-12 4:25 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, I'll delete Spilling queue, and rewrite checkpoint/recovery implementation (checkpointing bundles is better than checkpointing all messages). Current implementation is quite mess :/ there are huge deserialization/serialization overheads.. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Remove Spilling Queue and rewrite checkpoint/recovery
I am currently working on this part based on the superstep api, similar to the Superstep.java in the trunk. The checkpointer[1] saves bundle message instead of single message. Not very sure if this is what you are looking for? [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala On 12 August 2014 15:04, Edward J. Yoon edwardy...@apache.org wrote: I think that transferring single messages at a time is not a wise way. Bundle is used to avoid network overheads and contentions. So, if we use Bundle, each processor always sends/receives an bundles. BSPMessageBundle is Writable (and Iterable). And it manages the serialized message as a byte array. If we write an bundles when checkpointing or using Disk-queue, it'll be more simple and faster. In Spilling Queue case, it always requires the process of unbundling and putting messages into queue. On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: -1, can't we first discuss? Also it'd be helpful to be more specific on the problems. Tommaso 2014-08-12 4:25 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, I'll delete Spilling queue, and rewrite checkpoint/recovery implementation (checkpointing bundles is better than checkpointing all messages). Current implementation is quite mess :/ there are huge deserialization/serialization overheads.. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Questions on Hama
Perhaps you can check ${module-name}/target/surefire-reports/ for more detail about which test cases fail. Apache Hama is a BSP engine, meaning it's not only capable of performing graph computation; it's suitable for general-purpose parallel computing as long as the algorithm can be expressed as an iterative application. The benefit of separating GraphJob from BSPJob is that users can perform their tasks without too many restrictions. For example, a user can not only write a program to perform graph computation, but can also write general BSP jobs when required. On 10 August 2014 22:15, Dongjin Lee dongjin.lee...@gmail.com wrote: Hi. I am a Hama user, who is analyzing source code now. I have some questions. *1. Build Fail* When I tried build with mvn clean install with version 0.6.4, it succeeded clearly. However, when I tried build with mvn --projects core,examples install, it failed on test task. I think there is something I don't understand yet, but It would be better to update http://wiki.apache.org/hama/HowToContribute, to prevent confusion. I got above command from that wiki page. *2. Why BSPJob and GraphJob is separated?* I read BSPJob GraphJob class, and feel its design is a little bit weird. Why they are separated? Is there any design decision I don't know? Thanks in Advance. - Dongjin
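[Editor's note] The BSP model described above, independent local computation phases separated by a global barrier, can be shown with a toy single-process sketch. This is not Hama's API; it only illustrates the superstep/barrier structure using plain threads and a CyclicBarrier, with invented names:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Toy BSP: each of P peers adds its id to a shared sum in every superstep,
 * and all peers wait at a barrier before the next superstep begins.
 */
public class ToyBsp {
    public static long run(int peers, int supersteps) throws InterruptedException {
        LongAdder sum = new LongAdder();
        CyclicBarrier barrier = new CyclicBarrier(peers); // the sync() point
        ExecutorService pool = Executors.newFixedThreadPool(peers);
        for (int id = 0; id < peers; id++) {
            final int peerId = id;
            pool.execute(() -> {
                for (int s = 0; s < supersteps; s++) {
                    sum.add(peerId);       // local computation phase
                    try {
                        barrier.await();   // barrier synchronization phase
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return sum.sum();
    }
}
```

Any iterative algorithm that fits this compute-then-synchronize loop (PageRank, shortest paths, matrix iterations, and so on) can be expressed as a BSP job, which is why the graph module is only one client of the engine.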
Re: [ANN] Welcome to new Hama committers
Congrats on joining the community! On 11 August 2014 06:46, Edward J. Yoon edwardy...@apache.org wrote: The Apache Hama PMC is pleased to announce the following additions: * Victor Lee is now a Hama Committer * ByungSeok Min is now a Hama committer. Thanks again to both for their efforts, we hope to see them continue to move Apache Hama forward. Thanks! -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: [ANN] Jeff Fenchel as a new Hama committer
Congratulations, and thanks for the contribution! On 9 June 2014 18:41, Andronidis Anastasios andronat_...@hotmail.com wrote: congrats Jeff! Anastasis On 9 Jun 2014, at 11:59 a.m., Edward J. Yoon edwardy...@apache.org wrote: The Hama PMC is pleased to announce Jeff Fenchel (Mesos module contributor) as a new committer of Apache Hama. Congrats, Jeff Fenchel! -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: Renil Joseph as a new Hama committer
Congratulations! Welcome to the Hama community. On 23 May 2014 14:59, Edward J. Yoon edwardy...@apache.org wrote: The Hama PMC is pleased to announce Renil Joseph as a new committer of Hama. We look forward to his continuing involvement with Hama. Congrats, Renil Joseph! -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
Re: [DISCUSS] Disk Queue and Spilling Queue
Is it going to use RPC? Will it still use the interface, for instance, MessageManager.java? Just checking whether there is any point of integration with the current ongoing refactoring process. If possible, perhaps decoupling the IO part and RPC from the interface would somehow simplify the integration. On 12 May 2014 09:01, Edward J. Yoon edwardy...@apache.org wrote: The old design of outgoing/incoming message queues is readable but it has some problems, and the most performance and memory issues are dependent upon this part. 1) To send a messages to destination Peer, we serialize, compress, and bundle the messages. So, using disk or spilling queue for the outgoing messages is pointless and cause of degradation. This issue SOLVED by HAMA-853. We'll need to add disk-based bundle in the future. 2) Receive-side queue is also the same. Instead of unbundling (and deserializing, decompressing) bundles into {memory, disk, or spilling} queue, we should use bundles in efficient and asynchronous way. If you agree with this, I'll start to refactor the whole queue system. If you have any other ideas e.g., asynchronous message synchronization, Pls let me know. Thanks. -- Best Regards, Edward J. Yoon CEO at DataSayer Co., Ltd.
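[Editor's note] The send path Edward describes (serialize, compress, and bundle the messages per destination peer) can be sketched as below. The CompressedBundle name is invented and the messages are plain strings for simplicity; the real code path uses Writable bundles such as BSPMessageBundle:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Serializes and gzip-compresses a batch of messages into one payload for a
 * single destination peer, and unpacks the payload on the receive side.
 */
public class CompressedBundle {
    public static byte[] pack(List<String> messages) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            out.writeInt(messages.size());
            for (String m : messages) out.writeUTF(m); // serialize each message
        }
        return buf.toByteArray();                      // one compressed bundle
    }

    public static List<String> unpack(byte[] payload) throws IOException {
        List<String> messages = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(payload)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) messages.add(in.readUTF());
        }
        return messages;
    }
}
```

Spilling this already-compressed bundle to disk would only add a serialize/deserialize round trip, which is the degradation point 1) refers to.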
Re: Refactoring code
The refactoring process is still underway. Is it appropriate to submit something that is wip or may be changed drastically later on? On 30 April 2014 03:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: I do like this approach as well, I'd be curious to try it out myself, maybe in the next weeks. Maybe a chance for a submission to ApacheCon EU? Regards, Tommaso 2014-04-29 17:01 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The core module are heavily refactored. And will try our best to retain the original interface so that users and other modules won't be impacted too much. Basically the users side will still use Java code because for the core module it's just interfaces/classes definition. Please let me know if anything in terms of this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
Re: Refactoring code
Oop sorry. The submission mentioned is related to Maybe a chance for a submission to ApacheCon EU? On 30 April 2014 15:10, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, maybe in a branch, or under a sandbox directory e.g. svn.apache.org/repos/asf/hama/sandbox/core-akka Tommaso 2014-04-30 9:02 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The refactoring process is still underway. Is it appropriate to submit something that is wip or may be changed drastically later on? On 30 April 2014 03:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: I do like this approach as well, I'd be curious to try it out myself, maybe in the next weeks. Maybe a chance for a submission to ApacheCon EU? Regards, Tommaso 2014-04-29 17:01 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The core module are heavily refactored. And will try our best to retain the original interface so that users and other modules won't be impacted too much. Basically the users side will still use Java code because for the core module it's just interfaces/classes definition. Please let me know if anything in terms of this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
Re: Refactoring code
Get it. Thanks! On 30 April 2014 15:23, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, I think work in progress as an effort to innovate and improve our current architecture and performance would be a great one, and I've seen wip talks in the past. Tommaso 2014-04-30 9:19 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: Oop sorry. The submission mentioned is related to Maybe a chance for a submission to ApacheCon EU? On 30 April 2014 15:10, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, maybe in a branch, or under a sandbox directory e.g. svn.apache.org/repos/asf/hama/sandbox/core-akka Tommaso 2014-04-30 9:02 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The refactoring process is still underway. Is it appropriate to submit something that is wip or may be changed drastically later on? On 30 April 2014 03:12, Tommaso Teofili tommaso.teof...@gmail.com wrote: I do like this approach as well, I'd be curious to try it out myself, maybe in the next weeks. Maybe a chance for a submission to ApacheCon EU? Regards, Tommaso 2014-04-29 17:01 GMT+02:00 Chia-Hung Lin cli...@googlemail.com: The core module are heavily refactored. And will try our best to retain the original interface so that users and other modules won't be impacted too much. Basically the users side will still use Java code because for the core module it's just interfaces/classes definition. Please let me know if anything in terms of this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
Re: Refactoring code
The core module is being heavily refactored, and we will try our best to retain the original interface so that users and other modules won't be impacted too much. The user side will basically still use Java code, because for the core module it's just interface/class definitions. Please let me know if you have any concerns about this. On 29 April 2014 22:24, Suraj Menon menonsur...@gmail.com wrote: Looks like a complete refactor and with a different language ;). I am +1 on moving to Scala. -Suraj On Tue, Apr 29, 2014 at 6:19 AM, Chia-Hung Lin cli...@googlemail.com wrote: Hi Recently I am working on the refactoring bsp core module. It's basically related to HAMA-881. There's already some changes. If you have interested, the code is at https://github.com/chlin501
OpenSSL security issue
Due to the OpenSSL vulnerabilities[1], it might be good for committers and others to reset their Apache passwords, as mentioned in [2]. [1]. http://www.openssl.org/news/vulnerabilities.html [2]. https://blogs.apache.org/infra/entry/heartbleed_fallout_for_apache
Re: [DISCUSS] Rename of ML and Graph modules
The Graph package name is clearer to me. Mach may be confused with CMU's Mach OS microkernel. Or do we want a code name for each release, like some GNU/Linux distro releases? On 14 April 2014 14:57, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think graph is a pretty fine name, it'easy to understand it's Hama applied to graphs, for 'ml' maybe it's a bit too short so ml may mean anything even if I don't think 'mach' improves that. Regards, Tommaso 2014-04-13 17:01 GMT+02:00 Yexi Jiang yexiji...@gmail.com: It seems that the old names are better than the new names. 2014-04-13 8:10 GMT-04:00 Andronidis Anastasios andronat_...@hotmail.com : hi, sorry but i don't understand what the new names mean. what is a b-graph? mach? kindly, anastasis On 13 Apr 2014, at 1:54 p.m., Edward J. Yoon edwardy...@apache.org wrote: Because they are too ambiguous and unmemorable. On Sun, Apr 13, 2014 at 8:25 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: why? Tommaso 2014-04-13 12:57 GMT+02:00 Edward J. Yoon edwardy...@apache.org: I propose that we rename the graph and ml modules: 1. Hama Graph - B-Graph 2. Hama ML - Mach WDYT? -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer Co., Ltd. -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer Co., Ltd. -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
In that case, are we going to organize multiple tasks into groups? A job has N BSP groups (bsp tasks in the current code), and in turn each group contains multiple tasks (with all tasks on the same server)? If so, how do they send messages or communicate between groups? Group to group? Can a task (within a group) send messages arbitrarily? I ask because this has implications for FT. IIRC Storm is a CEP framework, and messages can be sent arbitrarily to every bolt. The issue with such computation is that checkpointing it is not a simple task. Generally it's done through communication-induced checkpointing. Otherwise, like Storm, they ack and redo each message when necessary; an option is something like batch transactional processing (in Storm, Trident batch processing, if I am correct). What I can think of right now, with the current structure, is grouping every N messages into a superstep and then asynchronously checkpointing, which may be similar to Trident batch processing. I understand it's still far away given the current status, but I suppose it's good to take that into consideration beforehand as well. On 11 April 2014 13:40, Edward J. Yoon edwardy...@apache.org wrote: Yesterday, I had survey the Storm. Storm's task grouping and chainable bolts seems pretty nice (especially, chainable bolts can be really useful in case of real-time join operation). I think, we can also implement similar functions of Storm's task grouping and chainable bolts on BSP. My rough idea is: 1. Launches multi-tasks per node (as number of group of Bolts). For example:

+---+
|Server1|
+---+
Task-1. tailing bolt
Task-2. split sentence bolt
Task-3. wordcount bolt

2. Assign the tasks to proper group. -- 3. Each task executes their user-defined function and sends messages to task of next group. 4. Synchronizes all. -- 5. Finally, repeat the above 3 ~ 4 process. In here, only the difficult one is how to determine the task group at initial superstep.
So, I'd like to add below one to BSPPeer interface.

/**
 * @return the names of locally adjacent peers (including this peer).
 */
public String[] getAdjacentPeerNames();

On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang yexiji...@gmail.com wrote: great~ 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958430#comment-13958430] Edward J. Yoon commented on HAMA-883: - NOTE: my fellow worker is currently working on this issue - https://github.com/garudakang/meerkat [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might able to platformization these kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252) -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/ -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer Co., Ltd.
Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
No problem. It's a good discussion, so we can examine and improve accordingly. I am still not very sure about the topology, or how tasks are grouped. From the description, it looks like the link below: http://i.imgur.com/92L2XY1.png Each GroomServer is viewed as a group, and each group will launch 3 tasks by default (as the default xml defines). So the corresponding messages, emitted from a source like a queue, are sent to each group for consumption? And how do tasks communicate between groups/tasks? On 11 April 2014 16:43, Edward J. Yoon edw...@datasayer.com wrote: My rough idea assumes that dedicated Hama is installed on machines that generates logs, and the number of child tasks will be launched equally per GroomServer. So, if the groups == 3, framework launches 3 tasks per node. At first superstep, one task broadcasts the Topology after grouping the Tasks into 3 groups.

== Group1 ==
server1:60001
server2:60001
server3:60001

== Group2 ==
server1:60002
server2:60002
server3:60002

== Group3 ==
server1:60003
server2:60003
server3:60003

Based on this Topolgy, tasks reflects proper class and executes it. Then, it'll work like Storm flow. I didn't think about FT issue yet. :-) On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin cli...@googlemail.com wrote: Or we can have POC first and then see how it relates to the issue we might need to fix. On 11 April 2014 16:10, Chia-Hung Lin cli...@googlemail.com wrote: In that case are we going to organize multiple tasks into a group? A job has N bsp groups (bsp task in current code), in turn each group contain multiple tasks (and all tasks are on the same server)? If this is the case, how do they send messages or communicate between groups? group to group? A task (within a group) can arbitrary send the messages? I have this question because this would have implication on FT. IIRC Storm is a CEP framework, and messages can be sent arbitrary to every bolt. The issue with such computation is that it's not a simple task when performing checkpoint.
Generally it's done through communication induced checkpointing. Otherwise like storm they ack and redo each message when necessary; an option is something like batch (in storm like trident batch processing if I am correct) transactional processing. What I can think of right now is, with current structure, grouping every N messages a superstep, and then asynchronously checkpointing, which may be similar to trident batch processing. I understand it's still far away based on the current status. I suppose it's good if we can take that into consideration beforehand as well. On 11 April 2014 13:40, Edward J. Yoon edwardy...@apache.org wrote: Yesterday, I had survey the Storm. Storm's task grouping and chainable bolts seems pretty nice (especially, chainable bolts can be really useful in case of real-time join operation). I think, we can also implement similar functions of Storm's task grouping and chainable bolts on BSP. My rough idea is: 1. Launches multi-tasks per node (as number of group of Bolts). For example: +---+ |Server1| +---+ Task-1. tailing bolt Task-2. split sentence bolt Task-3. wordcount bolt 2. Assign the tasks to proper group. -- 3. Each task executes their user-defined function and sends messages to task of next group. 4. Synchronizes all. -- 5. Finally, repeat the above 3 ~ 4 process. In here, only the difficult one is how to determine the task group at initial superstep. So, I'd like to add below one to BSPPeer interface. /** * @return the names of locally adjacent peers (including this peer). */ public String[] getAdjacentPeerNames(); On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang yexiji...@gmail.com wrote: great~ 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958430#comment-13958430 ] Edward J. 
Yoon commented on HAMA-883: - NOTE: my fellow worker is currently working on this issue - https://github.com/garudakang/meerkat [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to platformize this kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252) -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer
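The per-port grouping in the topology above (Group1 = all :60001 tasks, Group2 = all :60002 tasks, and so on) can be sketched in a few lines of standalone Java. Note this is only an illustration of the idea discussed in the thread; `TopologySketch` and its `group()` helper are hypothetical names, not part of Hama's API.

```java
import java.util.*;

public class TopologySketch {
    // Group peer addresses "host:port" into groups by port offset: the task
    // listening on basePort + i on every host belongs to group i + 1, which
    // mirrors the Group1/Group2/Group3 layout shown in the thread.
    public static Map<Integer, List<String>> group(List<String> peers, int basePort, int groups) {
        Map<Integer, List<String>> topology = new TreeMap<>();
        for (String peer : peers) {
            int port = Integer.parseInt(peer.substring(peer.indexOf(':') + 1));
            int g = (port - basePort) % groups + 1; // group index derived from port offset
            topology.computeIfAbsent(g, k -> new ArrayList<>()).add(peer);
        }
        return topology;
    }

    public static void main(String[] args) {
        List<String> peers = new ArrayList<>();
        for (String host : new String[] {"server1", "server2", "server3"})
            for (int port = 60001; port <= 60003; port++)
                peers.add(host + ":" + port);
        // Group 1 collects all :60001 tasks, group 2 all :60002, group 3 all :60003.
        System.out.println(group(peers, 60001, 3));
    }
}
```

A task broadcasting such a map at the first superstep would give every peer the same view of which peers form each processing stage.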
Re: I'd like to contribute to Hama
Hi, you can have a look at JIRA [1], check out the source [2], and submit a patch as an attachment to a JIRA ticket. Contributions are welcome. :) [1]. https://issues.apache.org/jira/browse/HAMA/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel [2]. http://wiki.apache.org/hama/HowToContribute On 9 April 2014 13:23, Byeong Seok Min byeongseok@gmail.com wrote: Hello Hama
Re: [DISCUSS] Fault tolerant BSP job
Sorry, I don't catch the point. What's the difference between pure BSP and FT BSP? Any concrete example? On 9 April 2014 08:29, Edward J. Yoon edwardy...@apache.org wrote: In my eyes, SuperstepPiEstimator[1] looks like a totally new programming model, very similar to Pregel. I personally would like to suggest that we provide both pure BSP and fault tolerant BSP models, instead of replacing one with the other. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: [DISCUSS] Fault tolerant BSP job
Not sure if we are on the same page, and sorry, I am not very familiar with the Superstep implementation. I assume that the traditional bsp model means the original bsp interface, where there is a bsp function and the user can freely call peer.sync(), etc.:

bsp(BSPPeer ... peer) { // whatever computation peer.sync(); }

And the superstep style is with the Superstep abstract class. If this is the case, SuperstepBSP.java already calls sync, as below, outside each Superstep.compute(). So it looks like even though SuperstepPiEstimator doesn't call the sync() method, a barrier sync will be executed, because each Superstep is viewed as a superstep in the original BSP definition.

@Override
public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException, InterruptedException {
  for (int index = startSuperstep; index < supersteps.length; index++) {
    Superstep<K1, V1, K2, V2, M> superstep = supersteps[index];
    superstep.compute(peer);
    if (superstep.haltComputation(peer)) {
      break;
    }
    peer.sync();
    startSuperstep = 0;
  }
}

Within Superstep.compute(), if sync is called again, I would think that another barrier sync will be executed:

SuperstepBSP.java
for (...) {
  superstep.compute() -> { // in compute method ... peer.sync() }
  ...
  peer.sync()
}

IIRC each call to sync may cause the checkpoint (no recovery) method to serialize messages to HDFS. For SerializePrinting, the following code snippet may move

for (String otherPeer : bspPeer.getAllPeerNames()) { bspPeer.send(otherPeer, new IntegerMessage(bspPeer.getPeerName(), i)); }

to Superstep.compute(). And the outer for loop is what is programmed in SuperstepBSP.java:

for (int i = 0; i < NUM_SUPERSTEPS; i++) { // code that should be moved to Superstep.compute() } bspPeer.sync();

On 9 April 2014 16:17, Edward J. Yoon edwardy...@apache.org wrote: As you can see here[1], the sync() method is never called, and the classes of all supersteps need to be declared within the Job configuration. Therefore, I thought it's similar to the Pregel style on the BSP model.
It's quite different from the legacy model in my eyes. According to HAMA-505, the superstep API seems to be used for FT job processing (I haven't read it closely yet). Right? Here I have a question: what happens if I call the sync() method within the compute() method? In this case, does the framework guarantee checkpoint/recovery? And how can I implement http://wiki.apache.org/hama/SerializePrinting using the superstep API? What's the difference between pure BSP and FT BSP? Any concrete example? I meant the traditional BSP programming model. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java On Wed, Apr 9, 2014 at 4:25 PM, Chia-Hung Lin cli...@googlemail.com wrote: Sorry, I don't catch the point. What's the difference between pure BSP and FT BSP? Any concrete example? On 9 April 2014 08:29, Edward J. Yoon edwardy...@apache.org wrote: In my eyes, SuperstepPiEstimator[1] looks like a totally new programming model, very similar to Pregel. I personally would like to suggest that we provide both pure BSP and fault tolerant BSP models, instead of replacing one with the other. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc. -- Edward J. Yoon (@eddieyoon) CEO at DataSayer Co., Ltd.
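The driver loop discussed in this thread — compute() followed by a framework-inserted barrier for each chained superstep — can be mimicked with a small standalone sketch. The `Superstep`, `Peer`, and `run()` names here are simplified stand-ins for illustration, not Hama's actual classes.

```java
import java.util.*;

public class SuperstepSketch {
    // Stand-in for Hama's Superstep abstract class: one compute per barrier.
    public interface Superstep {
        void compute(Peer peer);
        default boolean halt(Peer peer) { return false; }
    }

    // Stand-in for BSPPeer: counts barriers and records sent messages.
    public static class Peer {
        public int syncs = 0;
        public List<String> log = new ArrayList<>();
        public void sync() { syncs++; }       // barrier between supersteps
        public void send(String msg) { log.add(msg); }
    }

    // Mirrors the quoted SuperstepBSP.bsp() loop: compute, optional halt,
    // then an implicit barrier sync inserted by the framework.
    public static void run(Peer peer, Superstep... supersteps) {
        for (Superstep s : supersteps) {
            s.compute(peer);
            if (s.halt(peer)) break;
            peer.sync();                      // framework-inserted barrier
        }
    }

    public static void main(String[] args) {
        Peer peer = new Peer();
        run(peer,
            p -> p.send("hello from superstep 0"),
            p -> p.send("hello from superstep 1"));
        System.out.println(peer.syncs + " barriers, log=" + peer.log);
    }
}
```

This shows the point made above: user code never calls sync() itself, yet one barrier still happens per chained superstep, which is what makes the chain a natural unit for checkpoint/replay.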
Re: [DISCUSS] Fault tolerant BSP job
That's why I proposed to use the Superstep API instead, though I prefer the plain bsp function. Unless we want to instrument the source code, which I believe is not what we, including users, want. With the Superstep API we can resume from the latest checkpointed message (the new refactored code should be based on this as well), under some preconditions. Alternatively, we can implement our own code (not in Java, or probably in Java 8) to perform checkpointing, but that would take a very long time to accomplish. I would put that issue in the future roadmap because I personally prefer the plain bsp function instead of Superstep. On 9 April 2014 23:56, Suraj Menon surajsme...@apache.org wrote: I don't like my patch in HAMA-639 myself, even though I believe it satisfies all the mentioned requirements. The usage of the superstep chaining API implementation in the patch is too complicated. A superstep here is like a transformation function you define on an RDD in Spark. So if you look into the FT design of Spark, on failure they rerun the operations on the RDD to get to the current state. This is similar to what we have in mind using checkpointing. The challenge is in getting the same messages replayed to a newly spawned task on checkpointed data. If you don't use the Superstep (or any other abstraction representing a function), you cannot start processing from the line of code where the failure occurred. (Java does not support goto line number.) -Suraj On Wed, Apr 9, 2014 at 7:29 AM, Edward J. Yoon edwardy...@apache.org wrote: I just found this: https://issues.apache.org/jira/browse/HAMA-503 and HAMA-639. Do you still think the superstep API is essential for checkpoint/recovery? If not, we can drop it. I don't think it's a good idea. On Wed, Apr 9, 2014 at 7:43 PM, Chia-Hung Lin cli...@googlemail.com wrote: Not sure if we are on the same page, and sorry, I am not very familiar with the Superstep implementation.
I assume that the traditional bsp model means the original bsp interface, where there is a bsp function and the user can freely call peer.sync(), etc.:

bsp(BSPPeer ... peer) { // whatever computation peer.sync(); }

And the superstep style is with the Superstep abstract class. If this is the case, SuperstepBSP.java already calls sync, as below, outside each Superstep.compute(). So it looks like even though SuperstepPiEstimator doesn't call the sync() method, a barrier sync will be executed, because each Superstep is viewed as a superstep in the original BSP definition.

@Override
public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException, InterruptedException {
  for (int index = startSuperstep; index < supersteps.length; index++) {
    Superstep<K1, V1, K2, V2, M> superstep = supersteps[index];
    superstep.compute(peer);
    if (superstep.haltComputation(peer)) {
      break;
    }
    peer.sync();
    startSuperstep = 0;
  }
}

Within Superstep.compute(), if sync is called again, I would think that another barrier sync will be executed:

SuperstepBSP.java
for (...) {
  superstep.compute() -> { // in compute method ... peer.sync() }
  ...
  peer.sync()
}

IIRC each call to sync may cause the checkpoint (no recovery) method to serialize messages to HDFS. For SerializePrinting, the following code snippet may move

for (String otherPeer : bspPeer.getAllPeerNames()) { bspPeer.send(otherPeer, new IntegerMessage(bspPeer.getPeerName(), i)); }

to Superstep.compute(). And the outer for loop is what is programmed in SuperstepBSP.java:

for (int i = 0; i < NUM_SUPERSTEPS; i++) { // code that should be moved to Superstep.compute() } bspPeer.sync();

On 9 April 2014 16:17, Edward J. Yoon edwardy...@apache.org wrote: As you can see here[1], the sync() method is never called, and the classes of all supersteps need to be declared within the Job configuration. Therefore, I thought it's similar to the Pregel style on the BSP model. It's quite different from the legacy model in my eyes.
According to HAMA-505, the superstep API seems to be used for FT job processing (I haven't read it closely yet). Right? Here I have a question: what happens if I call the sync() method within the compute() method? In this case, does the framework guarantee checkpoint/recovery? And how can I implement http://wiki.apache.org/hama/SerializePrinting using the superstep API? What's the difference between pure BSP and FT BSP? Any concrete example? I meant the traditional BSP programming model. 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java On Wed, Apr 9, 2014 at 4:25 PM, Chia-Hung Lin cli...@googlemail.com wrote: Sorry, I don't catch the point. What's the difference between pure BSP and FT BSP? Any concrete example? On 9 April 2014 08:29, Edward J. Yoon edwardy
Re: Project and Website Layout Refactoring Idea.
+1 On 4 April 2014 16:41, Andronidis Anastasios andronat_...@hotmail.com wrote: +1 Anastasis On 4 Apr 2014, at 8:22 a.m., Tommaso Teofili tommaso.teof...@gmail.com wrote: it sounds reasonable to me, good point. Tommaso 2014-04-04 3:31 GMT+02:00 Edward J. Yoon edwardy...@apache.org: All, from a user's perspective, we offer too many complex things, so users have difficulty understanding how to use Apache Hama. Hence, I propose that we separate Hama into multiple (logical) sub-projects. For example: * Main portal: http://hama.apache.org/ * Core BSP framework project: http://hama.apache.org/bsp/ * Pregel-like Graph framework project: http://hama.apache.org/graph/ * BSP-based Machine Learning Library project: http://hama.apache.org/ml/ And, for each of the projects, we also document how to use it separately. What do you think? -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Procedure
As we will have a new version for each release, we need someone to help with the release process. Would anyone like to volunteer?
Re: Procedure
Thanks for volunteering. On 27 March 2014 14:50, Edward J. Yoon edwardy...@apache.org wrote: I can volunteer to be release manager. On Thu, Mar 27, 2014 at 3:24 PM, Chia-Hung Lin cli...@googlemail.com wrote: As we will have a new version for each release, we need someone to help with the release process. Would anyone like to volunteer? -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
NPE is thrown when generating javadoc
I am not sure if this is a JDK bug (maybe not); just writing it down as a side note. When compiling with JDK 6 (java version 1.6.0_25), the following exception is thrown. Switching to JDK 7, this javadoc exception goes away. I came across the same problem reported on the internet, but no bug database link was provided.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (default) on project hama-core: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - java.lang.NullPointerException
[ERROR] at com.sun.tools.javac.jvm.ClassReader.findMethod(ClassReader.java:974)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readEnclosingMethodAttr(ClassReader.java:926)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readMemberAttr(ClassReader.java:909)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClassAttr(ClassReader.java:1053)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClassAttrs(ClassReader.java:1067)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClass(ClassReader.java:1560)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.readClassFile(ClassReader.java:1658)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.fillIn(ClassReader.java:1845)
[ERROR] at com.sun.tools.javac.jvm.ClassReader.complete(ClassReader.java:1777)
[ERROR] at com.sun.tools.javac.code.Symbol.complete(Symbol.java:386)
[ERROR] at com.sun.tools.javac.code.Symbol$ClassSymbol.complete(Symbol.java:763)
[ERROR] at com.sun.tools.javac.code.Symbol$ClassSymbol.flags(Symbol.java:695)
[ERROR] at com.sun.tools.javadoc.ClassDocImpl.getFlags(ClassDocImpl.java:105)
[ERROR] at com.sun.tools.javadoc.ClassDocImpl.isAnnotationType(ClassDocImpl.java:116)
[ERROR] at com.sun.tools.javadoc.DocEnv.isAnnotationType(DocEnv.java:574)
[ERROR] at com.sun.tools.javadoc.DocEnv.getClassDoc(DocEnv.java:546)
[ERROR] at com.sun.tools.javadoc.PackageDocImpl.getClasses(PackageDocImpl.java:154)
[ERROR] at com.sun.tools.javadoc.PackageDocImpl.addAllClassesTo(PackageDocImpl.java:170)
[ERROR] at com.sun.tools.javadoc.RootDocImpl.classes(RootDocImpl.java:178)
[ERROR] at com.sun.tools.doclets.internal.toolkit.AbstractDoclet.startGeneration(AbstractDoclet.java:96)
[ERROR] at com.sun.tools.doclets.internal.toolkit.AbstractDoclet.start(AbstractDoclet.java:64)
[ERROR] at com.sun.tools.doclets.formats.html.HtmlDoclet.start(HtmlDoclet.java:42)
[ERROR] at com.sun.tools.doclets.standard.Standard.start(Standard.java:23)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:597)
[ERROR] at com.sun.tools.javadoc.DocletInvoker.invoke(DocletInvoker.java:269)
[ERROR] at com.sun.tools.javadoc.DocletInvoker.start(DocletInvoker.java:143)
[ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:340)
[ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:128)
[ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:41)
[ERROR] at com.sun.tools.javadoc.Main.main(Main.java:31)
[ERROR]
Re: [DISCUSS] Close all old JIRA tickets and Redesign for future fault tolerant system
I am refactoring the code. If you are interested, it's at https://github.com/chlin501/hama On 7 March 2014 10:12, Edward J. Yoon edwardy...@apache.org wrote: I'd like to close all old (very inactive) JIRA tickets, and redesign for the future system. If no objections are raised, I'll do it next week. Thanks! -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
BSP is a bridging model that doesn't restrict itself to particular usages. My understanding (I could be wrong) is that our framework needs to address such issues. [1], for example, proposes a solution based on BSP in the field of real-time applications. [1]. Hartley J.K., Bargiela A., TPML: Parallel meta-language for scientific and engineering computations using transputers (TPML), Proc. of 2nd Int. Conf. on Software for Supercomputers and Multiprocessors, SMS'94, 1994, pp. 22-31 On 4 March 2014 21:20, Yexi Jiang yexiji...@gmail.com wrote: I am very interested in this topic since my research area includes event mining, but can BSP conduct real-time computing? I once used a message-queue-based solution to collect the event logs. 2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to platformize this kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252) -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
I used Twitter Storm previously. Storm is an excellent framework for real-time processing. Considering Hama for real-time tasks, the framework in my opinion needs to decouple IO from HDFS so that the source/input is not restricted to just HDFS. On 5 March 2014 09:30, Yexi Jiang yexiji...@gmail.com wrote: Please correct me if I'm wrong. My understanding of aggregating the logs is to collect the logs generated from each monitored machine in real time. The collecting procedure is continuous, like a data stream, and never ends. I know how to use Hama to aggregate the logs batch by batch (e.g. aggregate the logs incrementally each day), but I cannot immediately come up with an idea of using Hama to solve this problem in a real-time fashion. 2014-03-04 19:32 GMT-05:00 Edward J. Yoon edwardy...@apache.org: Aggregators of the Graph package are doing similar work: monitoring and global communication, etc. On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang yexiji...@gmail.com wrote: I am very interested in this topic since my research area includes event mining, but can BSP conduct real-time computing? I once used a message-queue-based solution to collect the event logs. 2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to platformize this kind of processing.
Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Below is just my personal viewpoint. We can refactor bsp to be more modularized so that people can choose if that fits their requirement. Basically bsp is a generalized model, it may be good if we can create a flexible framework. On 5 March 2014 12:25, Edward J. Yoon edwardy...@apache.org wrote: Why not? Sent from my iPhone On 2014. 3. 5., at 오후 1:09, Yexi Jiang yexiji...@gmail.com wrote: Yes, currently Hama does not support streaming input and streaming output. That's why currently it is not a natural choice for people with real time computing needs. Do we really need to make Hama to support the real time computing? In that case, we need to compete with Storm... 2014-03-04 22:58 GMT-05:00 Chia-Hung Lin cli...@googlemail.com: I used Twitter Storm previously. Storm is an excellent framework in real time processing. Considering Hama in real time tasks, the framework in my opinion need to decouple io from hdfs so that the source/ input is not restricted to just hdfs. On 5 March 2014 09:30, Yexi Jiang yexiji...@gmail.com wrote: Please correct me if I'm wrong. My understanding of aggregating the log is the collect the generated from each monitored machine in real time. The collecting procedure is continuous like a data stream and never end. I know how to use Hama to aggregate the logs batch by batch (e.g. aggregate the logs incrementally each day), but I cannot immediately make up an idea of using Hama to solve this problem in real time approach. 2014-03-04 19:32 GMT-05:00 Edward J. Yoon edwardy...@apache.org: Aggregators of Graph package are doing similar wok. Monitoring and Global communication, ..., etc. On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang yexiji...@gmail.com wrote: I am very interested in this topic since my research area includes event mining, but can BSP conducts the real time computing? I once used the message queue based solution to collect the event logs. 2014-03-04 1:54 GMT-05:00 Edward J. 
Yoon (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-883: Summary: [Research Task] Massive log event aggregation in real time using Apache Hama (was: [Research Task] Massive log data aggregation in real time using Apache Hama) [Research Task] Massive log event aggregation in real time using Apache Hama Key: HAMA-883 URL: https://issues.apache.org/jira/browse/HAMA-883 Project: Hama Issue Type: Task Reporter: Edward J. Yoon BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might able to platformization these kind of processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Cutting a 0.7 release
Just to let you know, I may refactor based on the following diagram: http://people.apache.org/~chl501/diagram1.png That sketches the basic flow required for FT. I am currently evaluating related parts, so it's subject to change. On 24 February 2014 20:52, Edward J. Yoon edwardy...@apache.org wrote: 0.6.4 or 0.7.0, both are OK to me. Just FYI, the memory efficiency has been significantly (almost 2-3x) improved by runtime message serialization and compression. See https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3 (I'll attach more benchmarks and comparisons with other systems soon). And, we've fixed many bugs, e.g., K-Means, NeuralNetwork, SemiClustering, Graph's Combiners (HAMA-857). According to my personal evaluations, the current system is fairly respectable. As I mentioned before, I believe we should stick to the in-memory style, since today's machines can be equipped with up to 128 GB. A disk (or disk-hybrid) based queue is optional, not a must-have. Once we release this one, we finally might want to focus on the issues below: * Fault tolerant job processing (checkpoint recovery) * Support for GPUs and InfiniBand Then, I think we can release version 1.0. On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Would you cut 0.7 or 0.6.4? I'd go with 0.6.4 as I think the next minor version change should be due to significant feature additions/changes and/or stability/scalability improvements. Regards, Tommaso 2014-02-24 8:47 GMT+01:00 Edward J. Yoon edwardy...@apache.org: Hi all, I plan on cutting a release next week. If you have some opinions, please feel free to comment here. Sent from my iPhone -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
Re: Cutting a 0.7 release
Programmers can't control Java memory like malloc/free in C, and with type boxing/unboxing, etc., it seems not easy to evaluate memory usage. So it would be good to stick to the Erlang fail-fast style. Or we can have a program that loads data and measures the actual memory usage. On 24 February 2014 22:32, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2014-02-24 13:52 GMT+01:00 Edward J. Yoon edwardy...@apache.org: 0.6.4 or 0.7.0, both are OK to me. Just FYI, the memory efficiency has been significantly (almost 2-3x) improved by runtime message serialization and compression. See https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3 (I'll attach more benchmarks and comparisons with other systems soon). And, we've fixed many bugs, e.g., K-Means, NeuralNetwork, SemiClustering, Graph's Combiners (HAMA-857). sure, all the above things look good to me. According to my personal evaluations, the current system is fairly respectable. As I mentioned before, I believe we should stick to the in-memory style, since today's machines can be equipped with up to 128 GB. A disk (or disk-hybrid) based queue is optional, not a must-have. right, the only thing that I think we need to address before 0.7.0 is related to the OutOfMemory errors (especially when dealing with large graphs); for example, IMHO even if the memory is not enough to store all the graph vertices assigned to a certain peer, a scalable system should never throw OOM exceptions; instead it may eventually process items more slowly (with caches/queues), but never throw an exception for that. But that's just my opinion. Once we release this one, we finally might want to focus on the issues below: * Fault tolerant job processing (checkpoint recovery) +1 * Support for GPUs and InfiniBand +1 for the former, not sure about the latter. Then, I think we can release version 1.0. My 2 cents, Tommaso On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Would you cut 0.7 or 0.6.4?
I'd go with 0.6.4 as I think the next minor version change should be due to significant feature additions/changes and/or stability/scalability improvements. Regards, Tommaso 2014-02-24 8:47 GMT+01:00 Edward J. Yoon edwardy...@apache.org: Hi all, I plan on cutting a release next week. If you have some opinions, please feel free to comment here. Sent from my iPhone -- Edward J. Yoon (@eddieyoon) Chief Executive Officer DataSayer, Inc.
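The "program that loads data and measures the actual memory usage" idea mentioned above can be sketched in plain Java with the Runtime API. `MemoryProbe` and the workload are illustrative only, and the numbers are approximate: System.gc() is merely a hint, so GC timing makes such measurements noisy.

```java
public class MemoryProbe {
    // Approximate used heap in bytes. System.gc() is only a hint to the JVM,
    // so deltas between two calls are rough and can even come out negative.
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // best-effort: encourage collection before measuring
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedBytes();
        int[] data = new int[1_000_000]; // roughly 4 MB of int payload
        long after = usedBytes();
        System.out.println("approx. payload bytes: " + (after - before)
            + " (array length " + data.length + ")");
    }
}
```

A fail-fast alternative, as suggested in the thread, is to not estimate at all and simply let a task die and be restarted when it exceeds its heap.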
Re: New logo on website
+ one hama On 20 January 2014 20:18, Martin Illecker mar...@illecker.at wrote: Nice ;-) 2014/1/20 Edward J. Yoon edwardy...@apache.org Hi, http://people.apache.org/~edwardyoon/site/index.html Do you like this new logo? If no objections arise, I'd like to commit this! -- Best Regards, Edward J. Yoon @eddieyoon
Re: FYI, Comparison and Evaluation of Open Source Implementations of Pregel and Related Systems
Not very sure, but it seems JUnitBenchmarks can be integrated with Jenkins. On 13 January 2014 17:05, Tommaso Teofili tommaso.teof...@gmail.com wrote: Thanks Song Bai and Ed for your replies, looking forward to Song's contributions and HAMA-843/816 being done. Tommaso p.s.: I think we need a way of continuously benchmarking our trunk (e.g. setting up 2+ machines in distributed mode and running tests/benchmarks against them via Jenkins, but I don't know if that's really feasible via ASF Jenkins). 2014/1/13 Edward J. Yoon edwardy...@apache.org: Once HAMA-843 is committed, PageRank performance will be dramatically improved. The scalability issue is related to the In-Memory VerticesInfo and Queue. DiskVerticesInfo is now available. Disk/Spilling Queue issues will be fixed soon. And also, the Graph package's performance can be improved one more time with HAMA-816. On Mon, Jan 13, 2014 at 1:14 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: By the way: is anyone aware of what kind of failures were related to the PageRank failures highlighted in the mentioned slides (or do you know who we can ask)? Tommaso 2014/1/10 Edward J. Yoon edwardy...@apache.org: Just FYI, https://cs.uwaterloo.ca/~kdaudjee/courses/cs848/slides/proj/F13/JPV.pdf -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: Hama Scheduler
BSPMaster makes use of a TaskScheduler for scheduling tasks:

BSPMaster.java
Class<? extends TaskScheduler> schedulerClass = conf.getClass(
    "bsp.master.taskscheduler", SimpleTaskScheduler.class, TaskScheduler.class);
this.taskScheduler = ReflectionUtils.newInstance(schedulerClass, conf);

Then in SimpleTaskScheduler, tasks are scheduled through the schedule function, and tasks are obtained from the related JobInProgress. That's roughly the execution path. IIRC the split size is more related to a single job, so increasing the split size may not allow more jobs to be run on the same cluster. At the moment the scheduling mechanism creates a task per GroomServer, and each GroomServer allows a default maxTasks of up to 3. So increasing maxTasks may be a way to run more jobs concurrently; or restricting the tasks scheduled to a GroomServer and then scheduling tasks of the new job to free slots may also help increase concurrent job execution. Right now scheduling is just a simple FCFS. Improvements, such as adding scheduling policies so that the mechanism is more flexible, are welcome. On 24 December 2013 20:06, Yuesheng Hu yueshen...@gmail.com wrote: Hi, I want to implement a scheduler for hama as my thesis project. Here are my ideas: 1. make the split size as big as possible, so more jobs can be run on the cluster; 2. the scheduler can select a job based on the job's type, like graph job (message intensive) or iterative job. Any advice? Best Regards!
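The reflection-based scheduler loading described above can be illustrated with a small standalone sketch. The `TaskScheduler`, `FcfsScheduler`, `load()`, and plain-Map "configuration" here are simplified stand-ins for illustration, not Hama's actual classes.

```java
import java.util.*;

public class SchedulerSketch {
    // Stand-in for Hama's TaskScheduler interface.
    public interface TaskScheduler {
        String schedule(List<String> tasks);
    }

    // Stand-in default implementation: first come, first served.
    public static class FcfsScheduler implements TaskScheduler {
        public String schedule(List<String> tasks) {
            return tasks.get(0); // pick the earliest submitted task
        }
    }

    // Mirrors the BSPMaster pattern: read a class name from configuration,
    // fall back to the default, and instantiate it via reflection so new
    // scheduling policies can be plugged in without changing the master.
    public static TaskScheduler load(Map<String, String> conf) throws Exception {
        String cls = conf.getOrDefault("bsp.master.taskscheduler",
                                       FcfsScheduler.class.getName());
        return (TaskScheduler) Class.forName(cls)
            .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        TaskScheduler scheduler = load(new HashMap<>()); // empty conf -> default FCFS
        System.out.println(scheduler.schedule(Arrays.asList("task-1", "task-2")));
    }
}
```

A policy-based scheduler from a thesis project would then only need to implement the interface and set its class name under the configuration key.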
Re: Hama book
I am happy to help. On 26 November 2013 22:09, Tommaso Teofili tommaso.teof...@gmail.com wrote: looks like a community-written book, nice :) p.s.: I can help as well 2013/11/26 Yexi Jiang yexiji...@gmail.com: Me too, that's pretty interesting. 2013/11/26 Suraj Menon menonsur...@gmail.com: I can help too. What is the timeline? On Tue, Nov 26, 2013 at 5:32 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: I am interested in the Graph API if you want. Anastasis On 26 Nov 2013, at 8:02 a.m., Edward J. Yoon edwardy...@apache.org wrote: Anyone? On Thu, Nov 21, 2013 at 8:34 PM, Edward J. Yoon edwardy...@apache.org wrote: Hi folks, I talked a little with Manning's publisher, and started writing a book proposal. Comment below if you're interested in being a co-author. -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Combine all Writables into a new package
+1 for hama-io or hama-commons On 21 October 2013 21:35, Tommaso Teofili tommaso.teof...@gmail.com wrote: what about creating a module for that (Writables and InputFormats for now), hama-io / hama-commons, that can be used by both (containing math stuff as well)? Tommaso 2013/10/21 Martin Illecker millec...@apache.org: VectorWritable and MatrixWritable both have some dependencies on org.apache.hama.ml.math (DenseDoubleVector, DoubleVector and DenseDoubleMatrix, DoubleMatrix). If we move VectorWritable and MatrixWritable to core (e.g., org.apache.hama.io.writable), we have to move org.apache.hama.ml.math as well. I think that's not possible because of other classes in hama-ml depending on ml.math. Temporarily, I will have to copy VectorWritable to the core to use it in a test case. 2013/10/21 Tommaso Teofili tommaso.teof...@gmail.com: 2013/10/21 Martin Illecker millec...@apache.org: Hello, regarding my Hama Pipes test case [1], I want to use VectorWritable inside the hama-core module. Therefore I would need a dependency on hama-ml, but this would cause a cyclic dependency. So is it possible to move both writables, VectorWritable and MatrixWritable, from org.apache.hama.ml.writable into a new package, e.g. org.apache.hama.io.writable, based on [2]? I think this really makes sense. Regarding [3], we can also move TextArrayWritable from org.apache.hama.bsp into this new package. Do you think we can move the writables of org.apache.hama.ml.writable to the core module? +1 And can we do the package refactoring [2] of org.apache.hama.bsp submitted by Suraj? +1 here too. Tommaso Thanks! Martin [1] https://issues.apache.org/jira/browse/HAMA-808 [2] https://issues.apache.org/jira/secure/attachment/12609417/bsplist.txt [3] https://issues.apache.org/jira/browse/HAMA-727
Re: TaskStatus question
Yes. In addition to the SCHEDULED state, an explanation of the other states would also be appreciated. Thanks On 29 September 2013 19:33, Suraj Menon surajsme...@apache.org wrote: Are you talking about a SCHEDULED state where the task is not started yet, but a directive is sent to the GroomServer to start the task? -Suraj On Sat, Sep 28, 2013 at 7:49 AM, Chia-Hung Lin cli...@googlemail.comwrote: TaskStatus has the following phases: STARTING, COMPUTE, BARRIER_SYNC, CLEANUP, RECOVERING; and states: RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN, FAULT_NOTIFIED, RECOVERY_SCHEDULING, RECOVERY_SCHEDULED, RECOVERING. What is the valid mapping, or the state transition, while a task is executed in a GroomServer? Thanks
TaskStatus question
TaskStatus has the following phases: STARTING, COMPUTE, BARRIER_SYNC, CLEANUP, RECOVERING; and states: RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN, FAULT_NOTIFIED, RECOVERY_SCHEDULING, RECOVERY_SCHEDULED, RECOVERING. What is the valid mapping, or the state transition, while a task is executed in a GroomServer? Thanks
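[Editor's illustration] One way to pin down an answer to this question is to encode the phase progression as an explicit transition table. The sketch below does that for the five phases listed above; the edges shown (STARTING to COMPUTE, COMPUTE alternating with BARRIER_SYNC per superstep, CLEANUP terminal, RECOVERING restarting) are an assumption for illustration only, since the actual valid mapping is exactly what the question asks about.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch only: one plausible phase transition table for a task
// running inside a GroomServer. The real table is what the thread asks about,
// so every edge here is an assumption, not Hama's documented behavior.
public class TaskPhaseModel {

  enum Phase { STARTING, COMPUTE, BARRIER_SYNC, CLEANUP, RECOVERING }

  private static final Map<Phase, EnumSet<Phase>> NEXT = new EnumMap<>(Phase.class);
  static {
    NEXT.put(Phase.STARTING, EnumSet.of(Phase.COMPUTE, Phase.RECOVERING));
    NEXT.put(Phase.COMPUTE, EnumSet.of(Phase.BARRIER_SYNC, Phase.CLEANUP, Phase.RECOVERING));
    NEXT.put(Phase.BARRIER_SYNC, EnumSet.of(Phase.COMPUTE, Phase.CLEANUP, Phase.RECOVERING));
    NEXT.put(Phase.CLEANUP, EnumSet.noneOf(Phase.class));    // assumed terminal
    NEXT.put(Phase.RECOVERING, EnumSet.of(Phase.STARTING));  // assumed retry from scratch
  }

  // True when moving from one phase to another is allowed under this model.
  public static boolean canMove(Phase from, Phase to) {
    return NEXT.get(from).contains(to);
  }
}
```

Having such a table in code would let a validity check reject impossible transitions instead of leaving the mapping implicit.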
Re: svn commit: r1523425 - in /hama/trunk/c++: pom.xml src/main/native/pipes/impl/HamaPipes.cc
Apologies for committing the patch to trunk. It looks like there is a window of time during which commit actions should be avoided. In addition to this, is there any other issue that may also affect the release? I would like to document this as a guideline, so that when other members perform similar tasks the same issue will not be repeated. Thanks for the explanation. On 17 September 2013 20:59, Tommaso Teofili tommaso.teof...@gmail.com wrote: I'll try to explain why IMHO it's usually better not to commit to trunk while voting on release candidates (I realize now I did it myself some days ago too, sorry :) ). When the release manager runs the command 'mvn release:prepare' a bunch of things happen; one of them is the current trunk pom.xml files being moved to the next development iteration, which for us is 0.7.0-SNAPSHOT. Therefore, if the release vote doesn't pass and the release has to be rolled back, the pom.xml files have to be moved back to their previous versions (e.g. 0.6.2-SNAPSHOT), which is done by the release manager via the command 'mvn release:rollback'. If someone has committed changes to the trunk in the meantime, this may cause the following: 1. running mvn release:rollback may fail due to incompatible SVN changes (to be merged manually) on pom files (this might be the case with the mentioned change to the hama-pipes pom.xml) 2. the committed change being silently rolled back and overwritten by 'mvn release:rollback' 3. a snapshot of hama-core 0.7.0-SNAPSHOT containing changes targeted for e.g. 0.6.3 being deployed to snapshot repositories (not a big problem, but still a bit inconsistent) Given that, I'm of course not against your commit; it's just possible that Edward's rollback command will overwrite it, so let's keep in mind that we have to check for that. Regards, Tommaso 2013/9/17 Chia-Hung Lin cli...@googlemail.com Is there any reason why this has to be rolled back, e.g. procedure, format, etc.? Because I would need this patch to be in.
If it's procedure, format, etc., do we have a guideline on the wiki? Wiki pages such as Jenkins and HowTOCommit don't contain related information. Thanks On 17 September 2013 16:52, Tommaso Teofili tommaso.teof...@gmail.com wrote: ok, no problem, just let's not commit anything else before Edward can do the rollback. Tommaso 2013/9/17 Edward J. Yoon edwardy...@apache.org Sorry, I'm on vacation, will be back 2 days later. -- Best Regards, Edward J. Yoon @eddieyoon On 2013. 9. 17., at 5:40 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: I think we need Edward to run 'mvn release:rollback' as soon as possible (as the latest vote has been canceled) and then commit this again. Tommaso 2013/9/15 chl...@apache.org Author: chl501 Date: Sun Sep 15 10:20:01 2013 New Revision: 1523425 URL: http://svn.apache.org/r1523425 Log: HAMA-802: Skip Hama Pipes native build when cmake is missing Modified: hama/trunk/c++/pom.xml hama/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc Modified: hama/trunk/c++/pom.xml URL: http://svn.apache.org/viewvc/hama/trunk/c%2B%2B/pom.xml?rev=1523425&r1=1523424&r2=1523425&view=diff
==============================================================================
--- hama/trunk/c++/pom.xml (original)
+++ hama/trunk/c++/pom.xml Sun Sep 15 10:20:01 2013
@@ -31,7 +31,7 @@
   <description>Apache Hama Pipes</description>
   <packaging>pom</packaging>
-<profiles>
+  <profiles>
   <profile>
     <id>native</id>
     <activation>
@@ -49,16 +49,32 @@
             <goals><goal>run</goal></goals>
             <configuration>
               <target>
-                <mkdir dir="${project.build.directory}/native" />
-                <exec executable="cmake" dir="${project.build.directory}/native" failonerror="true">
-                  <arg line="${basedir}/src/ -DJVM_ARCH_DATA_MODEL=${sun.arch.data.model}" />
-                </exec>
-                <exec executable="make" dir="${project.build.directory}/native" failonerror="true">
-                  <arg line="VERBOSE=1" />
-                </exec>
-                <!-- The second make is a workaround for HADOOP-9215. It can
-                     be removed when version 2.6 of cmake is no longer supported. -->
-                <exec executable="make" dir="${project.build.directory}/native" failonerror="true" />
+                <taskdef resource="net/sf/antcontrib/antcontrib.properties" classpathref="maven.plugin.classpath" />
+                <!-- Check if cmake is installed -->
+                <property environment="env" />
+                <if>
+                  <or>
+                    <available file="cmake" filepath="${env.PATH}" />
+                    <!-- on Windows it can be Path, path
Re: [VOTE] Hama 0.6.3 RC2
-1 Compilation fails with the messages below: [exec] /usr/bin/c++ -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/tmp/0.6.3-RC2/c++/src/main/native/utils/api -I/tmp/0.6.3-RC2/c++/src/main/native/pipes/api -I/tmp/0.6.3-RC2/c++/src -o CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o -c /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope On 14 September 2013 03:49, Anastasis Andronidis andronat_...@hotmail.com wrote: +1 Anastasis On 13 Sep 2013, at 9:47 a.m., Edward J. Yoon edwardy...@apache.org wrote: Hi, I've created RC2 for the Hama 0.6.3 release. Artifacts and Signatures: http://people.apache.org/~edwardyoon/dist/0.6.3-RC2/ SVN Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.3-RC2/ Please try it on both hadoop1 and hadoop2, run the tests, check the docs, etc. [ ] +1 Release the packages as Apache Hama 0.6.3 [ ] -1 Do not release the packages because... Thank you! -- Best Regards, Edward J. Yoon @eddieyoon
Re: [CANCELED][VOTE] Hama 0.6.3 RC2
In addition to that, I also encountered a compilation error. I can help fix this, but I am just not very sure if that's the right way to do it. If a jira is created for these two fixes, I can provide a patch for the compilation issue; and it would be good if anyone could help review it. [exec] /usr/bin/c++ -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/tmp/0.6.3-RC2/c++/src/main/native/utils/api -I/tmp/0.6.3-RC2/c++/src/main/native/pipes/api -I/tmp/0.6.3-RC2/c++/src -o CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o -c /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope On 14 September 2013 12:06, Edward J. Yoon edwardy...@apache.org wrote: Oh... sorry. [ERROR] https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/ws/trunk/yarn/src/main/java/org/apache/hama/bsp/BSPApplicationMaster.java:[64,7] error: BSPApplicationMaster is not abstract and does not override abstract method getAssignedPortNum(TaskAttemptID) in BSPPeerProtocol I'll create a new one after fixing the above problem (next Sunday). On Sat, Sep 14, 2013 at 4:49 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: +1 Anastasis On 13 Sep 2013, at 9:47 a.m., Edward J. Yoon edwardy...@apache.org wrote: Hi, I've created RC2 for the Hama 0.6.3 release. Artifacts and Signatures: http://people.apache.org/~edwardyoon/dist/0.6.3-RC2/ SVN Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.3-RC2/ Please try it on both hadoop1 and hadoop2, run the tests, check the docs, etc. [ ] +1 Release the packages as Apache Hama 0.6.3 [ ] -1 Do not release the packages because... Thank you! -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: [CANCELED][VOTE] Hama 0.6.3 RC2
Just noticed that HAMA-802 should fix the C++ compilation error. I've applied the patch to the trunk; it passes compilation and testing. On 15 September 2013 16:04, Chia-Hung Lin cli...@googlemail.com wrote: In addition to that, I also encountered a compilation error. I can help fix this, but I am just not very sure if that's the right way to do it. If a jira is created for these two fixes, I can provide a patch for the compilation issue; and it would be good if anyone could help review it. [exec] /usr/bin/c++ -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/tmp/0.6.3-RC2/c++/src/main/native/utils/api -I/tmp/0.6.3-RC2/c++/src/main/native/pipes/api -I/tmp/0.6.3-RC2/c++/src -o CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o -c /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /tmp/0.6.3-RC2/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope On 14 September 2013 12:06, Edward J. Yoon edwardy...@apache.org wrote: Oh... sorry. [ERROR] https://builds.apache.org/job/Hama-Nightly-for-Hadoop-2.x/ws/trunk/yarn/src/main/java/org/apache/hama/bsp/BSPApplicationMaster.java:[64,7] error: BSPApplicationMaster is not abstract and does not override abstract method getAssignedPortNum(TaskAttemptID) in BSPPeerProtocol I'll create a new one after fixing the above problem (next Sunday). On Sat, Sep 14, 2013 at 4:49 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: +1 Anastasis On 13 Sep 2013, at 9:47 a.m., Edward J. Yoon edwardy...@apache.org wrote: Hi, I've created RC2 for the Hama 0.6.3 release. Artifacts and Signatures: http://people.apache.org/~edwardyoon/dist/0.6.3-RC2/ SVN Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.3-RC2/ Please try it on both hadoop1 and hadoop2, run the tests, check the docs, etc. [ ] +1 Release the packages as Apache Hama 0.6.3 [ ] -1 Do not release the packages because... Thank you! -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: hama pipes build
If that's related to the error in the mail titled - hama-pipes: An Ant BuildException during `make' Then: 1. install cmake 2. add `#include <unistd.h>` to HamaPipes.cc. On 9 September 2013 15:39, Anastasis Andronidis andronat_...@hotmail.com wrote: Hello, as Martin Illecker explains in his ticket (HAMA-749), there are lots of benefits with cmake: https://issues.apache.org/jira/browse/HADOOP-8368 I had some confusion when I tried to build. I think we should test whether cmake exists at the very beginning of the build, and inform the user to install it, instead of failing with a bogus message. Cheers, Anastasis On 9 Sep 2013, at 10:25 a.m., Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, when I try to build from trunk I get the following while trying to build the pipes module: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hama-pipes: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program cmake (in directory /Users/teofili/Documents/workspaces/asf/hama/trunk/c++/target/native): error=2, No such file or directory - [Help 1] If I understand things correctly, I am missing the compiler on my computer; however, it'd be good if we could skip that module if the cmake program is missing. Regards, Tommaso
hama-pipes: An Ant BuildException during `make'
When compiling trunk, the following errors are thrown. How can I fix this problem? Environment: debian, 3.10-2-rt-686-pae, g++ 4.7.3, make 3.8.1 Thanks ... [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘bool HamaPipes::runTask(const HamaPipes::Factory&)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1280:28: error: ‘close’ was not declared in this scope [exec] make[2]: *** [CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o] Error 1 [exec] make[1]: *** [CMakeFiles/hamapipes.dir/all] Error 2 [exec] make: *** [all] Error 2 [exec] make[2]: Leaving directory `/path/to/trunk/c++/target/native' [exec] make[1]: Leaving directory `/path/to/trunk/c++/target/native' ... org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hama-pipes: An Ant BuildException has occured: exec returned: 2
Re: hama-pipes: An Ant BuildException during `make'
Just noticed that unistd.h is not included in HamaPipes.cc. Adding `#include <unistd.h>` to HamaPipes.cc solves the problem, but I am not sure if this is the right way to fix it. Environment: debian 3.10-2-rt-686-pae cmake version 2.8.11.2 gcc/ g++ version 4.7.3 make version 3.81 On 8 September 2013 17:39, Martin Illecker mar...@illecker.at wrote: Do you have *cmake* installed? 2013/9/8 Chia-Hung Lin cli...@googlemail.com When compiling trunk, the following errors are thrown. How can I fix this problem? Environment: debian, 3.10-2-rt-686-pae, g++ 4.7.3, make 3.8.1 Thanks ... [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘void* HamaPipes::ping(void*)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1148:16: error: ‘sleep’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1167:30: error: ‘close’ was not declared in this scope [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc: In function ‘bool HamaPipes::runTask(const HamaPipes::Factory&)’: [exec] /path/to/trunk/c++/src/main/native/pipes/impl/HamaPipes.cc:1280:28: error: ‘close’ was not declared in this scope [exec] make[2]: *** [CMakeFiles/hamapipes.dir/main/native/pipes/impl/HamaPipes.cc.o] Error 1 [exec] make[1]: *** [CMakeFiles/hamapipes.dir/all] Error 2 [exec] make: *** [all] Error 2 [exec] make[2]: Leaving directory `/path/to/trunk/c++/target/native' [exec] make[1]: Leaving directory `/path/to/trunk/c++/target/native' ... org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hama-pipes: An Ant BuildException has occured: exec returned: 2
Re: [DISCUSS] Hama 0.7.0
+1 BTW, are we going to prioritize the tasks in the roadmap? On 28 August 2013 14:17, Tommaso Teofili tommaso.teof...@gmail.com wrote: sure, it looks reasonable to me. Tommaso 2013/8/28 Edward J. Yoon edwardy...@apache.org Hi all, After we release 0.6.3 (the HDFS 2.0 version), we have to work on the 0.7.0 version now. I would like to suggest that we solve the messaging scalability issue. WDYT? ... And, according to my experiments, the BSP framework shows very nice performance (I tested GraphLab and Spark as well). Only the Graph job is slow. So, I'll mainly work on improving the performance of GraphJobRunner. -- Best Regards, Edward J. Yoon @eddieyoon
Re: HybridBSP (CPU and GPU) Task Integration
Sorry for replying late. Regarding scheduling, sorry, I can't remember the details now. IIRC scheduling is done by dispatching tasks to all GroomServers. The code in SimpleTaskScheduler.java has a class TaskWorker, which dispatches a GroomServerAction to a GroomServer through GroomProtocol. At the GroomServer side, a task is launched by launchTaskForJob(), where it calls TaskInProgress's launchTask(); then a separate process will be forked for running the bsp logic. For the input logic, other developers can provide more insight on how the data is split. My understanding is that the split mechanism resembles MapReduce's, as you mentioned, by number of tasks; so you might need to provide input logic that can split the corresponding data (80% for gpu, 20% for cpu), and the tasks launched can then correctly read that input data according to their type (bsp or gpu). I could be wrong regarding the input logic mechanism; the split logic seems to be near the BSPJobClient.partition() function. What I can see at the moment is: if launched gpu tasks do not arbitrarily talk (in a direct way) to other external gpu processes, which run within other bsp tasks, it seems that we can treat a gpu task as a normal bsp task without too much modification. But that would be subject to how the implementation is done. On 25 August 2013 22:08, Martin Illecker millec...@apache.org wrote: Thanks, your picture [1] illustrates this scenario very well! In short, I have to modify runBSP in BSPTask, check if the submitted task extends HybridBSP. If so, start a PipesBSP server and wait for incoming connections. And run the bspGpu method within the HybridBSP task. Regarding scheduling: 1) Within runBSP I have to decide whether to execute the bspGpu or the default bsp method of HybridBSP. e.g., having numTaskBsp set to 8, Hama will start 8 separate Java threads. If I set an additional conf property numTaskBspGpu to 1, I want to have 9 bsp tasks. (I don't know where these bsp threads are started. Add a property check for numTaskBspGpu and start more bsp tasks.) 8 tasks should execute the default bsp method within runBSP, and only one task should run bspGpu. 2) It should be possible to schedule input data for bsp tasks. (belongs to the partitioning job) e.g., having 8 cpu bsp tasks and 1 gpu bsp task, I wish to have a property to control which amount of input belongs to which task. Default: Hama's partitioning job will divide the input data (e.g., a sequence file) by the number of tasks? It might happen that e.g. 80% of the input data should go to the gpu task and only 20% to the cpu tasks. By the way, do you think a HybridBSP-based task which extends BSP will work on Hama without any changes? Normally it should work because of inheritance from BSP. Thanks! Martin [1] http://i.imgur.com/RP3ETBW.png 2013/8/24 Chia-Hung Lin cli...@googlemail.com It seems to me that an additional process or thread will be launched for running a GPU-based bsp task, which will then communicate with the PipesBSP process, as in [1]. Please correct me if that is wrong. If this is the case, BSPTask looks like the place to work on. When the BSPTask process is running, it can check (e.g. in runBSP) if an additional GPU process/thread needs to be created, then launch/destroy such a task accordingly. By the way, it is mentioned that scheduling is needed. Can you please give a bit more detail on what kind of scheduling is required? [1]. http://i.imgur.com/RP3ETBW.png On 24 August 2013 00:59, Martin Illecker mar...@illecker.at wrote: What's the difference between launching a `bsp task' and a `gpu bsp task'? Will the gpu bsp task fork and execute a c/c++ process? The GPU bsp task can also be executed within a Java process. In detail, I want to run a Rootbeer Kernel (e.g., PiEstimationKernel [1]) within the bspGpu method. A Rootbeer Kernel is written in Java and converted to CUDA. (the entry point is the gpuMethod) Finally there is a Java wrapper around the CUDA code, so it can be invoked within the JVM. So far there is no difference from a normal bsp task execution, but I want to use Hama Pipes to communicate via sockets. The GPU bsp task should start like the default one, but I will have to establish the Pipes server for communication. And of course I need scheduling for these GPU and CPU tasks. I hope the following source will illustrate my scenario better:

public class MyHybridBSP extends HybridBSP<NullWritable, NullWritable, NullWritable, NullWritable, Text> {

  @Override
  public void bsp(BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, Text> peer)
      throws IOException, SyncException, InterruptedException {
    MyGPUKernel kernel = new MyGPUKernel();
    Rootbeer rootbeer = new Rootbeer();
    rootbeer.setThreadConfig(BLOCK_SIZE, GRID_SIZE, BLOCK_SIZE*GRID_SIZE);
    // Run GPU Kernels
    rootbeer.runAll(kernel);
  }

  @Override
  public void bspGpu(BSPPeer<NullWritable
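[Editor's illustration] The "80% for gpu / 20% for cpu" input-split idea discussed in this thread can be sketched as a small planning helper. HybridSplitPlanner, its method, and its parameters are hypothetical illustrations, not Hama API; a real implementation would live near the partitioning job (e.g., around BSPJobClient.partition(), as suggested above).

```java
// Hypothetical helper, not Hama API: given a total record count, the fraction
// destined for GPU tasks (e.g. 0.8), and the task counts, compute how many
// records each GPU task and each CPU task would read. This only illustrates
// the proportional-split idea from the thread above.
public class HybridSplitPlanner {

  // result[0] = records per GPU task, result[1] = records per CPU task
  public static long[] plan(long totalRecords, double gpuFraction,
                            int gpuTasks, int cpuTasks) {
    long gpuRecords = Math.round(totalRecords * gpuFraction);
    long cpuRecords = totalRecords - gpuRecords;
    return new long[] { gpuRecords / gpuTasks, cpuRecords / cpuTasks };
  }
}
```

For the scenario above (1 gpu task, 8 cpu tasks, 80/20 split over 1000 records), the gpu task would read 800 records and each cpu task 25.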
ML test fails
When testing with the latest source obtained from svn (r1514736), the ml test seems to fail. Is any setting required? Just checking in case it's an environment-specific issue. --- T E S T S --- Running org.apache.hama.ml.ann.TestSmallLayeredNeuralNetwork 13/08/24 17:38:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 13/08/24 17:38:29 INFO mortbay.log: Training time: 9.488000s 13/08/24 17:38:29 INFO mortbay.log: Relative error: 20.00% 13/08/24 17:38:20 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1 13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1 13/08/24 17:38:20 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file. 13/08/24 17:38:20 INFO bsp.BSPJobClient: Running job: job_localrunner_0001 13/08/24 17:38:20 INFO bsp.LocalBSPRunner: Setting up a new barrier for 1 tasks! 13/08/24 17:38:23 INFO bsp.BSPJobClient: Current supersteps number: 1 13/08/24 17:38:23 INFO bsp.BSPJobClient: The total number of supersteps: 1 13/08/24 17:38:23 INFO bsp.BSPJobClient: Counters: 6 13/08/24 17:38:23 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter 13/08/24 17:38:23 INFO bsp.BSPJobClient: SUPERSTEPS=1 13/08/24 17:38:23 INFO bsp.BSPJobClient: LAUNCHED_TASKS=1 13/08/24 17:38:23 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter 13/08/24 17:38:23 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2 13/08/24 17:38:23 INFO bsp.BSPJobClient: IO_BYTES_READ=57416 13/08/24 17:38:23 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=0 13/08/24 17:38:23 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=618 13/08/24 17:38:23 INFO bsp.FileInputFormat: Total input paths to process : 5 13/08/24 17:38:23 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. 
See BSPJob#setJar(String) or check Your jar file. 13/08/24 17:38:23 INFO bsp.BSPJobClient: Running job: job_localrunner_0001 13/08/24 17:38:23 INFO bsp.LocalBSPRunner: Setting up a new barrier for 5 tasks! 13/08/24 17:38:23 INFO mortbay.log: Begin to train 13/08/24 17:38:26 INFO bsp.BSPJobClient: Current supersteps number: 375 13/08/24 17:38:29 INFO bsp.BSPJobClient: Current supersteps number: 849 13/08/24 17:38:32 INFO bsp.BSPJobClient: Current supersteps number: 1377 13/08/24 17:38:35 INFO bsp.BSPJobClient: Current supersteps number: 1877 13/08/24 17:38:38 INFO bsp.BSPJobClient: Current supersteps number: 2435 13/08/24 17:38:41 INFO bsp.BSPJobClient: Current supersteps number: 3001 13/08/24 17:38:44 INFO bsp.BSPJobClient: Current supersteps number: 3573 13/08/24 17:38:46 INFO mortbay.log: End of training, number of iterations: 2001. 13/08/24 17:38:46 INFO mortbay.log: Write model back to /tmp/distributed-model 13/08/24 17:38:47 INFO bsp.BSPJobClient: Current supersteps number: 3999 13/08/24 17:38:47 INFO bsp.BSPJobClient: The total number of supersteps: 3999 13/08/24 17:38:47 INFO bsp.BSPJobClient: Counters: 8 13/08/24 17:38:47 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter 13/08/24 17:38:47 INFO bsp.BSPJobClient: SUPERSTEPS=3999 13/08/24 17:38:47 INFO bsp.BSPJobClient: LAUNCHED_TASKS=5 13/08/24 17:38:47 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter 13/08/24 17:38:47 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2 13/08/24 17:38:47 INFO bsp.BSPJobClient: IO_BYTES_READ=278427240 13/08/24 17:38:47 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=39611 13/08/24 17:38:47 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=2 13/08/24 17:38:47 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=300 13/08/24 17:38:47 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=2 13/08/24 17:38:47 INFO mortbay.log: Reload model from /tmp/distributed-model. 
13/08/24 17:38:47 INFO mortbay.log: Training time: 27.49s 13/08/24 17:38:47 INFO mortbay.log: Relative error: 24.67% Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec FAILURE!
Re: HybridBSP (CPU and GPU) Task Integration
It seems to me that an additional process or thread will be launched for running a GPU-based bsp task, which will then communicate with the PipesBSP process, as in [1]. Please correct me if that is wrong. If this is the case, BSPTask looks like the place to work on. When the BSPTask process is running, it can check (e.g. in runBSP) if an additional GPU process/thread needs to be created, then launch/destroy such a task accordingly. By the way, it is mentioned that scheduling is needed. Can you please give a bit more detail on what kind of scheduling is required? [1]. http://i.imgur.com/RP3ETBW.png On 24 August 2013 00:59, Martin Illecker mar...@illecker.at wrote: What's the difference between launching a `bsp task' and a `gpu bsp task'? Will the gpu bsp task fork and execute a c/c++ process? The GPU bsp task can also be executed within a Java process. In detail, I want to run a Rootbeer Kernel (e.g., PiEstimationKernel [1]) within the bspGpu method. A Rootbeer Kernel is written in Java and converted to CUDA. (the entry point is the gpuMethod) Finally there is a Java wrapper around the CUDA code, so it can be invoked within the JVM. So far there is no difference from a normal bsp task execution, but I want to use Hama Pipes to communicate via sockets. The GPU bsp task should start like the default one, but I will have to establish the Pipes server for communication. And of course I need scheduling for these GPU and CPU tasks. I hope the following source will illustrate my scenario better:

public class MyHybridBSP extends HybridBSP<NullWritable, NullWritable, NullWritable, NullWritable, Text> {

  @Override
  public void bsp(BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, Text> peer)
      throws IOException, SyncException, InterruptedException {
    MyGPUKernel kernel = new MyGPUKernel();
    Rootbeer rootbeer = new Rootbeer();
    rootbeer.setThreadConfig(BLOCK_SIZE, GRID_SIZE, BLOCK_SIZE*GRID_SIZE);
    // Run GPU Kernels
    rootbeer.runAll(kernel);
  }

  @Override
  public void bspGpu(BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, Text> peer)
      throws IOException, SyncException, InterruptedException {
    // process algorithm on CPU
  }

  class MyGPUKernel implements Kernel {
    public MyGPUKernel() { }

    public void gpuMethod() {
      // process algorithm on GPU
      // the following commands will need Hama Pipes
      HamaPeer.getConfiguration();
      HamaPeer.readNext(...,...);
      // and others
    }
  }
}

Thanks! Martin [1] https://github.com/millecker/applications/blob/master/hama/rootbeer/piestimator/src/at/illecker/hama/rootbeer/examples/piestimator/gpu/PiEstimatorKernel.java 2013/8/23 Chia-Hung Lin cli...@googlemail.com What's the difference between launching a `bsp task' and a `gpu bsp task'? Will the gpu bsp task fork and execute a c/c++ process? It might be good to distinguish how the gpu bsp task will be executed, then decide how to launch such a task. Basically, for launching a bsp task an external process is created. The logic to execute BSP.bsp() is at BSPTask.java, where the method runBSP() is called with a BSP implementation class loaded at runtime Class<?> workClass = job.getConfiguration().getClass("bsp.work.class", BSP.class); and then the bsp method is executed bsp.bsp(bspPeer); On 23 August 2013 21:45, Martin Illecker mar...@illecker.at wrote: Hi, I have created a HybridBSP [1] class which should combine the default BSP (CPU) class with GPU methods [2]. The abstract HybridBSP class extends the BSP class and adds the bspGpu, setupGpu and cleanupGpu methods.

public abstract class HybridBSP<K1, V1, K2, V2, M extends Writable>
    extends BSP<K1, V1, K2, V2, M> implements BSPGpuInterface<K1, V1, K2, V2, M> {

  @Override
  public abstract void bspGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException;

  @Override
  public void setupGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException { }

  @Override
  public void cleanupGpu(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException { }
}

Now I want to add a new scheduling technique which checks the conf property (gpuBspTaskNum) and executes bspGpu instead of the default bsp method. e.g., bspTaskNum=3 and gpuBspTaskNum=1 The scheduler should run four bsp tasks simultaneously and execute the bsp method three times and bspGpu once. (both defined within one derived HybridBSP class) Do I have to modify the taskrunner or create a new SimpleTaskScheduler? How can I integrate this into Hama? Thanks! Martin [1] https://github.com/millecker/hama/blob/5d0e8b26abd6b63fa5afad09a2ba960bf9922868/core/src/main/java/org/apache/hama/bsp/gpu/HybridBSP.java [2] https://github.com/millecker/hama/blob
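[Editor's illustration] The task-count arithmetic discussed in this thread (bspTaskNum CPU tasks plus gpuBspTaskNum GPU tasks, with one method chosen per task) can be sketched as below. The class, method names, and the index-based dispatch rule are illustrative assumptions, not Hama's real scheduler or config keys.

```java
// Hypothetical sketch of the dispatch decision from the thread above: launch
// bspTaskNum + gpuBspTaskNum tasks in total and run bspGpu only for the last
// gpuBspTaskNum of them (0-based task indices). Not Hama API.
public class HybridDispatch {

  public static int totalTasks(int bspTaskNum, int gpuBspTaskNum) {
    return bspTaskNum + gpuBspTaskNum;
  }

  // true when the task with this 0-based index should run bspGpu
  // rather than the default bsp method
  public static boolean runsOnGpu(int taskIndex, int bspTaskNum) {
    return taskIndex >= bspTaskNum;
  }
}
```

With bspTaskNum=3 and gpuBspTaskNum=1 as in the example above, four tasks run in total: indices 0-2 execute bsp and index 3 executes bspGpu.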
ML test case fails
When testing with the latest source obtained from svn (r1514736), the ml module tests seem to fail. Is any setting required? Just checking, in case it's an environment-specific issue.

--- T E S T S ---
Running org.apache.hama.ml.ann.TestSmallLayeredNeuralNetwork
13/08/24 17:38:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/24 17:38:29 INFO mortbay.log: Training time: 9.488000s
13/08/24 17:38:29 INFO mortbay.log: Relative error: 20.00%
13/08/24 17:38:20 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1
13/08/24 17:38:20 INFO bsp.FileInputFormat: Total input paths to process : 1
13/08/24 17:38:20 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
13/08/24 17:38:20 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/08/24 17:38:20 INFO bsp.LocalBSPRunner: Setting up a new barrier for 1 tasks!
13/08/24 17:38:23 INFO bsp.BSPJobClient: Current supersteps number: 1
13/08/24 17:38:23 INFO bsp.BSPJobClient: The total number of supersteps: 1
13/08/24 17:38:23 INFO bsp.BSPJobClient: Counters: 6
13/08/24 17:38:23 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
13/08/24 17:38:23 INFO bsp.BSPJobClient:     SUPERSTEPS=1
13/08/24 17:38:23 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
13/08/24 17:38:23 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/08/24 17:38:23 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=2
13/08/24 17:38:23 INFO bsp.BSPJobClient:     IO_BYTES_READ=57416
13/08/24 17:38:23 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=0
13/08/24 17:38:23 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=618
13/08/24 17:38:23 INFO bsp.FileInputFormat: Total input paths to process : 5
13/08/24 17:38:23 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
13/08/24 17:38:23 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/08/24 17:38:23 INFO bsp.LocalBSPRunner: Setting up a new barrier for 5 tasks!
13/08/24 17:38:23 INFO mortbay.log: Begin to train
13/08/24 17:38:26 INFO bsp.BSPJobClient: Current supersteps number: 375
13/08/24 17:38:29 INFO bsp.BSPJobClient: Current supersteps number: 849
13/08/24 17:38:32 INFO bsp.BSPJobClient: Current supersteps number: 1377
13/08/24 17:38:35 INFO bsp.BSPJobClient: Current supersteps number: 1877
13/08/24 17:38:38 INFO bsp.BSPJobClient: Current supersteps number: 2435
13/08/24 17:38:41 INFO bsp.BSPJobClient: Current supersteps number: 3001
13/08/24 17:38:44 INFO bsp.BSPJobClient: Current supersteps number: 3573
13/08/24 17:38:46 INFO mortbay.log: End of training, number of iterations: 2001.
13/08/24 17:38:46 INFO mortbay.log: Write model back to /tmp/distributed-model
13/08/24 17:38:47 INFO bsp.BSPJobClient: Current supersteps number: 3999
13/08/24 17:38:47 INFO bsp.BSPJobClient: The total number of supersteps: 3999
13/08/24 17:38:47 INFO bsp.BSPJobClient: Counters: 8
13/08/24 17:38:47 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
13/08/24 17:38:47 INFO bsp.BSPJobClient:     SUPERSTEPS=3999
13/08/24 17:38:47 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=5
13/08/24 17:38:47 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/08/24 17:38:47 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=2
13/08/24 17:38:47 INFO bsp.BSPJobClient:     IO_BYTES_READ=278427240
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=39611
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=2
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=300
13/08/24 17:38:47 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=2
13/08/24 17:38:47 INFO mortbay.log: Reload model from /tmp/distributed-model.
13/08/24 17:38:47 INFO mortbay.log: Training time: 27.49s
13/08/24 17:38:47 INFO mortbay.log: Relative error: 24.67%
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec  FAILURE!
Re: HybridBSP (CPU and GPU) Task Integration
What's the difference between launching a `bsp task' and a `gpu bsp task'? Will a gpu bsp task fork and execute a C/C++ process? It might be good to first distinguish how a gpu bsp task will be executed, and then decide how to launch such a task. Basically, for launching a bsp task, an external process is created. The logic that executes BSP.bsp() is in BSPTask.java, where the method runBSP() is called with a BSP implementation class loaded at runtime:

Class<?> workClass = job.getConfiguration().getClass("bsp.work.class", BSP.class);

and then the bsp method is executed:

bsp.bsp(bspPeer);

On 23 August 2013 21:45, Martin Illecker mar...@illecker.at wrote: Hi, I have created a HybridBSP [1] class which should combine the default BSP (CPU) class with GPU methods [2]. The abstract HybridBSP class extends the BSP class and adds bspGpu, setupGpu and cleanupGpu methods.

public abstract class HybridBSP<K1, V1, K2, V2, M extends Writable>
    extends BSP<K1, V1, K2, V2, M>
    implements BSPGpuInterface<K1, V1, K2, V2, M> {

  @Override
  public abstract void bspGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException;

  @Override
  public void setupGpu(BSPPeer<K1, V1, K2, V2, M> peer)
      throws IOException, SyncException, InterruptedException {
  }

  @Override
  public void cleanupGpu(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException {
  }
}

Now I want to add a new scheduling technique which checks the conf property (gpuBspTaskNum) and executes bspGpu instead of the default bsp method. e.g., with bspTaskNum=3 and gpuBspTaskNum=1, the scheduler should run four bsp tasks simultaneously and execute the bsp method three times and bspGpu once (both defined within one derived HybridBSP class). Do I have to modify the task runner or create a new SimpleTaskScheduler? How can I integrate this into Hama? Thanks! 
Martin [1] https://github.com/millecker/hama/blob/5d0e8b26abd6b63fa5afad09a2ba960bf9922868/core/src/main/java/org/apache/hama/bsp/gpu/HybridBSP.java [2] https://github.com/millecker/hama/blob/5d0e8b26abd6b63fa5afad09a2ba960bf9922868/core/src/main/java/org/apache/hama/bsp/gpu/BSPGpuInterface.java
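The split Martin describes can be sketched as a stand-alone toy: pick bspGpu for the first gpuBspTaskNum task slots and bsp for the rest. Everything below (HybridTaskDispatchSketch, the cut-down Bsp/HybridBsp interfaces, the index-based policy) is illustrative only, not Hama's TaskRunner or scheduler API.

```java
// A toy sketch (not Hama code) of dispatching between the CPU and GPU
// entry points of a HybridBSP-style class. The interfaces are cut-down
// stand-ins for BSP/HybridBSP; gpuBspTaskNum mirrors the proposed conf
// property. "First N task indices run on the GPU" is just one possible
// assignment policy.
public class HybridTaskDispatchSketch {

  public interface Bsp {
    String bsp();
  }

  public interface HybridBsp extends Bsp {
    String bspGpu();
  }

  /**
   * Decide which entry point the task with the given index runs:
   * the first gpuBspTaskNum tasks call bspGpu(), the rest call bsp().
   */
  public static String run(Bsp work, int taskIndex, int gpuBspTaskNum) {
    if (work instanceof HybridBsp && taskIndex < gpuBspTaskNum) {
      return ((HybridBsp) work).bspGpu();
    }
    return work.bsp();
  }

  public static void main(String[] args) {
    HybridBsp job = new HybridBsp() {
      public String bsp() { return "cpu"; }
      public String bspGpu() { return "gpu"; }
    };
    // bspTaskNum=3 plus gpuBspTaskNum=1, as in Martin's example:
    // four tasks in total, exactly one of which takes the GPU path.
    for (int i = 0; i < 4; i++) {
      System.out.println("task " + i + " -> " + run(job, i, 1));
    }
  }
}
```

If something like this works, the existing task runner would only need the task's index and the gpuBspTaskNum property to choose the entry point, which suggests modifying the task runner rather than writing a whole new SimpleTaskScheduler; that is only a guess from outside the code, though.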
Re: [VOTE] Skip minor release, and prepare 1.0
+0 Personally I would not go for 1.0 now, though a 1.0 release is ok for me. My reason is that people may expect functions such as FT to be ready once it's at version 1.0. Also, it might be inevitable that people will compare MRv2 and Giraph to Hama, and think that MRv2 and Giraph are more stable than Hama because of FT, etc., regardless of the differences between the projects. On 17 August 2013 16:33, Edward J. Yoon edwardy...@apache.org wrote: Hi all, I was planning to cut a 0.6.3 release candidate (Hadoop 2.0 compatible version), however it seems the age of competing for preoccupancy is past, so we don't need to hurry now. Moreover, we are currently adding a lot of changes, and still need to improve a lot. We know exactly what we should do. Do you think we can skip the minor release and prepare 1.0 now? -- Best Regards, Edward J. Yoon @eddieyoon
Re: Discussion about guideline
It seems that our release requirement fits continuous integration, where software is released frequently. In the current workflow, though we can decide what is to be included and released in the roadmap, the release may still be deferred even when 90% of the issues are solved, because the remaining 10% is not yet accomplished. With continuous integration (IIRC), the software is released frequently whenever a new feature is resolved, resulting in a reduced release cycle; and problems can be solved a bit more easily because each change is small. If this is the case, it might be good for us to adapt to CI, and we can schedule which (relatively small) patches are to be released in order. On 6 August 2013 09:16, Edward J. Yoon edwardy...@apache.org wrote: I mean, we can move most current issues to 0.7 and start to comply with our new development process. On Tue, Aug 6, 2013 at 10:07 AM, Edward J. Yoon edwardy...@apache.org wrote: I personally would like to cut a 0.6.3 release after solving the HAMA-789 and HAMA-717 issues, because there are people who want to run a Hama cluster on a Hadoop 2.0 environment, like me. I think we can move the rest of the current issues into the 0.7 roadmap. As we know, the only critical issues in the core BSP project are now memory efficiency and the FT system. And the BSP-based ML algorithm library and query language projects can begin in earnest. On Tue, Aug 6, 2013 at 9:48 AM, Yexi Jiang yexiji...@gmail.com wrote: How about the current in-progress issues? 2013/8/5 Edward J. Yoon edwardy...@apache.org First, each release, or between different releases, would have tasks included. Among those tasks there might be priorities, or a task may block one or more other tasks. So how do we determine the priority of tasks, or between several releases? A naive thought is by voting; however, issues may not be clear to every participant. In that case, voting may defer more important tasks. I think we can follow the current guideline. 
Every idea for improvements, new features, and suggestions should be discussed in polite terms on the dev@ list before implementation, and then the decisions must be listed on our RoadMap page. For simple improvements or bug-type issues, you can skip the discussion and report directly on JIRA. And then we can cut a release according to the Roadmap. a.) when a patch is submitted, at least 2 reviewers should help review the source code. b.) the patch creator should describe e.g. the execution flow/procedure at a higher/conceptual level. Reviewers can then cooperate in reviewing parts of the code in the patch (a tool may help at this stage). Some review points such as (java)doc and test cases should be included.
- Test cases: Each patch should have test cases that at least capture the main logical flow. And the tests are recommended not to be bound to external dependencies, so that time spent on testing can be reduced.
- Doc (Javadoc or wiki): A class should at least describe what it is, or its main logic flow; or at least write down the mechanism in the wiki. Methods and fields that are not self-explanatory would be good to have doc explaining their purpose or execution mechanism.
+1 On Mon, Aug 5, 2013 at 11:33 PM, Chia-Hung Lin cli...@googlemail.com wrote: As the Hama community grows, it seems good to have a guideline that participants can follow so we can cooperate more smoothly. Therefore I would like to discuss this; please share your opinions so that we can improve the process. Below are some issues popping up in my head.
- roadmap prioritization
- development work flow
First, each release, or between different releases, would have tasks included. Among those tasks there might be priorities, or a task may block one or more other tasks. So how do we determine the priority of tasks, or between several releases? A naive thought is by voting; however, issues may not be clear to every participant. In that case, voting may defer more important tasks. Second, a few subtopics are listed below:
- Code review: Though a commit section is described, it is not clear how the procedure will be practised. My thought is a.) when a patch is submitted, at least 2 reviewers should help review the source code. b.) the patch creator should describe e.g. the execution flow/procedure at a higher/conceptual level. Reviewers can then cooperate in reviewing parts of the code in the patch (a tool may help at this stage). Some review points such as (java)doc and test cases should be included.
- Test cases: Each patch should have test cases that at least capture the main logical flow. And the tests are recommended not to be bound to external dependencies, so that time spent on testing can be reduced.
- Doc (Javadoc or wiki): A class should at least describe what it is, or its main logic flow; or at least write down the mechanism in the wiki. Methods and fields that are not self-explanatory would be good to have doc explaining their purpose or execution mechanism.
Discussion about guideline
As the Hama community grows, it seems good to have a guideline that participants can follow so we can cooperate more smoothly. Therefore I would like to discuss this; please share your opinions so that we can improve the process. Below are some issues popping up in my head.
- roadmap prioritization
- development work flow
First, each release, or between different releases, would have tasks included. Among those tasks there might be priorities, or a task may block one or more other tasks. So how do we determine the priority of tasks, or between several releases? A naive thought is by voting; however, issues may not be clear to every participant. In that case, voting may defer more important tasks. Second, a few subtopics are listed below:
- Code review: Though a commit section is described [1], it is not clear how the procedure will be practised. My thought is a.) when a patch is submitted, at least 2 reviewers should help review the source code. b.) the patch creator should describe e.g. the execution flow/procedure at a higher/conceptual level. Reviewers can then cooperate in reviewing parts of the code in the patch (a tool may help at this stage). Some review points such as (java)doc and test cases should be included.
- Test cases: Each patch should have test cases that at least capture the main logical flow. And the tests are recommended not to be bound to external dependencies, so that time spent on testing can be reduced.
- Doc (Javadoc or wiki): A class should at least describe what it is, or its main logic flow; or at least write down the mechanism in the wiki. Methods and fields that are not self-explanatory would be good to have doc explaining their purpose or execution mechanism.
Just some ideas I have at the moment. I will add more if I find others, and we should keep improving the guideline when necessary. Please add your points if you think some are missing, or remove ones that are not needed. [1]. How to commit - Review https://wiki.apache.org/hama/HowToCommit#Review
Hama wiki
It seems the wiki can't be edited right now. Can anyone help check this issue? Thanks
Re: [DISCUSS] Roadmap for 0.7.0
I will now set - exporting more metrics - master notification as tasks with higher priority. On 22 July 2013 02:32, Yexi Jiang yexiji...@gmail.com wrote: Hi, Tommaso, For the machine learning module, what kind of refactoring do you think is necessary? Regards, Yexi 2013/7/21 Edward J. Yoon edwardy...@apache.org Additionally, Queue is also one of the big issues. On Sun, Jul 21, 2013 at 8:55 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi Edward, I'm still quite unsure about the status of FT, so it may be worth doing some work to make sure that it is fully working (but it may be just me). Also, vertex storage in the graph package should be improved. Then I'd say some refactoring of the machine learning module APIs, together with the addition of Collaborative Filtering (and eventually some other algorithms, but I'm still unsure there). My 2 cents, Tommaso 2013/7/19 Edward J. Yoon edwardy...@apache.org Hi all, Once HAMA-742 is done, users will be able to install a Hama cluster on existing Hadoop 1.x and new Hadoop 2.x without issues. I think the urgent tasks are finished; now it's time to discuss the future roadmap for Hama 0.7 and begin enhancement work. Please feel free to voice your opinions. Thanks. -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Dynamic vertices and hama counters
Sorry, my bad. I only focused on the counter stuff and didn't pay attention to the Vertex-related issue; I thought you just wanted to share a counter value between peers. In that case, persisting the counter value to zk shouldn't be a problem and won't incur overhead. But if the case is not about counters, please just ignore my previous post. On 17 July 2013 06:59, Edward J. Yoon edwardy...@apache.org wrote: You guys seem to have totally misunderstood what I am saying. Every BSP processor accesses ZK's counter concurrently? Do you think it is possible to determine the current total number of vertices in every step without barrier synchronization? As I mentioned before, there are already additional barrier synchronization steps for aggregating and broadcasting the global updated vertex count. You can use these steps with *no additional barrier synchronization*. On Wed, Jul 17, 2013 at 5:01 AM, andronat_asf andronat_...@hotmail.com wrote: Thank you everyone. +1 for Tommaso, I will see what I can do about that :) I also believe that ZK is very similar to the sync() mechanism that Edward describes, but if we need to sync more info we might need ZK. Thanks again, Anastasis On 15 Jul 2013, at 5:55 PM, Edward J. Yoon edwardy...@apache.org wrote: andronat_asf, To aggregate and broadcast the global count of updated vertices, we call sync() twice. See the doAggregationUpdates() method in GraphJobRunner. You can solve your problem the same way, and there will be no additional cost. Use of Zookeeper is not a bad idea, but IMO it's not much different from the sync() mechanism. On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin cli...@googlemail.com wrote: +1 for Tommaso's solution. If not every algorithm needs a counter service, having an interface with different implementations (in-memory, zk, etc.) should reduce the side effects. 
On 15 July 2013 15:51, Tommaso Teofili tommaso.teof...@gmail.com wrote: what about introducing a proper API for counting vertices, something like an interface VertexCounter with 2-3 implementations: an InMemoryVertexCounter (basically the current one), a DistributedVertexCounter to implement the scenario where we use a separate BSP superstep to count them, and a ZKVertexCounter which handles vertex counts as per Chia-Hung's suggestion. Also, we may introduce something like a configuration variable to define whether all the vertices are needed or just the neighbors (and/or some other strategy). My 2 cents, Tommaso 2013/7/14 Chia-Hung Lin cli...@googlemail.com Just my personal viewpoint. For a small amount of global information, storing the state in ZooKeeper might be a reasonable solution. On 13 July 2013 21:28, andronat_asf andronat_...@hotmail.com wrote: Hello everyone, I'm working on HAMA-767 and I have some concerns about counters and scalability. Currently, every peer has a set of vertices and a variable that keeps the total number of vertices across all peers. In my case, I'm trying to add and remove vertices during the runtime of a job, which means that I have to update all those variables. My problem is that this is not efficient, because on every operation (adding or removing a vertex) I need to update all peers, so I need to send lots of messages to make those updates (see the GraphJobRunner#countGlobalVertexCount method), and I believe this is not correct and scalable. Another problem is that, even if I update all those variables (with the cost of sending lots of messages to every peer), those variables will only be updated in the next superstep. e.g.:

Peer 1:              Peer 2:
Vert_1               Vert_2
(Total_V = 2)        (Total_V = 2)
addVertex()
(Total_V = 3)        getNumberOfV() = 2
        --- Sync ---
                     getNumberOfV() = 3

Is there something like global counters or shared memory that can address this issue? P.S. 
I have a small feeling that we don't need to track the total number of vertices, because vertex-centric algorithms rarely need global totals; they only depend on neighbors (I might be wrong though). Thanks, Anastasis -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
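Edward's two-sync pattern (aggregate local deltas at a master peer, then broadcast the sum back) can be simulated in a self-contained toy. The Peer class and aggregate() method below are stand-ins for Hama's BSPPeer and GraphJobRunner#doAggregationUpdates, not their real APIs; the point is only the shape of the two supersteps.

```java
import java.util.ArrayList;
import java.util.List;

// A toy simulation (not Hama's BSPPeer API) of the two-sync pattern:
// superstep 1 sends every peer's local vertex-count delta to a master
// peer; superstep 2 broadcasts the summed global count back.
public class GlobalCountSketch {

  public static class Peer {
    public final List<Integer> inbox = new ArrayList<>();
    public int localDelta;   // vertices added (+) or removed (-) locally
    public int globalCount;  // last agreed-upon global vertex count

    public Peer(int globalCount, int localDelta) {
      this.globalCount = globalCount;
      this.localDelta = localDelta;
    }
  }

  /** Two-superstep aggregation over all peers; peers[0] acts as master. */
  public static void aggregate(Peer[] peers) {
    // Superstep 1: every peer sends its delta to the master, then syncs.
    for (Peer p : peers) {
      peers[0].inbox.add(p.localDelta);
    }
    // The master sums the deltas into the new global count.
    int total = peers[0].globalCount;
    for (int delta : peers[0].inbox) {
      total += delta;
    }
    peers[0].inbox.clear();
    // Superstep 2: the master broadcasts the new count, then all sync.
    for (Peer p : peers) {
      p.globalCount = total;
      p.localDelta = 0;
    }
  }

  public static void main(String[] args) {
    // Anastasis's example: two peers with two vertices in total,
    // and peer 1 adds a vertex during the superstep.
    Peer[] peers = { new Peer(2, 1), new Peer(2, 0) };
    aggregate(peers);
    // After the second sync, both peers agree on the new count.
    System.out.println(peers[0].globalCount + " " + peers[1].globalCount);
  }
}
```

This matches the diagram in the thread: peer 2 still sees the stale count until the sync, and only afterwards do all peers agree on 3.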
Re: Dynamic vertices and hama counters
+1 for Tommaso's solution. If not every algorithm needs a counter service, having an interface with different implementations (in-memory, zk, etc.) should reduce the side effects. On 15 July 2013 15:51, Tommaso Teofili tommaso.teof...@gmail.com wrote: what about introducing a proper API for counting vertices, something like an interface VertexCounter with 2-3 implementations: an InMemoryVertexCounter (basically the current one), a DistributedVertexCounter to implement the scenario where we use a separate BSP superstep to count them, and a ZKVertexCounter which handles vertex counts as per Chia-Hung's suggestion. Also, we may introduce something like a configuration variable to define whether all the vertices are needed or just the neighbors (and/or some other strategy). My 2 cents, Tommaso 2013/7/14 Chia-Hung Lin cli...@googlemail.com Just my personal viewpoint. For a small amount of global information, storing the state in ZooKeeper might be a reasonable solution. On 13 July 2013 21:28, andronat_asf andronat_...@hotmail.com wrote: Hello everyone, I'm working on HAMA-767 and I have some concerns about counters and scalability. Currently, every peer has a set of vertices and a variable that keeps the total number of vertices across all peers. In my case, I'm trying to add and remove vertices during the runtime of a job, which means that I have to update all those variables. My problem is that this is not efficient, because on every operation (adding or removing a vertex) I need to update all peers, so I need to send lots of messages to make those updates (see the GraphJobRunner#countGlobalVertexCount method), and I believe this is not correct and scalable. Another problem is that, even if I update all those variables (with the cost of sending lots of messages to every peer), those variables will only be updated in the next superstep. 
e.g.:

Peer 1:              Peer 2:
Vert_1               Vert_2
(Total_V = 2)        (Total_V = 2)
addVertex()
(Total_V = 3)        getNumberOfV() = 2
        --- Sync ---
                     getNumberOfV() = 3

Is there something like global counters or shared memory that can address this issue? P.S. I have a small feeling that we don't need to track the total number of vertices, because vertex-centric algorithms rarely need global totals; they only depend on neighbors (I might be wrong though). Thanks, Anastasis
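Tommaso's proposal can be sketched as a small interface. All class names below are taken from his mail rather than from existing Hama code, and only the in-memory variant is fleshed out; a DistributedVertexCounter (extra counting superstep) or ZKVertexCounter (state kept in ZooKeeper) would implement the same contract.

```java
// A sketch of the proposed VertexCounter API. The names come from the
// proposal, not from existing Hama classes; only the in-memory variant
// is implemented here. Algorithms that never need a global count could
// simply be wired with this cheap local implementation.
public interface VertexCounter {

  /** Record that a vertex was added on this peer. */
  void increment();

  /** Record that a vertex was removed on this peer. */
  void decrement();

  /** The most recently agreed-upon global vertex count. */
  long getGlobalCount();

  /** Basically the current behaviour: a plain local counter. */
  class InMemoryVertexCounter implements VertexCounter {
    private long count;

    public InMemoryVertexCounter(long initial) {
      this.count = initial;
    }

    public void increment() { count++; }
    public void decrement() { count--; }
    public long getGlobalCount() { return count; }
  }
}
```

Making the counter pluggable behind one interface also fits the configuration-variable idea: a job that only needs neighbor information can pick the in-memory counter and skip the aggregation cost entirely.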
Re: Display math to wiki
sciweavers [1] provides a tool for converting TeX equations to images. [1]. http://www.sciweavers.org/free-online-latex-equation-editor On 20 June 2013 20:28, Yexi Jiang yexiji...@gmail.com wrote: OK, I will use images instead. 2013/6/20 Edward J. Yoon edwardy...@apache.org You can post your request to Apache Infrastructure infrastruct...@apache.org. However, I would recommend just attaching/embedding images directly in the Wiki pages. MathML requires an enabled browser and fonts. On Thu, Jun 20, 2013 at 12:02 PM, Yexi Jiang yexiji...@gmail.com wrote: How do we tell them about this? 2013/6/19 Edward edw...@udanax.org We should contact the ASF Infra team. Sent from my iPhone. Jun 20, 2013 10:16 AM Yexi Jiang yexiji...@gmail.com wrote: I found one, http://moinmo.in/MathMlSupport. But it seems that it requires modifying the files wikiconfig.py and wikiutil.py. 2013/6/19 Edward J. Yoon edwardy...@apache.org I just found this: http://moinmo.in/FeatureRequests/MathExpression On Thu, Jun 20, 2013 at 9:32 AM, Yexi Jiang yexiji...@gmail.com wrote: Hi, Does anyone know how to input and display math equations on the wiki page? It would be better if the syntax were compatible with LaTeX. Regards, Yexi -- -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/ -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/ -- Best Regards, Edward J. Yoon @eddieyoon -- -- Yexi Jiang, ECS 251, yjian...@cs.fiu.edu School of Computer and Information Science, Florida International University Homepage: http://users.cis.fiu.edu/~yjian004/
[VOTE] GIT migration
Hi, This is a formal vote for/against migrating Apache Hama's repository from svn to git.
+1 : Migrate to Git
+0 : Abstain
-1 : Stay on SVN
Re: [VOTE] Release Hama 0.6.1
I don't have free nodes at hand, thus I only tested building from source. The build succeeds on my personal laptop, so I am ok with it. +1

[INFO] Reactor Summary:
[INFO] Apache Hama parent POM ............ SUCCESS [3.219s]
[INFO] core .............................. SUCCESS [3:23.415s]
[INFO] graph ............................. SUCCESS [39.063s]
[INFO] machine learning .................. SUCCESS [6.820s]
[INFO] examples .......................... SUCCESS [1:09.465s]
[INFO] yarn .............................. SUCCESS [3.524s]
[INFO] hama-dist ......................... SUCCESS [7.358s]
[INFO] BUILD SUCCESS
[INFO] Total time: 5:33.308s
[INFO] Finished at: Sun Mar 31 22:46:54 CST 2013
[INFO] Final Memory: 27M/309M

On 31 March 2013 12:22, Edward J. Yoon edwardy...@apache.org wrote: I've tested this RC on a 4-node cluster. Everything looks good to me. +1 On Fri, Mar 29, 2013 at 6:00 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso 2013/3/29 Edward J. Yoon edwardy...@apache.org Hello all, I've created a Hama 0.6.1-RC1. As we discussed yesterday, this RC fixes the input partitioning issue and adds a few random data generators. Please check whether this is stable enough for newbies to start Hama on their cluster, and vote! Hama 0.6.1 RC1: http://people.apache.org/~edwardyoon/dist/0.6.1-RC1/ Tags: http://svn.apache.org/repos/asf/hama/tags/0.6.1-RC1/ Thanks. -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon @eddieyoon
Re: Welcome Apurv Verma as new Apache Hama committer
Congratulations, Apurv! On 19 June 2012 23:05, Praveen Sripati praveensrip...@gmail.com wrote: Congrats Apurv - Praveen On Tue, Jun 19, 2012 at 8:14 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Dear all, please join me in welcoming Apurv Verma as a new committer in the Apache Hama project. He's given valuable contributions to the project, he's the first new committer joining Hama as a TLP, and we're happy he joined the team :-) Apurv, if you don't mind, it'd be nice if you could spend a few words presenting yourself (it's an old Lucene tradition which I think would be nice to bring here too). Welcome on board, Apurv. Kind regards, Tommaso