Announcement: Hama talk at Hadoop in Seoul 2013

2013-07-15 Thread Edward J. Yoon
Hi all,

I'll be giving a talk about Apache Hama at Hadoop in Seoul 2013. See the
speakers at http://hadoop.co.kr

I'm working on my slides [1]. If you have any suggestions, please let me know.

[1] https://docs.google.com/presentation/d/1263QjLu8pgqcnrG2xNDf-SyVG-aR5k7-2naYB9gmzvg/edit?usp=sharing

Thanks.

-- 
Best Regards, Edward J. Yoon
@eddieyoon


[jira] [Commented] (HAMA-772) When selected KeyValueTextInputFormat, workers get only one value for key

2013-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709169#comment-13709169
 ] 

Hudson commented on HAMA-772:
-

SUCCESS: Integrated in Hama-Nightly-for-Hadoop-1.x #974 (See 
[https://builds.apache.org/job/Hama-Nightly-for-Hadoop-1.x/974/])
HAMA-772: When selected KeyValueTextInputFormat, workers get only one value for 
key (edwardyoon: rev 1503292)
* /hama/trunk/CHANGES.txt
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/PartitioningRunner.java
* /hama/trunk/core/src/test/java/org/apache/hama/bsp/TestKeyValueTextInputFormat.java
* /hama/trunk/graph/src/main/java/org/apache/hama/graph/VertexInputReader.java


> When selected KeyValueTextInputFormat, workers get only one value for key
> -
>
> Key: HAMA-772
> URL: https://issues.apache.org/jira/browse/HAMA-772
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.6.2
>Reporter: Ikhtiyor Ahmedov
>Assignee: Ikhtiyor Ahmedov
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HAMA-772_2.patch, HAMA-772.patch
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> When KeyValueTextInputFormat is selected as the job input format, worker tasks
> get only one value for a given key.
> Reason: to collect data into memory, the PartitioningRunner class uses a Map
> keyed by Integer worker id whose values are inner Maps holding the key/value
> pairs; if there are multiple values for the same key, all of them are
> overwritten by the last value.
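
The root cause above can be illustrated with plain Java collections. This is a minimal sketch, not the actual PartitioningRunner code: it assumes String keys and values and a hypothetical hash-based choice of worker. It contrasts an inner key-to-value map, where a second value for the same key replaces the first, with an inner key-to-list map that keeps every value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionBufferSketch {

  // Hypothetical partitioner: pick a worker id from the key's hash.
  static int workerFor(String key, int numWorkers) {
    return Math.floorMod(key.hashCode(), numWorkers);
  }

  // Buggy shape: worker id -> (key -> value).
  // Map.put replaces the previous value, so only the last value per key survives.
  static Map<Integer, Map<String, String>> collectOverwriting(String[][] records, int numWorkers) {
    Map<Integer, Map<String, String>> byWorker = new HashMap<>();
    for (String[] kv : records) {
      byWorker.computeIfAbsent(workerFor(kv[0], numWorkers), w -> new HashMap<>())
          .put(kv[0], kv[1]);
    }
    return byWorker;
  }

  // Fixed shape: worker id -> (key -> list of values), so duplicate keys keep every value.
  static Map<Integer, Map<String, List<String>>> collectAll(String[][] records, int numWorkers) {
    Map<Integer, Map<String, List<String>>> byWorker = new HashMap<>();
    for (String[] kv : records) {
      byWorker.computeIfAbsent(workerFor(kv[0], numWorkers), w -> new HashMap<>())
          .computeIfAbsent(kv[0], k -> new ArrayList<>())
          .add(kv[1]);
    }
    return byWorker;
  }

  public static void main(String[] args) {
    String[][] records = {{"a", "1"}, {"a", "2"}, {"b", "3"}};
    int w = workerFor("a", 2);
    System.out.println(collectOverwriting(records, 2).get(w).get("a")); // 2
    System.out.println(collectAll(records, 2).get(w).get("a"));         // [1, 2]
  }
}
```

Keeping a list per key costs one extra object per distinct key, but it preserves every value that KeyValueTextInputFormat reads.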

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Request for Mentorship in HAMA-743

2013-07-15 Thread Sreejith Ramakrishnan
Thank you for the reply, Tommaso. I hope someone can offer help before
19 July, when the application period ends :(

If I post a rough idea of how I'll tackle the project, would my odds be
better?


[jira] [Commented] (HAMA-772) When selected KeyValueTextInputFormat, workers get only one value for key

2013-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708595#comment-13708595
 ] 

Hudson commented on HAMA-772:
-

SUCCESS: Integrated in Hama trunk #147 (See 
[https://builds.apache.org/job/Hama%20trunk/147/])
HAMA-772: When selected KeyValueTextInputFormat, workers get only one value for 
key (edwardyoon: rev 1503292)
* /hama/trunk/CHANGES.txt
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/PartitioningRunner.java
* /hama/trunk/core/src/test/java/org/apache/hama/bsp/TestKeyValueTextInputFormat.java
* /hama/trunk/graph/src/main/java/org/apache/hama/graph/VertexInputReader.java


[jira] [Commented] (HAMA-767) [GSoC 2013] Vertex addition and removal

2013-07-15 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708545#comment-13708545
 ] 

Edward J. Yoon commented on HAMA-767:
-

As I mentioned on the mailing list today:

Please see globalUpdatedCount and the doAggregationUpdates() method in
GraphJobRunner. You can update the global vertex count the same way.
(Reducing the number of barrier synchronizations for efficiency could be a
separate task; let's consider it later.)

> [GSoC 2013] Vertex addition and removal
> ---
>
> Key: HAMA-767
> URL: https://issues.apache.org/jira/browse/HAMA-767
> Project: Hama
>  Issue Type: New Feature
>  Components: examples, graph
>Affects Versions: 0.6.1
>Reporter: Anastasis Andronidis
>Assignee: Anastasis Andronidis
>  Labels: dynamic, graph, gsoc, gsoc2013, mentoring
> Fix For: 0.6.3
>
> Attachments: HAMA-767-addAndRemove-v1.patch, 
> HAMA-767-addition-v1.patch, HAMA-767-examplesAndTests-v1.patch, 
> HAMA-767-v2.patch
>
>
> Implement addVertex and removeVertex methods for incremental graph support in
> the Graph API.


[jira] [Commented] (HAMA-772) When selected KeyValueTextInputFormat, workers get only one value for key

2013-07-15 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708541#comment-13708541
 ] 

Edward J. Yoon commented on HAMA-772:
-

+1. I've committed this. Thanks, Ikhtiyor Ahmedov! :D


[jira] [Assigned] (HAMA-772) When selected KeyValueTextInputFormat, workers get only one value for key

2013-07-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon reassigned HAMA-772:
---

Assignee: Ikhtiyor Ahmedov


[jira] [Assigned] (HAMA-612) BSP-based online collaborative filtering algorithm

2013-07-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon reassigned HAMA-612:
---

Assignee: Ikhtiyor Ahmedov

> BSP-based online collaborative filtering algorithm
> --
>
> Key: HAMA-612
> URL: https://issues.apache.org/jira/browse/HAMA-612
> Project: Hama
>  Issue Type: New Feature
>  Components: machine learning
>Reporter: Edward J. Yoon
>Assignee: Ikhtiyor Ahmedov
>  Labels: gsoc, gsoc2013, mentor
> Fix For: 0.6.3
>
> Attachments: data_gen.py, HAMA-612_0.1.patch, ratings5K.dat
>
>
> I've finished implementing an online collaborative filtering (CF) algorithm,
> applied it to a real-world application, and evaluated it.


Re: Dynamic vertices and hama counters

2013-07-15 Thread Edward J. Yoon
andronat_asf,

To aggregate and broadcast the global count of updated vertices, we
call sync() twice. See the doAggregationUpdates() method in
GraphJobRunner. You can solve your problem the same way, and there
will be no additional cost.
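
The two-sync pattern can be modelled outside Hama. In this minimal sketch, threads stand in for peers and a java.util.concurrent.CyclicBarrier stands in for the sync() barrier; the in-memory inboxes, the choice of peer 0 as the aggregating peer, and all names are assumptions for illustration, not GraphJobRunner's actual code. In the first superstep each peer sends its local change in vertex count to peer 0; after the first sync, peer 0 sums the changes and broadcasts the new total; after the second sync, every peer reads the same value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

class TwoSyncAggregationSketch {

  // Returns the global vertex count each simulated peer observes after two barriers.
  static long[] run(long[] localDeltas, long initialTotal) {
    int n = localDeltas.length;
    CyclicBarrier barrier = new CyclicBarrier(n); // stands in for sync()
    List<ConcurrentLinkedQueue<Long>> inbox = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      inbox.add(new ConcurrentLinkedQueue<>());
    }
    long[] seenTotal = new long[n];
    Thread[] peers = new Thread[n];
    for (int p = 0; p < n; p++) {
      final int me = p;
      peers[p] = new Thread(() -> {
        try {
          // Superstep 1: send the local delta (vertices added minus removed) to peer 0.
          inbox.get(0).add(localDeltas[me]);
          barrier.await(); // first sync

          // Superstep 2: peer 0 sums the deltas and broadcasts the new total.
          if (me == 0) {
            long total = initialTotal;
            Long delta;
            while ((delta = inbox.get(0).poll()) != null) {
              total += delta;
            }
            for (ConcurrentLinkedQueue<Long> q : inbox) {
              q.add(total);
            }
          }
          barrier.await(); // second sync

          // Every peer now reads the same global count.
          seenTotal[me] = inbox.get(me).poll();
        } catch (InterruptedException | BrokenBarrierException e) {
          Thread.currentThread().interrupt();
        }
      });
      peers[p].start();
    }
    for (Thread t : peers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for peers", e);
      }
    }
    return seenTotal;
  }

  public static void main(String[] args) {
    // 10 vertices before; peer 0 adds one, peer 1 removes one, peer 2 adds two.
    System.out.println(Arrays.toString(run(new long[] {1, -1, 2}, 10))); // [12, 12, 12]
  }
}
```

Each peer sends one message per aggregation instead of one per added or removed vertex, which is why this approach adds little cost on top of the barriers the job already performs.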

Using ZooKeeper is not a bad idea, but IMO it's not much different
from the sync() mechanism.


-- 
Best Regards, Edward J. Yoon
@eddieyoon


Re: Dynamic vertices and hama counters

2013-07-15 Thread Chia-Hung Lin
+1 for Tommaso's solution.

If not every algorithm needs a counter service, having an interface with
different implementations (in-memory, ZooKeeper, etc.) should reduce the side
effects.


Re: Dynamic vertices and hama counters

2013-07-15 Thread Tommaso Teofili
What about introducing a proper API for counting vertices? Something like
a VertexCounter interface with 2-3 implementations: InMemoryVertexCounter
(basically the current one), a DistributedVertexCounter for the scenario
where we use a separate BSP superstep to count them, and a ZKVertexCounter
that handles vertex counts as per Chia-Hung's suggestion.
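
A minimal sketch of what such an API might look like. Only the names VertexCounter and InMemoryVertexCounter come from the proposal; the method names (add, getGlobalCount, onSuperstepEnd) are hypothetical. The in-memory version buffers changes and applies them only at the superstep boundary, mirroring the visibility rule in Anastasis's example in the quoted message, where the new count shows up only after sync.

```java
import java.util.concurrent.atomic.AtomicLong;

class VertexCounterSketch {

  // Hypothetical API shape: only the type names come from the proposal.
  interface VertexCounter {
    /** Records vertices added (positive delta) or removed (negative delta) on this peer. */
    void add(long delta);

    /** Returns the vertex count as of the last superstep boundary. */
    long getGlobalCount();

    /** Called once per superstep, after sync(), to publish buffered changes. */
    void onSuperstepEnd();
  }

  // Single-JVM version: buffers local changes and applies them only at the boundary.
  static final class InMemoryVertexCounter implements VertexCounter {
    private final AtomicLong committed;
    private final AtomicLong pending = new AtomicLong();

    InMemoryVertexCounter(long initialCount) {
      committed = new AtomicLong(initialCount);
    }

    @Override
    public void add(long delta) {
      pending.addAndGet(delta);
    }

    @Override
    public long getGlobalCount() {
      return committed.get();
    }

    @Override
    public void onSuperstepEnd() {
      committed.addAndGet(pending.getAndSet(0));
    }
  }

  public static void main(String[] args) {
    VertexCounter counter = new InMemoryVertexCounter(2);
    counter.add(1);                               // addVertex() during the superstep
    System.out.println(counter.getGlobalCount()); // 2: change not yet visible
    counter.onSuperstepEnd();                     // barrier reached
    System.out.println(counter.getGlobalCount()); // 3
  }
}
```

This in-memory version only sees one peer's changes. A DistributedVertexCounter or ZKVertexCounter would implement the same onSuperstepEnd() hook by combining the deltas from all peers, for example with the two-sync aggregation discussed in this thread or by reading a shared value from ZooKeeper.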

We may also introduce a configuration variable that defines whether all
the vertices are needed or just the neighbors (and/or some other
strategy).

My 2 cents,
Tommaso

2013/7/14 Chia-Hung Lin 

> Just my personal viewpoint: for small amounts of global information,
> storing the state in ZooKeeper might be a reasonable
> solution.
>
> On 13 July 2013 21:28, andronat_asf  wrote:
> > Hello everyone,
> >
> > I'm working on HAMA-767 and I have some concerns about counters and
> > scalability. Currently, every peer has a set of vertices and a variable
> > that keeps the total number of vertices across all peers. In my case,
> > I'm trying to add and remove vertices during the runtime of a job, which
> > means that I have to update all those variables.
> >
> > My problem is that this is not efficient: for every operation (adding
> > or removing a vertex) I need to update all peers, so I need to send lots of
> > messages to make those updates (see the GraphJobRunner#countGlobalVertexCount
> > method), and I believe this is neither correct nor scalable. Another problem
> > is that, even if I update all those variables (at the cost of sending lots of
> > messages to every peer), those variables will only be updated in the next
> > superstep.
> >
> > e.g.:
> >
> > Peer 1:            Peer 2:
> >   Vert_1             Vert_2
> > (Total_V = 2)      (Total_V = 2)
> > addVertex()
> > (Total_V = 3)
> >                    getNumberOfV() => 2
> >
> > ------------ Sync ------------
> >
> >                    getNumberOfV() => 3
> >
> >
> > Is there something like global counters or shared memory that can
> > address this issue?
> >
> > P.S. I have a small feeling that we don't need to track the total number
> > of vertices, because vertex-centered algorithms rarely need global totals;
> > they only depend on neighbors (I might be wrong, though).
> >
> > Thanks,
> > Anastasis
>


Re: Request for Mentorship in HAMA-743

2013-07-15 Thread Tommaso Teofili
Hi Sreejith,

it's really nice that you're willing to contribute to the project; unfortunately,
I'm not available for mentoring at the moment.
Can anyone else pick up such a role to help out?

Thanks a lot for your effort,
Regards,
Tommaso

2013/7/14 Sreejith Ramakrishnan 

> I recently attended a bootcamp for a Joint Mentoring Programme by Luciano
> Resende (Community Development, Apache):
> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>
> It is geared towards helping students get involved in open source projects,
> similar to GSoC.
>
> Coming from a MapReduce background, I find BSP interesting and would love to
> work with you on HAMA-743 [1]. As per the programme, we should be allotted a
> mentor.
>
> I'm currently reading some papers and working on a plan and a proposed
> schedule.
>
> Can somebody please be kind enough to be my mentor?  :)
>
> [1] https://issues.apache.org/jira/browse/HAMA-743
>