[ 
https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011181#comment-13011181
 ] 

Alois Cochard edited comment on HAMA-359 at 3/25/11 12:47 PM:
--------------------------------------------------------------

>>BUT this could be a large disadvantage too, in the case if a groom is not 
>>running on a server where the data is actually stored.
>> In MapReduce this is called a non-local task, so you have to copy the data 
>> to the local datanode

That is exactly what I had in mind when I mentioned the data locality problem.

>> Using a GraphDB and an interface like Blueprints is like using a MySQL 
>> Database with JDBC inside a distributed environment. It is possible, but 
>> IMHO it is not optimal.

+1 Same conclusion here. I would even call it a horrible abstraction 
inversion.

>>>>I looked at the graph-hbase, but it's just a graph-friendly API layer on 
>>>>top of HBase. It'll make you feel complex.

OK, so it adds no value at all, and it would take away the flexibility of 
storing the adjacency matrix the way you want.

To conclude, it seems more important to know *which* data structure to use 
than *where* to store it.

Once you are sure which structure to use (e.g. an adjacency matrix), you can 
then choose the best system to store and access it (SequenceFile, HBase, ...) 
and change it if necessary without impacting the algorithm.
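
To make that last point concrete, here is a minimal sketch of keeping the 
algorithm independent of the storage. Everything in it is an assumption made 
for illustration: GraphStore, InMemoryGraphStore and ShortestPaths are 
hypothetical names, not Hama classes, and the algorithm is plain sequential 
Dijkstra rather than a BSP implementation. A SequenceFile- or HBase-backed 
store would be one more implementation of the same interface.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical storage-agnostic view of a weighted, directed graph. */
interface GraphStore {
    /** Outgoing edges of a vertex as neighbor -> weight; empty if there are none. */
    Map<String, Integer> neighbors(String vertex);
}

/** One possible backend (an assumption): adjacency data held in memory. */
final class InMemoryGraphStore implements GraphStore {
    private final Map<String, Map<String, Integer>> adjacency = new HashMap<>();

    void addEdge(String from, String to, int weight) {
        adjacency.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
    }

    @Override
    public Map<String, Integer> neighbors(String vertex) {
        return adjacency.getOrDefault(vertex, Collections.emptyMap());
    }
}

/** Sequential Dijkstra that only ever talks to the GraphStore interface. */
final class ShortestPaths {
    /** Returns the distance from source to every reachable vertex (non-negative weights). */
    static Map<String, Integer> from(GraphStore graph, String source) {
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        dist.put(source, 0);
        queue.add(new SimpleEntry<>(source, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> head = queue.poll();
            String u = head.getKey();
            int d = head.getValue();
            if (d > dist.get(u)) {
                continue; // stale queue entry, a shorter path to u was already found
            }
            for (Map.Entry<String, Integer> edge : graph.neighbors(u).entrySet()) {
                int candidate = d + edge.getValue();
                Integer known = dist.get(edge.getKey());
                if (known == null || candidate < known) {
                    dist.put(edge.getKey(), candidate);
                    queue.add(new SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }
        return dist;
    }
}
```

Swapping the backend then means passing a different GraphStore to 
ShortestPaths.from(...); the algorithm code itself does not change.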

Thanks !

> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
>                 Key: HAMA-359
>                 URL: https://issues.apache.org/jira/browse/HAMA-359
>             Project: Hama
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2011, mentor
>             Fix For: 0.3.0
>
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a 
> Shortest Path using Hama BSP.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
