[Hama Wiki] Update of "GraphPackage" by edwardyoon

Apache Wiki Tue, 17 Apr 2012 19:24:48 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.


The "GraphPackage" page has been changed by edwardyoon:
http://wiki.apache.org/hama/GraphPackage?action=diff&rev1=6&rev2=7

+ Please update 
http://people.apache.org/~edwardyoon/site/hama_graph_tutorial.html
- = The Graph Package (Angrapa) =
- The graph package, called Angrapa, is an large-scale graph data management 
framework for analytical processing. It is still in heavy development. Angrapa 
will employ massive parallelism on Hadoop, and It aims to achieve the 
scalability for processing tera bytes or peta bytes graph data. Angrapa will be 
used in a variety of scientific and industrial areas, such as data mining, 
machine learning, information retrieval, bioinformatics, and social networks, 
required to process large-scale graph data.
  
- = The Main Goal =
-  * Easy APIs familiar to graph features
-  * Storing techniques and the data communication method (i.e., BSP) without 
deterioration of graph data locality
- 
- = An Overview of the Angrapa =
- The architecture of angrapa is similar to that of !MapReduce except it is 
founded on the [[BulkSynchronizationParallel| BSP]]. It consists of one master 
and many walkers. One master corresponds to a jobtracker of !MapReduce, and 
many walkers correspond to task trackers. Like !MapReduce, angrapa will be 
carried out on data sets on HDFS or HBase.
- 
- For processing, programs written in angrapa will be automatically 
parallelized and executed on walkers. The processing of angrapa consists of a 
sequence of parallel supersteps. Each superstep is also subdivided into three 
ordered phases, consisting of local computation in each walker; communications 
among the walkers, leading to transfers of intermediate data between walkers; 
and a barrier synchronization, waiting for all of the communications to 
complete.
- 
- In the local computation phase, each walker polls a vertex ''v1'' - 
initially, it is given by a user or the program, or it can be obtained from 
intermediate results transmitted from other walkers after the first superstep. 
- from a local queue, and it starts to traverse from ''v1'' on graph data that 
reside in a local storage. During this phase, walker enqueues additional 
vertices to be processed into the local queue. However, if some inserted vertex 
''v2'' do not reside in the local storage, the vertex ''v2'' is removed from 
the local queue and inserted to the global queue that keeps vertices to be 
transmitted into other walkers at the next communication phase.
- 
- In the communication phase, each walker communicates intermediate data about 
vertices that reside in local but are necessary to be computed with other 
vertices that reside in another walkers.
- 
- In the barrier synchronization phase, the master controls all of the walkers 
with zookeepers. If there are no intermediate data to be processed, the program 
will finish, and it is the end of one superstep. Otherwise, next superstep will 
start with intermediate data transmitted among walkers.
- 
- = Example 1 =
- We assume that given a large-scale graph data and a start vertex, we find all 
shortest paths from the start vertex to every vertices on the given graph data. 
In such a case, the program written in angrapa starts from the start vertex. 
During the local computation phase, each time walker dequeues a vertex from the 
local queue, each walker enqueues its adjacent vertices with paths from the 
start vertex to the current dequeued vertex. Then, an enqueued vertex is 
checked if it resides in the local partition. If not, it is picked to the 
global queue. (To be described in more detail)
- 
- = Example 2 =
- For example, we assume that given a subgraph ''s'', we find a set of 
subgraphs isomorphic to the given subgraph ''s''. In such a case, angrapa will 
be very desirable because each walker can independently begin with some start 
vertex matched to the subgraph ''s''. The program will needs only one superstep 
if the subgraph ''s'' is small enough to be within a partition. (To be 
described in more detail)
-

[Hama Wiki] Update of "GraphPackage" by edwardyoon

Reply via email to