Dan Brickley commented on GIRAPH-170:

Another architectural note around RDF:

RDF is essentially simple factual data expressed as sets of binary 
relationships, so in that sense it is already a graph.

However, RDF often describes something that is, in a deeper sense, also a 
graph. Common examples include FOAF, where node and edge types (Person, 
Document, Group, etc.) can express a matrix of collaboration, social linkage, 
etc. From DBpedia.org, Freebase, etc., we have, for example, datasets of movies 
and actors. In the DBpedia case it's simple enough: a movie node, an actor 
node, and a typed link between them. Freebase, by contrast, reifies the 
'starring' relationship into another node, so you can represent dates, 
character names, etc. This sort of meta-information (properties of links) is 
also available in the BluePrints/Gremlin API.
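
To make the two modelling styles concrete, here is a rough sketch in Python 
using plain (subject, predicate, object) tuples. All the ex: names are 
invented for illustration; they are not real DBpedia or Freebase identifiers, 
and the property names (ex:actor, ex:character) are assumptions.

```python
# Illustrative triples only: every ex: name below is hypothetical,
# not a real DBpedia or Freebase identifier.

# DBpedia-style: one typed link straight from the movie to the actor.
direct = [
    ("ex:Movie1", "ex:starring", "ex:Actor1"),
]

# Freebase-style: the 'starring' relationship is reified into its own node,
# which can then carry properties such as the character name.
reified = [
    ("ex:Movie1", "ex:starring", "ex:Performance1"),
    ("ex:Performance1", "ex:actor", "ex:Actor1"),
    ("ex:Performance1", "ex:character", "Character A"),
]


def actors_direct(triples, movie):
    """Actors reachable in one hop via ex:starring."""
    return {o for s, p, o in triples if s == movie and p == "ex:starring"}


def actors_reified(triples, movie):
    """Actors reachable in two hops, through the intermediate node."""
    performances = {o for s, p, o in triples
                    if s == movie and p == "ex:starring"}
    return {o for s, p, o in triples
            if s in performances and p == "ex:actor"}
```

Both functions answer the same question, but the reified form needs two hops, 
which matters for any loader that flattens RDF into vertex/edge pairs.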

One point here is that a 'starring' link pointing from a Movie to an Actor 
tells us the same thing, in reverse, as a 'starsIn' link from the Actor to the 
Movie would. For Giraph we may therefore want to consider adding backlinks, so 
that each node is equally aware of properties pointing both in and out.
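
As a rough sketch of the backlink idea (a pre-processing step in Python, not 
Giraph code), one could index every triple under both its subject and its 
object. The function name with_backlinks is made up for this example, and it 
assumes every object is a node rather than a literal value.

```python
from collections import defaultdict


def with_backlinks(triples):
    """Index each (subject, predicate, object) triple in both directions.

    Assumption for this sketch: every object is a node, not a literal.
    Returns (out_edges, in_edges), each mapping a node to a list of
    (predicate, other_node) pairs.
    """
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))  # forward link, stored on the subject
        in_edges[o].append((p, s))   # backlink, stored on the object
    return out_edges, in_edges


# Hypothetical identifiers, for illustration only.
triples = [("ex:Movie1", "ex:starring", "ex:Actor1")]
out_edges, in_edges = with_backlinks(triples)
```

Here the actor node can see the incoming 'starring' link through in_edges 
without a separate 'starsIn' triple having to exist in the source data.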

> Workflow for loading RDF graph data into Giraph
> -----------------------------------------------
>                 Key: GIRAPH-170
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-170
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Dan Brickley
>            Priority: Minor
> W3C RDF provides a family of Web standards for exchanging graph-based data. 
> RDF uses sets of simple binary relationships, labeling nodes and links with 
> Web identifiers (URIs). Many public datasets are available as RDF, including 
> the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many 
> such datasets are listed at http://thedatahub.org/
> RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple 
> line-oriented format is N-Triples. A format aligned with RDF's SPARQL query 
> language is Turtle. Apache Jena and Any23 provide software to handle all 
> these; http://incubator.apache.org/jena/ http://incubator.apache.org/any23/
> This JIRA leaves open the strategy for loading RDF data into Giraph. There 
> are various possibilities, including exploitation of intermediate 
> Hadoop-friendly stores, or pre-processing with e.g. Pig-based tools into a 
> more Giraph-friendly form, or writing custom loaders. Even a HOWTO document 
> or implementor notes here would be an advance on the current state of the 
> art. The BluePrints Graph API (Gremlin etc.) has also been aligned with 
> various RDF datasources.
> Related topics: multigraphs https://issues.apache.org/jira/browse/GIRAPH-141 
> touches on the issue (we can't currently easily represent fully general 
> RDF graphs, since two nodes might be connected by more than one typed edge). 
> Even without multigraphs it ought to be possible to bring RDF-sourced data
> into Giraph, e.g. perhaps some app is only interested in say the Movies + 
> People subset of a big RDF collection.
> From Avery in email: "a helper VertexInputFormat (and maybe 
> VertexOutputFormat) would certainly [despite GIRAPH-141] still help"
