[jira] [Commented] (GIRAPH-170) Workflow for loading RDF graph data into Giraph

Paolo Castagna (Commented) (JIRA) Thu, 19 Apr 2012 12:11:07 -0700

    [ https://issues.apache.org/jira/browse/GIRAPH-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257695#comment-13257695 ]


Paolo Castagna commented on GIRAPH-170:
---------------------------------------

Hi Benjamin,

> I call this the RDFAdjacencyCSV

We came to the same conclusion. I ended up using Turtle for this, as explained 
here: 
http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201204.mbox/%3C4F84872E.4050101%40googlemail.com%3E

Turtle isn't splittable in general, but it can be made splittable simply by 
writing all the RDF statements with the same subject on a single line.
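A minimal Java sketch of the one-line-per-subject idea, assuming N-Triples input with full IRIs. Because no `@prefix` header is emitted, each output line is a self-contained Turtle document, so a line-oriented Hadoop input split can hand any line to a parser on its own. The regular expression is a simplified N-Triples matcher written only for illustration, the class and method names are hypothetical, and a real loader would parse with Jena RIOT instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TurtleLinePerSubject {

    // Simplified N-Triples matcher (illustration only, not the full N-Triples grammar).
    private static final Pattern TRIPLE = Pattern.compile(
        "\\s*(<[^>]*>|_:\\S+)"                              // subject: IRI or blank node
        + "\\s+(<[^>]*>)"                                   // predicate: IRI
        + "\\s+(<[^>]*>|_:\\S+|\"(?:[^\"\\\\]|\\\\.)*\""    // object: IRI, blank node or quoted literal
        + "(?:@[A-Za-z0-9-]+|\\^\\^<[^>]*>)?)"              // optional language tag or datatype
        + "\\s*\\.\\s*");                                   // terminating dot

    /** Returns {subject, predicate, object}, or null for blank, comment or unparsable lines. */
    public static String[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        Matcher m = TRIPLE.matcher(line);
        return m.matches() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    /** Groups triples by subject into lines of the form "<s> <p1> <o1> ; <p2> <o2> ." */
    public static List<String> toLines(List<String> nTriples) {
        // LinkedHashMap keeps subjects in first-seen order, so the output is deterministic.
        Map<String, List<String>> bySubject = new LinkedHashMap<>();
        for (String line : nTriples) {
            String[] t = parse(line);
            if (t == null) {
                continue; // skipped silently here; a real loader should report bad lines
            }
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t[1] + " " + t[2]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : bySubject.entrySet()) {
            out.add(e.getKey() + " " + String.join(" ; ", e.getValue()) + " .");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .");
        input.add("<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> \"Bob\" .");
        input.add("<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> \"Alice\"@en .");
        toLines(input).forEach(System.out::println);
    }
}
```

The example prints two lines, one per subject. Grouping in a `LinkedHashMap` only works for data that fits in memory; on a full dump the same grouping would more naturally be a MapReduce job keyed by subject. Also note that blank node labels are scoped to a Turtle document, so if each line is parsed separately, the same `_:` label on two lines may not be treated as the same node.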

> I would like to say that Paolo's suggestion of providing some ready-made code 
> for Pig, HBase and MapReduce for processing RDF sounds like a really great 
> contribution. 

I am not sure what the best place for such code is. I started by sharing 
small examples and experiments on GitHub, here: 
https://github.com/castagna/jena-grande

> Integration of RDF reasoning capabilities: I will need to perform subclass 
> reasoning on the DBPedia graph.

See Apache Jena's RIOT infer command or a MapReduce version of it, here: 
https://github.com/castagna/tdbloader4/blob/master/src/main/java/org/apache/jena/tdbloader4/InferDriver.java
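Subclass reasoning mostly comes down to two RDFS entailment rules: rdfs11 (transitivity of rdfs:subClassOf) and rdfs9 (propagating rdf:type up the class hierarchy). The in-memory Java sketch below illustrates only those two rules; it is not how riot's infer command or InferDriver are implemented, and the class name and `ex:` identifiers are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SubClassInference {

    /** rdfs11: all classes reachable from cls via subClassOf edges (cycle-safe). */
    static Set<String> superClasses(String cls, Map<String, Set<String>> subClassOf) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            for (String sup : subClassOf.getOrDefault(todo.pop(), Collections.<String>emptySet())) {
                if (seen.add(sup)) {
                    todo.push(sup); // each superclass is expanded only once
                }
            }
        }
        return seen;
    }

    /** rdfs9: (x type A), (A subClassOf B) => (x type B), using the closure above. */
    static Map<String, Set<String>> inferTypes(Map<String, Set<String>> types,
                                               Map<String, Set<String>> subClassOf) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : types.entrySet()) {
            Set<String> all = new TreeSet<>(e.getValue()); // asserted types
            for (String cls : e.getValue()) {
                all.addAll(superClasses(cls, subClassOf)); // inferred types
            }
            result.put(e.getKey(), all);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subClassOf = new HashMap<>();
        subClassOf.put("ex:Actor", new HashSet<>(Arrays.asList("ex:Person")));
        subClassOf.put("ex:Person", new HashSet<>(Arrays.asList("ex:Agent")));
        Map<String, Set<String>> types = new HashMap<>();
        types.put("ex:someActor", new HashSet<>(Arrays.asList("ex:Actor")));
        System.out.println(inferTypes(types, subClassOf));
        // prints {ex:someActor=[ex:Actor, ex:Agent, ex:Person]}
    }
}
```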

I wonder if Giraph could be used to implement the RETE algorithm 
(http://en.wikipedia.org/wiki/Rete_algorithm), which is what Jena uses (with 
in-memory RDF Jena models).
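The following is not RETE. As a hedged illustration of how a single inference rule might map onto Giraph's vertex-centric, superstep-based model, this plain-Java sketch simulates supersteps for the subclass rule only: class vertices hold the instances typed with them and forward newly learned instances along rdfs:subClassOf edges until no messages are sent. No Giraph classes are used, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SuperstepTypePropagation {

    /** superClassesOf: edge C -> D means C rdfs:subClassOf D. Returns class -> all instances. */
    public static Map<String, Set<String>> run(Map<String, List<String>> superClassesOf,
                                               Map<String, Set<String>> assertedInstances) {
        Map<String, Set<String>> value = new HashMap<>();        // vertex values
        Map<String, Set<String>> newlyLearned = new HashMap<>(); // what each vertex will send
        for (Map.Entry<String, Set<String>> e : assertedInstances.entrySet()) {
            value.put(e.getKey(), new TreeSet<>(e.getValue()));
            newlyLearned.put(e.getKey(), new TreeSet<>(e.getValue())); // superstep 0: all facts are new
        }
        while (!newlyLearned.isEmpty()) {
            // Message phase: send only new instances along subClassOf edges.
            Map<String, Set<String>> inbox = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : newlyLearned.entrySet()) {
                for (String sup : superClassesOf.getOrDefault(e.getKey(), Collections.<String>emptyList())) {
                    inbox.computeIfAbsent(sup, k -> new TreeSet<>()).addAll(e.getValue());
                }
            }
            // Compute phase: a vertex keeps instances it did not know; if none, it stays halted.
            newlyLearned = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
                Set<String> known = value.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
                Set<String> fresh = new TreeSet<>();
                for (String instance : e.getValue()) {
                    if (known.add(instance)) {
                        fresh.add(instance);
                    }
                }
                if (!fresh.isEmpty()) {
                    newlyLearned.put(e.getKey(), fresh);
                }
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("ex:Actor", Arrays.asList("ex:Person"));
        edges.put("ex:Person", Arrays.asList("ex:Agent"));
        Map<String, Set<String>> instances = new HashMap<>();
        instances.put("ex:Actor", new TreeSet<>(Arrays.asList("ex:alice")));
        instances.put("ex:Person", new TreeSet<>(Arrays.asList("ex:bob")));
        System.out.println(new TreeMap<>(run(edges, instances)));
        // prints {ex:Actor=[ex:alice], ex:Agent=[ex:alice, ex:bob], ex:Person=[ex:alice, ex:bob]}
    }
}
```

Forwarding only the newly learned instances (rather than the whole vertex value) keeps messages small and guarantees termination, since each vertex's set can only grow.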
                
> Workflow for loading RDF graph data into Giraph
> -----------------------------------------------
>
>                 Key: GIRAPH-170
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-170
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Dan Brickley
>            Priority: Minor
>
> W3C RDF provides a family of Web standards for exchanging graph-based data. 
> RDF uses sets of simple binary relationships, labeling nodes and links with 
> Web identifiers (URIs). Many public datasets are available as RDF, including 
> the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many 
> such datasets are listed at http://thedatahub.org/
> RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple 
> line-oriented format is N-Triples. A format aligned with RDF's SPARQL query 
> language is Turtle. Apache Jena and Any23 provide software to handle all 
> these; http://incubator.apache.org/jena/ http://incubator.apache.org/any23/
> This JIRA leaves open the strategy for loading RDF data into Giraph. There 
> are various possibilities, including exploitation of intermediate 
> Hadoop-friendly stores, or pre-processing with e.g. Pig-based tools into a 
> more Giraph-friendly form, or writing custom loaders. Even a HOWTO document 
> or implementor notes here would be an advance on the current state of the 
> art. The BluePrints Graph API (Gremlin etc.) has also been aligned with 
> various RDF datasources.
> Related topics: multigraphs https://issues.apache.org/jira/browse/GIRAPH-141 
> touches on the issue (since we can't currently easily represent fully general 
> RDF graphs since two nodes might be connected by more than one typed edge). 
> Even without multigraphs it ought to be possible to bring RDF-sourced data
> into Giraph, e.g. perhaps some app is only interested in, say, the Movies + 
> People subset of a big RDF collection.
> From Avery in email: "a helper VertexInputFormat (and maybe 
> VertexOutputFormat) would certainly [despite GIRAPH-141] still help"
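As a rough illustration of the logic a helper VertexInputFormat might wrap, here is a plain-Java sketch (no Giraph classes; all names hypothetical) that turns triples into per-subject vertex records. Parallel typed edges between the same two nodes are collapsed into one edge whose value is the set of predicates, one possible workaround while multigraphs are not supported, and an rdf:type filter keeps only a subset such as Movies and People.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RdfVertexBuilder {

    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    /**
     * Triples are {s, p, o} in N-Triples term syntax. Returns vertex id ->
     * (target id -> predicates linking them). Only subjects typed with one of
     * keepTypes become vertices; only IRI objects inside the subset become edges.
     */
    public static Map<String, Map<String, Set<String>>> build(List<String[]> triples,
                                                              Set<String> keepTypes) {
        // Pass 1: find the resources in the wanted subset (e.g. Movies + People).
        Set<String> keep = new HashSet<>();
        for (String[] t : triples) {
            if (t[1].equals(RDF_TYPE) && keepTypes.contains(t[2])) {
                keep.add(t[0]);
            }
        }
        // Pass 2: every kept resource gets a vertex, even one without outgoing edges.
        Map<String, Map<String, Set<String>>> vertices = new TreeMap<>();
        for (String v : keep) {
            vertices.put(v, new TreeMap<>());
        }
        for (String[] t : triples) {
            boolean iriLink = t[2].startsWith("<") && !t[1].equals(RDF_TYPE);
            if (iriLink && keep.contains(t[0]) && keep.contains(t[2])) {
                // Several predicates between the same pair end up in one edge's label set.
                vertices.get(t[0]).computeIfAbsent(t[2], k -> new TreeSet<>()).add(t[1]);
            }
            // Literals, rdf:type triples and edges leaving the subset are dropped here.
        }
        return vertices;
    }

    public static void main(String[] args) {
        String movie = "<http://example.org/Movie>";
        String person = "<http://example.org/Person>";
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"<http://example.org/m1>", RDF_TYPE, movie});
        triples.add(new String[] {"<http://example.org/p1>", RDF_TYPE, person});
        triples.add(new String[] {"<http://example.org/p1>", "<http://example.org/directed>", "<http://example.org/m1>"});
        triples.add(new String[] {"<http://example.org/p1>", "<http://example.org/wrote>", "<http://example.org/m1>"});
        triples.add(new String[] {"<http://example.org/p1>", "<http://example.org/livesIn>", "<http://example.org/city1>"});
        System.out.println(build(triples, new HashSet<>(Arrays.asList(movie, person))));
    }
}
```

Dropping literal-valued triples is a simplifying choice in this sketch; a real input format would more likely keep them as vertex values.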

