Hi Dan,
I don't have an answer to your questions/observations yet.

However, I suspect N-Triples or N-Quads might not be the best option for
something like Giraph. Something more like an adjacency list might be a
better fit.

So my intuition is that, if you start with RDF in N-Triples format,
the first step would be a simple MapReduce job to group RDF statements
by subject (possibly filtering out certain properties):

Input (RDF statements):

  s1 --p1--> o1
  s1 --p2--> o2
  s1 --p2--> o3
  s2 ...

Output (adjacency list):

  s1 (p1 o1) (p2 o2) (p2 o3)
  s2 ...
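
To make the grouping step concrete, here is a minimal sketch in plain
Python. It is not an actual MapReduce or Giraph job, just the same
group-by-subject logic run in memory. The function names and the
skip_predicates parameter are made up for illustration, and the parser
is deliberately naive (it assumes one well-formed statement per line):

```python
from collections import defaultdict


def parse_statement(line):
    """Naively split one N-Triples line into (subject, predicate, object).

    Assumption: one well-formed statement per line. A real job should use
    a proper N-Triples parser instead of whitespace splitting.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # skip blank lines and comments
    if line.endswith("."):
        line = line[:-1].rstrip()  # drop the statement terminator
    parts = line.split(None, 2)  # the object may itself contain spaces
    if len(parts) != 3:
        return None
    return tuple(parts)


def to_adjacency_list(lines, skip_predicates=frozenset()):
    """Group statements by subject, dropping any unwanted predicates."""
    grouped = defaultdict(list)
    for line in lines:
        triple = parse_statement(line)
        if triple is None:
            continue
        subject, predicate, obj = triple
        if predicate in skip_predicates:
            continue  # the optional property filter
        grouped[subject].append((predicate, obj))
    # One output line per subject: s (p o) (p o) ...
    return [
        subject + " " + " ".join("(%s %s)" % edge for edge in edges)
        for subject, edges in grouped.items()
    ]
```

In a real MapReduce job the subject would be the map output key and the
reducer would emit one adjacency line per subject; the dictionary above
just stands in for the shuffle phase.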

But, as I said, it is too early for me to say definitively that this
is the best approach.


Dan Brickley wrote:
> On 5 April 2012 05:49, Jakob Homan <jgho...@gmail.com> wrote:
>> Ack!, I suck.  Sorry.  I hadn't realized we'd gone through most of
>> them, which itself is a good thing.  I'll get some new ones added
>> first thing in the morning.  Sorry.
> Do we have something around "document a workflow to get RDF graph data
> into Giraph?". A few of us have been talking about it here or there,
> and I've heard various strategies mentioned (e.g. Ntriples as it's a
> simple line-oriented format; piggybacking on HBase or other storage
> that Giraph already has adaptors for; integrating Apache Jena; ...). I
> can't find much in JIRA but
> https://issues.apache.org/jira/browse/GIRAPH-141 touches on the issue
> (since we can't currently easily represent fully general RDF graphs
> since two nodes might be connected by more than one typed edge). Even
> without multigraphs it ought to be possible to bring RDF-sourced data
> into Giraph, e.g. perhaps some app is only interested in say the
> Movies + People subset of a big RDF collection. And so perhaps most of
> the work is in preprocessing for now - e.g. via Ntriples + Pig; but
> still it would be great to have a clear HOWTO.
> As an interested party on the periphery, a JIRA for this would give a
> natural place to monitor, read up, maybe even help. And I'm sure I'm
> not alone...
> cheers,
> Dan
