Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by InchulSong:
http://wiki.apache.org/lucene-hadoop/Hbase/RDF

------------------------------------------------------------------------------
  We propose an Hbase subsystem for RDF called HbaseRDF, which uses Hbase + MapReduce to store RDF data and execute queries (e.g., SPARQL) on them.
  We can store very sparse RDF data in a single table in Hbase, with as many columns as
  it needs. For example, we might make a row for each RDF subject in a table and store all the properties and their values as columns in the table.
- This reduces costly self-joins, which results in efficient processing of 
queries, although we still need self-joins for RDF path queries.
+ This avoids costly self-joins for queries that ask about a single subject, which makes such queries efficient to process, although we still need self-joins to answer RDF path queries.
  
  We can further accelerate query performance by using MapReduce for 
  parallel, distributed query processing. 
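
The row-per-subject layout described above can be sketched in a few lines. This is only an illustration under stated assumptions: it models the table with plain Python dictionaries rather than the Hbase API, and the names `build_subject_table`, `same_subject_query`, and `path_query`, as well as the sample triples, are hypothetical. It shows why a query about one subject needs a single row read, while a path query has to look rows up again at every hop, which is the self-join mentioned above.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for one sparse table:
# row key = RDF subject, column = property, cell = set of object values.
def build_subject_table(triples):
    table = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        table[subject][prop].add(obj)
    return table

def same_subject_query(table, subject, props):
    """Fetch several properties of one subject: one row read, no join."""
    row = table.get(subject, {})
    return {p: sorted(row.get(p, ())) for p in props}

def path_query(table, start, path):
    """Follow a chain of properties from a start subject.

    Each hop uses the objects found so far as row keys for the next
    lookup, which corresponds to one self-join of the table per hop.
    """
    frontier = {start}
    for prop in path:
        next_frontier = set()
        for subject in frontier:
            next_frontier |= table.get(subject, {}).get(prop, set())
        frontier = next_frontier
    return sorted(frontier)

# Made-up sample data, for illustration only.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "knows", "carol"),
    ("carol", "name", "Carol"),
]
table = build_subject_table(triples)
```

Here `same_subject_query(table, "alice", ["name", "knows"])` touches only the `alice` row, whereas `path_query(table, "alice", ["knows", "knows", "name"])` performs three rounds of row lookups, one per property in the path.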
@@ -26, +26 @@

   * [:InchulSong: Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] 
(Database Lab. , KAIST) 
  
  == Considerations ==
- When we store RDF data in a single Hbase table and process queries on them, 
an important issue we have to consider is how to reduce costly self-joins 
needed to process RDF path queries. 
+ When we store RDF data in a single Hbase table and process queries on it, an important issue to consider is how to efficiently perform the costly self-joins needed to process RDF path queries.
  
  To speed up these costly self-joins, it is natural to think about using 
  the MapReduce framework we already have. However, in the Sawzall paper from 
Google, the authors say that the MapReduce framework is 
