Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by InchulSong:
http://wiki.apache.org/lucene-hadoop/Hbase/RDF

------------------------------------------------------------------------------
  We propose an Hbase subsystem for RDF called HbaseRDF, which uses Hbase + MapReduce to store RDF data and execute queries (e.g., SPARQL) on them.
  We can store very sparse RDF data in a single table in Hbase, with as many columns as they need. For example, we might make a row for each RDF subject in a table and store all the properties and their values as columns in the table.
- This reduces costly self-joins, which results in efficient processing of queries, although we still need self-joins for RDF path queries.
+ This reduces costly self-joins in answering queries asking questions on the same subject, which results in efficient processing of queries, although we still need self-joins to answer RDF path queries.
  We can further accelerate query performance by using MapReduce for parallel, distributed query processing.

@@ -26, +26 @@

  * [:InchulSong: Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] (Database Lab. , KAIST)

  == Considerations ==
- When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to reduce costly self-joins needed to process RDF path queries.
+ When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to efficiently perform costly self-joins needed to process RDF path queries.
  To speed up these costly self-joins, it is natural to think about using the MapReduce framework we already have. However, in the Sawzall paper from Google, the authors say that the MapReduce framework is