On 03/01/11 05:59, Raj Chaudhuri wrote:
> Hello All,
> Thanks for taking my question. I want to know what you guys think about TDB
> versus SDB versus some of the other stores out there for RDF technology. To add
> context, I am building a site that I hope someday will have millions of users.
> Granted I am a nobody, and there is no one on my site yet :-). However, I want
> to try and make the right decisions from scratch. Therefore, I am wondering if
> anyone has any recommendations? I have read the blogs, etc, but cannot come to
> a conclusion as of yet.
> Some more specific questions:
> 1. TDB will be faster than SDB because it is in memory, yes?
SDB uses SQL, so there are two processes involved: the query engine and
the SQL database. Because of the way SQL is used, there can be many
JDBC operations per SPARQL query, and JDBC adds overhead.
SQL engines don't seem to scale as well. The layout for RDF is a few,
very large narrow tables, which isn't the design target for most SQL
systems.
TDB runs in the same JVM as the application. The data is on disk and is
cached in memory.
On 64-bit systems, TDB switches to memory-mapped I/O.
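
For anyone unfamiliar with memory-mapped I/O, here is a minimal sketch of the general technique using only the JDK. It is not TDB's code and says nothing about TDB's file layout; the class name, method name and temp-file prefix are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of memory-mapped file access; not taken from TDB.
final class MmapSketch {
    // Maps the first 8 bytes of a temporary file, writes a long through the
    // mapping, and reads it back through the same mapping.
    static long writeThenReadMapped(long value) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".dat");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE past the current end of file grows the file.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
                buffer.putLong(0, value); // a plain memory write, no explicit write() call
                buffer.force();           // ask the OS to flush the mapped region to disk
                return buffer.getLong(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point is that reads and writes go through the operating system's page cache rather than through a cache held on the Java heap, which is why the amount of addressable memory (32-bit versus 64-bit) matters.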
> 2. When you talk of clustering TDB, what does that mean exactly?
TDB is not clustered. Any clustering would have to be managed by the
application.
> I got a
> little confused with master node + storage node. Does that mean if I have two
> machines with 8 GB of RAM then only one will hold the models in memory, and the
> other is just for storage? Or is this an actual scalable in-memory database
> that will use all 16 GB of RAM and just persist different pieces of information
> in each?
> Guys, happy New Year! May it be prosperous for us all!
> Thank you sincerely and utterly for taking my questions.
> Raj
To make the right decisions, use standards.
You can put a SPARQL database behind a SPARQL server (e.g. Fuseki) to
provide the data tier of your application, then access it via the
standard SPARQL protocol. That way you can change your database without
changing your application.
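
As a rough sketch of what going through the protocol looks like from the application side, the following uses only the JDK's HTTP client (Java 11+) to build a SPARQL query request. The endpoint URL in the usage note is an assumption about a local server; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the application only knows an endpoint URL and SPARQL,
// not which store sits behind the server.
final class SparqlProtocolSketch {
    // Builds a SPARQL protocol "query via GET" request: the query text is
    // URL-encoded and passed in the "query" parameter.
    static HttpRequest queryRequest(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body (JSON results).
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Usage would be something like `SparqlProtocolSketch.send(SparqlProtocolSketch.queryRequest("http://localhost:3030/ds/query", "SELECT * { ?s ?p ?o } LIMIT 10"))`, assuming a server is listening at that address. Swapping the store behind the server only changes the URL, not the application code.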
Andy