Hi Andy,

Great to hear about the transaction work.

For the N-Quad export on a live database, do you mean that a running application - with open Jena models, and possibly a thread writing to them - will not interfere with the N-Quad dump, and that the only gotcha would be a quad missed just before or after the export operation? If so, I think that would suffice for the short term. I guess another possibility is, within the app, to fire up a thread and do a Model.write out to the filesystem to save the entire model.

Regarding bulk imports - I'm actually finding that regular Jena model manipulation runs very fast with TDB. Within seconds I can have a 100k-statement TDB store set up. I'm not looking to boil the giant datasets out there; I'm looking at this from a practical how-to-build-apps perspective.

Speaking of which, it would be nice if the cache were controllable - similar to Ehcache (or maybe an idea for a future project for TDB to use Ehcache) - TTL, max in memory, etc.

Thanks,
Al

On Sun, May 29, 2011 at 2:52 PM, Andy Seaborne <[email protected]> wrote:

> On 28/05/11 08:30, Al Baker wrote:
>
>> I've been testing TDB for a while, and am very impressed with its
>> performance. However, I do see the various emails on the mailing lists
>> warning of touching the files while an application with TDB is open
>> (presumably with an open Jena Model attached to the TDB directory).
>>
>> What kind of reliability does TDB have to survive a power hit or
>> application crash?
>>
>> Are there some steps to take consistent and regular backups to mitigate
>> any issues?
>>
>> Basically looking to have some level of confidence that I can use TDB in
>> production, take a reasonable amount of steps to insure reliability, and
>> be confident that I'll always either have a valid TDB store, or a way to
>> incrementally backup/rollback in the case of a severe crash/file system
>> error.
>>
>> Thanks,
>> Al Baker
>
> Hi Al,
>
> Currently, TDB provides some update capabilities but relies on the
> application maintaining MRSW (Multiple Reader Or Single Writer) concurrency
> semantics together with a clean shutdown. Many of the reports are due to
> letting two writers access the database at the same time or crashes without
> ensuring a sync() is done which currently is important for updates.
>
> For read-only usage, the database is safe - it is not modified or
> reorganised by reads so loss of machines or applications does not damage
> the on-disk database.
>
> TDB is an in-process database - one JVM controls the database. Having two
> managing the files also will cause damage.
>
> You can backup a database by copy but only from a running system if you
> co-ordinate with a sync() which makes the on-disk structures consistent.
> Stopping the DB is better and is needed on some OS's but dumping to N-Quads
> can be done on a live database.
>
> For updates, there are periods of vulnerability. This is being addressed
> by adding ACID transactions to TDB. The transaction system is based on
> write-ahead logging; read requests go straight to the DB as before so
> performance there will be unchanged.
>
> The disk format is (probably) going to be unchanged. There are some
> improvements that can be made but they aren't necessary.
>
> The bulk loader used to build a database from scratch will provide the best
> load performance. It will remain non-transactional. Transactions will be
> aimed at non-bulk updates. Where the practical boundary will be will emerge
> in testing.
>
> The transaction work is active-work-in-progress [*] but I'm not going to
> give specific release schedules except to say that as an open source
> project, "release early, release often" of development versions will happen.
>
> Andy
>
> [*] Indeed, I'm writing a journaled file abstraction at this moment.
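
P.S. To make sure I have the MRSW point from the quoted reply right, here is a rough sketch in plain Java of what I understand the application is expected to do: many readers may run at once, only one writer at a time, and a dump taken under the read lock sees a consistent state. MrswStore and its methods are hypothetical names of my own, the "store" is just an in-memory list, and no Jena or TDB calls are used; with a real TDB store, the write path is where the sync() mentioned above would presumably go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a store that needs MRSW discipline.
// This is not a Jena or TDB class; it only illustrates the locking pattern.
final class MrswStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> statements = new ArrayList<>();

    // Writers take the exclusive lock, so at most one update runs at a time
    // and no reader observes a half-applied change.
    void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
            // Assumption: with a real store, the update would be followed
            // by a sync here, before the lock is released.
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the lock with each other but exclude writers.
    int size() {
        lock.readLock().lock();
        try {
            return statements.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // A dump under the read lock sees a consistent state: a concurrent
    // writer waits until the dump is finished.
    String dump() {
        lock.readLock().lock();
        try {
            StringBuilder out = new StringBuilder();
            for (String s : statements) {
                out.append(s).append('\n');
            }
            return out.toString();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MrswStore store = new MrswStore();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                store.add("statement " + i);
            }
        });
        writer.start();
        // The dump runs while the writer is active; it may contain any
        // prefix of the writes, but never a partially applied one.
        String snapshot = store.dump();
        writer.join();
        System.out.println("snapshot length: " + snapshot.length());
        System.out.println("final size: " + store.size());
    }
}
```

The trade-off in this sketch is that a long dump blocks the writer for its whole duration, which is stricter than "a quad missed just before or after the export" but keeps the snapshot consistent.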
