On Fri, Jun 21, 2002 at 06:28:59PM +0100, Paul Jakma wrote:
> > Now you're getting a little out of hand. A journaling filesystem is
> > a piling of one set of warts ontop of another. Now you've got a
> > situation where even though the filesystem might be 100% consistent
> > even after a catastrophic crash, the database won't be. There's no
> > need to use a journaling filesystem with PostgreSQL
>
> eh? there is great need - this is the only way to guarantee that when
> postgresql does operations (esp on its own application level logs)
> that the operation will either:
>
> - be completely carried out
> or
> - not carried out at all
Journaling filesystems don't provide this guarantee in general,
because the transactional interface is not exposed to userspace. The
only thing the filesystem guarantees is that filesystem-operations are
carried out completely or not at all.
If a non-journaling filesystem crashes while rename() is in progress,
the file may be present in two directories or none (depending on
implementation). If you create a file, write to it and then crash, the
file may be gone from the directory. _These_ are the problems solved by
journaling filesystems.
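To make the create-and-write case concrete, here is a minimal sketch
(in Python, my choice) of what an application has to do itself to
replace a single file in an all-or-nothing way. The name durable_write
is made up for illustration. Whether the rename() step is also atomic
across a crash still depends on the filesystem, and that is the part
journaling helps with.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data`; readers see the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so rename() stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the file's data to disk
        os.replace(tmp, path)     # rename(): a metadata operation
    except BaseException:
        # On failure, remove the temporary file and leave `path` untouched.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # fsync the directory so the new directory entry is on disk too
    # (assumes a POSIX system; this does not work on Windows).
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Note that this only covers one file. Nothing here lets you group
several writes to several files into one transaction.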
Luckily postgresql implements its own system (WAL) to get the same
feature ("atomic" updates) on the database-level.
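A toy illustration of the write-ahead idea, as a sketch only: this is
NOT how postgresql lays out its WAL, and the class TinyWAL and its
JSON-lines record format are assumptions invented for the example. Each
commit is appended to the log and fsync'ed before it is applied, and
recovery replays only complete records, so a multi-key update is either
fully visible after a crash or not at all.

```python
import json
import os


class TinyWAL:
    """Toy key-value store with a write-ahead log (illustration only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()
        self.log = open(log_path, "ab")

    def _recover(self):
        """Replay complete records and cut off a torn record at the tail."""
        if not os.path.exists(self.log_path):
            return
        good = 0  # byte offset just past the last complete record
        with open(self.log_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # partial write: this commit never finished
                try:
                    record = json.loads(line)
                except ValueError:
                    break  # corrupt record: stop replaying here
                self.data.update(record)
                good += len(line)
        with open(self.log_path, "r+b") as f:
            f.truncate(good)  # drop the torn tail so new appends start clean

    def commit(self, updates):
        """Log first (sequential, synchronous), then apply the update."""
        self.log.write(json.dumps(updates).encode() + b"\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # the commit is durable from here on
        self.data.update(updates)    # stands in for the later, lazy data write
```

For example, after db.commit({"a": 1, "b": 2}) and a crash, a fresh
TinyWAL on the same log sees either both keys or neither.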
[ There is actually some work underway to export a transactional
filesystem API to userspace. When this is completed, an application
could tell the filesystem which operations are part of a transaction,
and have "atomic" updates even to multiple files :-) ]
> > either full mirroring or full level 5 protection). Indeed there are
> > potentially performance related reasons to avoid journaling
> > filesystems!
>
> if they're any good they should have better synchronous performance
> over normal unix fs's. (and synchronous perf. is what a db is
> interested in).
Synchronous metadata updates (create/rename ++): yes - they should be
faster. But postgresql doesn't do many of those.
Synchronous data updates (write/append): no - because postgresql
already does its writes to a log, so there are no seeks involved in the
sync writes. (The writes to the actual files happen asynchronously.)
So there is a theoretical improvement, but it's not likely to show up on
a typical SQL-benchmark...
--
Ragnar Kjørstad
Big Storage