I am not sure rollforward recovery fixes a page if there is garbage
on the page. I think there is a small difference between
rollforward from backup and the no-sync-on-allocation case you are
trying to handle. During rollforward recovery from the backup, redo
happens on the container restored from the backup. During redo, if a
page is not found it means it was not allocated before, so the system
just creates a new page and does a sync. I think rollforward recovery
will never see garbage or an old valid page.
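The redo-from-backup decision described above can be sketched roughly as follows. This is a toy model with illustrative names (the map-based container, `redoPage`, the commented-out sync), not Derby's actual classes: the point is only the branch between "page missing, so create version 0 and sync" and "page present, so replay the logged change".

```java
import java.util.HashMap;
import java.util.Map;

public class RedoSketch {
    // Toy container: page number -> page bytes; absent key means unallocated.
    static final Map<Long, byte[]> container = new HashMap<>();
    static final int PAGE_SIZE = 4096;

    /** Apply a redo record for pageNum; returns true if a new page was created. */
    static boolean redoPage(long pageNum) {
        byte[] page = container.get(pageNum);
        if (page == null) {
            // Page was not allocated before the backup: create version 0 and sync.
            page = new byte[PAGE_SIZE];
            container.put(pageNum, page);
            // syncToDisk(pageNum);  // placeholder for the real fsync step
            return true;
        }
        // Page exists: replay the logged change on it (omitted here).
        return false;
    }

    public static void main(String[] args) {
        System.out.println(redoPage(7));  // page missing -> created
        System.out.println(redoPage(7));  // page exists -> replayed
    }
}
```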
Without a sync on allocation, how do we make sure recovery never sees a
good old page on a new allocation? Does every OS/hardware zero out a
page newly allocated to a file when that space was used earlier by the
same file or some other file? I think most general-purpose operating
systems will never give the user an old page on a new allocation to a
file, but I am not sure how it works on small devices with flash
memory, etc.
Thanks
-suresht
Mike Matrigali wrote:
This first step is not going all the way to buffering with no
file system interaction. The system is still going to request
the space from the file before allowing the insert. In Java
the way you do this is to write to the file through the OS. What I am
changing is that we used to sync the empty page. This means the
user will still get the normal immediate feedback if it is his inserts
that cause the filesystem to fill up.
The current system already has the sync-every-8-pages optimization,
rather than every page. (I think it actually syncs every page
until there are 8 and then does 8 at a time - a leftover from
when it was important to conserve disk space for running on small
systems and many apps had tables of less than 8 pages.) I considered
some sort of dynamic changing of the preallocate size, but it
seemed too complicated - and the bigger the preallocate grew,
the more likely we would grow the file TOO much.
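The policy described in the parenthetical above amounts to a small sizing rule, which a toy model makes concrete. `pagesToAllocate` and the batch constant are illustrative assumptions, not Derby's actual code:

```java
public class PreallocSketch {
    static final int BATCH = 8;

    /**
     * How many pages to allocate when a file currently holding
     * 'current' pages needs one more: one at a time until the file
     * reaches 8 pages, then a batch of 8 (synced together).
     */
    static int pagesToAllocate(int current) {
        return (current < BATCH) ? 1 : BATCH;
    }

    public static void main(String[] args) {
        System.out.println(pagesToAllocate(3));   // small table: one page at a time
        System.out.println(pagesToAllocate(64));  // larger table: 8 pages per batch
    }
}
```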
As you say, the sync every N seconds is like a checkpoint. In
the current system a checkpoint will sync every file, so the sync
will happen at that point no matter what.
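A minimal sketch of that checkpoint backstop, using only standard JDK I/O: every open container file is synced, so any newly allocated pages that skipped their allocation-time sync become durable at the checkpoint at the latest. The `checkpoint` helper and the file list are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

public class CheckpointSketch {
    /** Force every container file's contents to stable storage. */
    static void checkpoint(List<File> containers) throws IOException {
        for (File f : containers) {
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.getFD().sync();  // fsync this container
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("container", ".dat");
        tmp.deleteOnExit();
        checkpoint(List.of(tmp));
        System.out.println("checkpoint complete");
    }
}
```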
We did the sync for two reasons. One was that we used to not be
able to handle redo recovery of version 0 of a page if the
system read "garbage" from the disk. This was fixed by the
requirements of the recent rollforward recovery project. The
system can now handle redo where it reads
garbage from disk while creating version 0 of a page, and
it also handles the case where it tries to read a page and finds
it needs to extend the file.
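The "tolerate garbage while creating version 0" behavior can be sketched as: read whatever bytes are on disk, and if they do not form a validly formatted page, discard them and format a clean version 0 page instead. The validity check here (a magic number in the first four bytes) is an assumption for illustration, not Derby's actual page format.

```java
import java.nio.ByteBuffer;

public class GarbagePageSketch {
    static final int MAGIC = 0xDE4B1;  // hypothetical page-format marker
    static final int PAGE_SIZE = 4096;

    /** During redo, return a freshly formatted page if the on-disk bytes are garbage. */
    static ByteBuffer redoCreateVersion0(byte[] rawFromDisk) {
        ByteBuffer buf = ByteBuffer.wrap(rawFromDisk);
        if (rawFromDisk.length < 4 || buf.getInt(0) != MAGIC) {
            // Garbage or stale bytes: overwrite with a clean version 0 page.
            ByteBuffer fresh = ByteBuffer.allocate(PAGE_SIZE);
            fresh.putInt(0, MAGIC);
            return fresh;
        }
        return buf;  // already a formatted page; redo proceeds on it
    }

    public static void main(String[] args) {
        byte[] garbage = {1, 2, 3, 4, 5};
        ByteBuffer page = redoCreateVersion0(garbage);
        System.out.println(page.getInt(0) == MAGIC);  // garbage was replaced
    }
}
```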
The second is that we guessed that some OS might require the
sync to ensure the space on disk. There is no info one way or
the other in the JVM documentation, so it was just a guess on
our part. My belief is that the write call we are doing will
force the OS to allocate the space to our file, and no other
file will be able to use that space, so the space is ours until
the OS/machine crashes, at least.
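The two JDK calls involved in that guess can be shown side by side: writing bytes at the end of a `RandomAccessFile` is the point where space is requested from the OS, while `getFD().sync()` is the separate step that forces the data to stable storage. These are real JDK APIs, but whether the OS truly reserves disk blocks at `write()` time is exactly the part the JVM documentation leaves unspecified; the helper name below is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class AllocateWithoutSync {
    /** Extend the file by one page of zeros; returns the new file length. */
    static long extendByOnePage(File f, int pageSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(raf.length());
            raf.write(new byte[pageSize]);  // space is requested from the OS here
            // raf.getFD().sync();          // the per-page sync we would now skip
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("derby-page", ".dat");
        f.deleteOnExit();
        System.out.println(extendByOnePage(f, 4096));  // prints 4096
    }
}
```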
I think in general users will almost never see out-of-space
during redo recovery. I think it would take an OS crash,
on an OS with no filesystem logging, and a subsequent process
using up all the disk space on the machine before Derby gets
to run redo.
So the upside is inserts go much faster. The downside is that
in some rare (and maybe on some/most OS's never-occurring) cases the
user will see an out-of-disk-space message during database boot that
tells him he has to free up some disk space. The system
will boot once the disk space is available. This error exists
today if the disk is full and undo needs to write some CLRs -
so the error isn't even new for Derby.
Bryan Pendleton wrote:
Mike Matrigali (JIRA) wrote:
... the total time approaches very close to durability=test ...
Wow! This is great; looks like a very big win. Cool!
I had two questions:
1) It seems like a really common scenario would be:
- a single enormous "batch" application is trying to insert
many, many rows into a table.
- there's enough room in memory, so we just buffer up a bunch
of pages in the cache and let the application insert as it pleases.
- the application completes, and commits,
- then we discover there's not enough space on the disk.
Is this the problem you're trying to solve?
If so, I'm a little confused as to what the "external view" of the
system will be -- how will the user know that the disk has become
full and that space needs to be added?
2) Is there any advantage that you can see to having some sort of
intermediate behavior in between the extremes of:
- always sync every freshly allocated page, and
- never sync freshly allocated pages
For example, is there any point in a "sync every N pages", or
"sync every N seconds"? (I guess the latter is sort of like a
checkpoint?)
thanks,
bryan