[Monetdb-bugs] [ monetdb-Bugs-1593271 ] logger_create does not check for errors

SourceForge.net Sun, 02 Dec 2007 19:20:05 -0800

Bugs item #1593271, was opened at 2006-11-09 02:09
Message generated for change (Comment added) made by sf-robot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=1593271&group_id=56967


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/runtime
Group: (zombie: Pathfinder 0.14)
>Status: Closed
Resolution: Fixed
Priority: 8
Private: No
Submitted By: Peter Boncz (boncz)
Assigned to: Niels Nes (nielsnes)
Summary: logger_create does not check for errors

Initial Comment:
If reading the logs fails, logger_create() should
report an error. Currently the return value of
logger_readlogs() is ignored; see line 800 in logger.mx:

 logger_readlogs(logger, fp, filename);

The database is then simply started as if nothing
happened. With a corrupt log, subsequent updates will
also be lost, because after hitting the corrupt point
in the logfile, logger_readlogs() gives up.
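A minimal sketch of the check being asked for, assuming logger_readlogs() reports failure through a nonzero return value. The status enum, the reader signature, the two stand-in readers and logger_create_sketch() are all hypothetical and are not taken from logger.mx.

```c
#include <stdio.h>

/* Hypothetical status codes; logger.mx may use different conventions. */
enum log_status { LOG_OK = 0, LOG_ERR = -1 };

/* Hypothetical signature for a log reader such as logger_readlogs(). */
typedef int (*readlogs_fn)(FILE *fp, const char *filename);

/* Stand-in readers, used only to exercise the check below. */
static int readlogs_ok(FILE *fp, const char *filename)
{
    (void)fp;
    (void)filename;
    return LOG_OK;
}

static int readlogs_broken(FILE *fp, const char *filename)
{
    (void)fp;
    (void)filename;
    return LOG_ERR; /* e.g. the reader stopped at an unparsable record */
}

/* Propagate a read failure instead of starting the database
 * as if nothing happened. */
static int logger_create_sketch(readlogs_fn readlogs, FILE *fp,
                                const char *filename)
{
    if (readlogs(fp, filename) != LOG_OK) {
        fprintf(stderr, "logger: could not read log %s\n", filename);
        return LOG_ERR;
    }
    return LOG_OK;
}
```

The caller can then refuse to start, or start a sanitizing recovery, instead of silently continuing with a partially read log.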

In my case, the logger had been corrupted because my
test repository was old. So my fault, but
notwithstanding that, the logger should be designed to
handle failures.

I'm actually wondering what will happen if the log
contains an incomplete transaction. That is a legal
situation. According to the rules, such a transaction
is incomplete and thus aborted. In that case, I expect
all previous transactions to be recovered, and in fact
no error should be reported. Maybe this is the reason
that the return value of logger_readlogs() is ignored?

However, the incomplete transaction (which will cause an
error when reading the log) may obstruct future
successful transactions appended to the log. Is that
indeed a problem? The way to handle this would be to
start writing in the logfile at the byte position where
the incomplete transaction started. I do not know
whether that is the case now.
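To make the "start writing where the incomplete transaction started" idea concrete, here is a sketch that scans an in-memory copy of a log and returns the offset just past the last complete commit record. The record layout (one type byte, a 4-byte little-endian payload length, then the payload) and every name below are assumptions for illustration; this is not the actual logger.mx format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout (not the real logger.mx format):
 * 1 type byte, 4-byte little-endian payload length, payload bytes. */
enum { REC_START = 'S', REC_DELTA = 'D', REC_COMMIT = 'C' };
#define REC_HEADER 5u

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the byte offset just past the last complete commit record.
 * Everything after it belongs to an incomplete (or unreadable) tail,
 * so new transactions should be written from this offset, overwriting
 * the tail, rather than appended after it. */
static size_t append_offset(const unsigned char *buf, size_t len)
{
    size_t pos = 0, safe = 0;

    while (len - pos >= REC_HEADER) {
        unsigned char type = buf[pos];
        uint32_t plen = read_le32(buf + pos + 1);

        if (type != REC_START && type != REC_DELTA && type != REC_COMMIT)
            break;                      /* unknown code: stop here */
        if (plen > len - pos - REC_HEADER)
            break;                      /* payload cut off */
        pos += REC_HEADER + plen;
        if (type == REC_COMMIT)
            safe = pos;                 /* transaction fully on disk */
    }
    return safe;
}
```

On a real file, the logger would then truncate the log to this offset (for example with POSIX ftruncate()) before writing new records, so a half-written transaction can never sit between two committed ones.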

[In any case, I would prefer more flexible logging,
where transactions could log their deltas interleaved,
because this relieves lock pressure. This also leads to
a situation where the logger may contain any amount of
incomplete information. Still, each individual non-last
log-delta should be complete, otherwise we have
corruption. ]

The main point of this BugReport is that one has to
distinguish between:
(1) corrupt deltas (e.g. obviously wrong codes,
nonexistent bat names, impossible data values). The log
should be sanitized (i.e. we acknowledge having lost
data, but bring the system back to a usable state) before
the database is restarted, or one should not allow it
to restart (but then one has to provide a sanitizing
script, which is not very attractive)
(2) incomplete deltas (indicating a crashed database
that did not make commit). These deltas should not be
applied, and the database may be restarted (though some
warning would be nice). 
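The distinction between (1) and (2) can be expressed as a per-delta check. This is a sketch under assumed types: the decoded `struct delta`, its operation codes, the "row must not be negative" rule standing in for "impossible data values", and the flat catalog array are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded delta; the real log records differ. */
enum delta_op { OP_INSERT = 1, OP_DELETE = 2 };

struct delta {
    int op;          /* operation code read from the log */
    int bat_id;      /* BAT the delta applies to */
    long row;        /* row position; must not be negative */
    bool committed;  /* a commit record for its transaction was found */
};

enum delta_class { DELTA_OK, DELTA_INCOMPLETE, DELTA_CORRUPT };

static bool catalog_has(const int *catalog, size_t n, int bat_id)
{
    for (size_t i = 0; i < n; i++)
        if (catalog[i] == bat_id)
            return true;
    return false;
}

/* Case (1): wrong code, unknown BAT or impossible value -> corrupt.
 * Case (2): well-formed but never committed -> incomplete, skip it. */
static enum delta_class classify_delta(const struct delta *d,
                                       const int *catalog, size_t n)
{
    if (d->op != OP_INSERT && d->op != OP_DELETE)
        return DELTA_CORRUPT;
    if (!catalog_has(catalog, n, d->bat_id))
        return DELTA_CORRUPT;
    if (d->row < 0)
        return DELTA_CORRUPT;
    if (!d->committed)
        return DELTA_INCOMPLETE;
    return DELTA_OK;
}
```

Corrupt deltas would trigger the error and sanitizing path of (1a)/(1b), while incomplete ones are skipped with a warning as in (2a).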

My requests concern these two types of failures, and
their handling:
(1a) please detect log corruption, report an error,
and sanitize the system (preferable)
(1b) after signalling corruption, make it possible to
get back the log in sane state without losing the
entire repository
(2a) report incomplete deltas in the log
(2b) ensure that incomplete transactions in the log
will not obstruct recovering the log later when other
completed transactions have been appended to it

A possible approach during logger_create() is to
recover all complete transactions from the log, and
then perform a checkpoint on all data bats, and remove
all logfiles (a.k.a. log restart). This should work
even on the sane part of a log that is corrupt further on.
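The proposed recovery can be sketched on a toy in-memory model: changes of each transaction are buffered and applied only when its commit record is seen, replay stops at the first invalid record, and the recovered state is then checkpointed and the log emptied. The key/value `struct db`, the record types and recover_and_restart() are hypothetical stand-ins for the data bats and log files.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model: NKEYS integer slots stand in for the
 * data bats, and a record array stands in for the log files. */
#define NKEYS 8
#define MAXPENDING 16

enum rec_type { R_START, R_SET, R_COMMIT };

struct rec {
    enum rec_type type;
    int key;     /* R_SET only */
    int value;   /* R_SET only */
};

struct db {
    int data[NKEYS];     /* checkpointed state */
    struct rec log[64];  /* log records not yet checkpointed */
    size_t nlog;
};

/* Replay only transactions that reach R_COMMIT, then checkpoint and
 * restart the log. Returns the number of transactions recovered. */
static int recover_and_restart(struct db *db)
{
    struct rec pending[MAXPENDING];
    size_t npending = 0;
    int recovered = 0;
    int state[NKEYS];

    memcpy(state, db->data, sizeof state);
    for (size_t i = 0; i < db->nlog; i++) {
        const struct rec *r = &db->log[i];

        /* An invalid record (or one overflowing this sketch's fixed
         * buffer) ends replay: keep the sane prefix, drop the rest. */
        if (r->type == R_SET &&
            (r->key < 0 || r->key >= NKEYS || npending == MAXPENDING))
            break;

        switch (r->type) {
        case R_START:
            npending = 0;
            break;
        case R_SET:
            pending[npending++] = *r;   /* buffer until commit */
            break;
        case R_COMMIT:
            for (size_t j = 0; j < npending; j++)
                state[pending[j].key] = pending[j].value;
            npending = 0;
            recovered++;
            break;
        }
    }
    memcpy(db->data, state, sizeof state);  /* checkpoint */
    db->nlog = 0;                           /* "remove all logfiles" */
    return recovered;
}
```

A trailing transaction without a commit record is simply never applied, and because the log is emptied afterwards it cannot obstruct transactions logged later.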

Secondly, the requirement that all bats should be
mentioned in the logger catalog could be lifted. When a
transaction is logged that mentions a new bat, it could
be added silently to the catalog (now this error is
ignored!!), and the state of the catalog should be
marked "dirty". When writing the commit record with a
dirty catalog, the logger should schedule a subcommit
first, to make sure the catalog is ok.
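The "dirty catalog" idea can be sketched as follows. The `struct catalog`, its fixed capacity, and the functions log_delta_bat(), subcommit_catalog() and log_commit() are hypothetical; the subcommit is only counted here, where a real logger would persist the catalog bats.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical logger catalog; names and sizes are illustrative only. */
#define MAXBATS 32

struct catalog {
    int bats[MAXBATS];
    size_t n;
    bool dirty;      /* catalog changed since the last subcommit */
    int subcommits;  /* number of subcommits issued (for illustration) */
};

/* When a delta mentions an unknown bat, add it to the catalog instead
 * of ignoring the error, and mark the catalog dirty.
 * Returns false only if the catalog is full. */
static bool log_delta_bat(struct catalog *c, int bat_id)
{
    for (size_t i = 0; i < c->n; i++)
        if (c->bats[i] == bat_id)
            return true;
    if (c->n == MAXBATS)
        return false;
    c->bats[c->n++] = bat_id;
    c->dirty = true;
    return true;
}

/* Placeholder for persisting the catalog bats. */
static void subcommit_catalog(struct catalog *c)
{
    c->subcommits++;
    c->dirty = false;
}

/* Before the commit record is written, make sure the catalog on disk
 * knows every bat the transaction touched. */
static void log_commit(struct catalog *c)
{
    if (c->dirty)
        subcommit_catalog(c);
    /* ... write the commit record here ... */
}
```

Only the first commit after a new bat appears pays for the subcommit; later commits touching known bats skip it.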

If these suggestions are followed, we can check
logger_readlogs() for errors, and simply remove all log
files and clean the entire catalog in response. We will
arrive in a clean state where everything that was
recoverable has been recovered.


----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2007-12-02 19:20

Message:
Logged In: YES 
user_id=1312539
Originator: NO

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 365 days (the time period specified by
the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Stefan Manegold (stmane)
Date: 2006-12-02 11:47

Message:
Logged In: YES 
user_id=572415
Originator: NO

Set to "Pending", as we should think of creating tests for the
logger/recovery as well, just like we should try to build tests for memory
limitations, etc.


----------------------------------------------------------------------

Comment By: Peter Boncz (boncz)
Date: 2006-11-10 15:31

Message:
Logged In: YES 
user_id=591107

Hi Niels,

It is good to know that you restart the log after an
incorrect log. That prevents a number of issues.

However, this is not what I observed. My corrupt log was
kept on multiple session starts, and thus kept failing in
the recovery process.

I think you can distinguish between a corrupt logfile and a
crashed logfile (excluding weird crashes that caused corrupt
file writes to the log or something). A crashed logfile is a
truncated logfile (i.e. cut-off). Any other abnormal logfile
is corrupt.
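Peter's rule (cut off means crashed, anything else abnormal means corrupt) can be sketched as a scan over the record framing. The framing (one type byte 'S', 'D' or 'C', a 4-byte little-endian payload length, then the payload) and classify_log() are assumptions for illustration, not the logger.mx format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: 1 type byte ('S', 'D' or 'C'), a 4-byte
 * little-endian payload length, then the payload. */
enum log_state { LOG_CLEAN, LOG_CRASHED, LOG_CORRUPT };

static int known_type(unsigned char t)
{
    return t == 'S' || t == 'D' || t == 'C';
}

/* A log that is merely cut off inside its last record was left by a
 * crash; any other abnormality means the log is corrupt. */
static enum log_state classify_log(const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    while (pos < len) {
        if (!known_type(buf[pos]))
            return LOG_CORRUPT;         /* impossible record code */
        if (len - pos < 5)
            return LOG_CRASHED;         /* header cut off */
        uint32_t plen = (uint32_t)buf[pos + 1] |
                        ((uint32_t)buf[pos + 2] << 8) |
                        ((uint32_t)buf[pos + 3] << 16) |
                        ((uint32_t)buf[pos + 4] << 24);
        if (plen > len - pos - 5)
            return LOG_CRASHED;         /* payload cut off */
        pos += 5 + (size_t)plen;
    }
    return LOG_CLEAN;
}
```

Note that a garbage length field in the last record is indistinguishable from truncation under this rule, which is the ambiguity Niels points out in his reply.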

In any case, it seems the log_delta code should be more
defensive, as data on bats had been logged that did not
appear in the catalog.

Peter


----------------------------------------------------------------------

Comment By: Niels Nes (nielsnes)
Date: 2006-11-10 14:23

Message:
Logged In: YES 
user_id=43556

The logger cannot distinguish between crashes and incorrect
logs (as the latter can be caused by crashes!).
Users should keep backups to recover from hardware problems.
The logger always starts a new log file after a broken log
is read; this makes sure no updates are lost
after a broken log. Also, the logger now keeps its changes in
memory before applying them (as required by XQuery updates). At
the end of recovery the logger always restarts, i.e. it saves
the current state in bats and starts a new log.

----------------------------------------------------------------------

_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs
