Bugs item #1976341, was opened at 2008-05-28 11:43
Message generated for change (Comment added) made by stmane
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=1976341&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/runtime
Group: MonetDB4 4.24
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Wouter Alink (vzzzbx)
Assigned to: Niels Nes (nielsnes)
Summary: XQ: leftovers after deleting document
Initial Comment:
Using today's stable build (preJune, May 28, 2008; it happens with earlier
builds as well), the dbfarm isn't cleaned up as expected after deleting a
document with 'pf:del-doc()'.
When repeatedly adding and deleting documents, the leftovers start piling up,
and eventually the system runs out of disk space.
See example below.
bash-3.2$ wc -c /tmp/aap.xml
292890833 /tmp/aap.xml
bash-3.2$ echo 'pf:add-doc("/tmp/aap.xml","aap.xml")' | mclient -lxq
bash-3.2$ du -sh /tmp/dbfolder/
367M /tmp/dbfolder/
bash-3.2$ echo 'pf:add-doc("/tmp/aap.xml","aap_1.xml")' | mclient -lxq
bash-3.2$ du -sh /tmp/dbfolder/
733M /tmp/dbfolder/
bash-3.2$ echo 'pf:add-doc("/tmp/aap.xml","aap_2.xml")' | mclient -lxq
bash-3.2$ du -sh /tmp/dbfolder/
1.1G /tmp/dbfolder/
bash-3.2$
bash-3.2$ echo 'pf:del-doc("aap_2.xml")' | mclient -lxq
bash-3.2$ du -sh /tmp/dbfolder/
997M /tmp/dbfolder/
bash-3.2$ echo 'pf:del-doc("aap_1.xml")' | mclient -lxq
bash-3.2$ du -sh /tmp/dbfolder/
895M /tmp/dbfolder/
bash-3.2$ du -sh /tmp/dbfolder/
895M /tmp/dbfolder/
bash-3.2$ du -sh /tmp/dbfolder/
895M /tmp/dbfolder/
bash-3.2$ echo 'pf:del-doc("aap.xml")' | mclient -lxq
bash-3.2$ du -sh /tmp/dbfolder/
794M /tmp/dbfolder/
bash-3.2$ du -sh /tmp/dbfolder/
794M /tmp/dbfolder/
bash-3.2$
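To put rough numbers on the session above, here is a small calculation that uses only the sizes printed by `du` (the variable names are illustrative, and the 1.1G reading is left out because it is rounded). It suggests that each delete releases only about 100 MB of the roughly 366 MB that one copy of the document occupies:

```python
# Sizes in MB as reported by `du -sh /tmp/dbfolder/` in the session above.
after_first_add = 367
after_second_add = 733
# Readings after deleting aap_2.xml, aap_1.xml and aap.xml, in that order.
sizes_after_deletes = [997, 895, 794]

# Growth caused by adding one more copy of the document.
per_doc = after_second_add - after_first_add

# Space released by the second and the third delete.
freed = [before - after
         for before, after in zip(sizes_after_deletes, sizes_after_deletes[1:])]

# Space one document leaves behind, assuming the best observed delete.
leftover_per_doc = per_doc - max(freed)

print("added per document:  ", per_doc, "MB")
print("freed per delete:    ", freed, "MB")
print("left behind per doc: ", leftover_per_doc, "MB")
print("still on disk at end:", sizes_after_deletes[-1], "MB")
```

Three documents at roughly 264 MB of leftovers each come to about 792 MB, which is close to the 794M still reported after the last delete.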
----------------------------------------------------------------------
>Comment By: Stefan Manegold (stmane)
Date: 2008-07-03 22:34
Message:
Logged In: YES
user_id=572415
Originator: NO
Might be related to
[ 2009556 ] XQ: "Zombie" document in collection
http://sourceforge.net/tracker/index.php?func=detail&aid=2009556&group_id=56967&atid=482468
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-06-18 11:18
Message:
Logged In: YES
user_id=572415
Originator: NO
I propose a meeting on this issue with Peter, Niels, Sjoerd (and possibly
me), as soon as we find a common free slot in our agendas.
First suggestion:
Tomorrow (Thursday Jun 19 2008) after TTT.
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2008-06-18 11:12
Message:
Logged In: YES
user_id=591107
Originator: NO
Niels,
I am truly lost. Please either explain to me what XQuery must do, or fix
the code yourself, or (at least) document what the logger does.
Let me try to explain what XQuery needs.
XQuery just wants to make bats persistent when shredded and TMsubcommit
them. And on document delete, it wants to make bats transient and
TMsubcommit that.
XQuery gives documents unique names, using a persistent sequence number.
But XQuery also wants to be able to update persistent bats, and log
changes to them using the WAL (trans_start ... trans_end).
There is a really basic problem with mixing TMsubcommit and the WAL, like
you seem to be doing. If you have a mechanism that relies both on a WAL
write and a TMsubcommit, how are you going to achieve atomicity? What if
the WAL write succeeds, but the TMsubcommit does not? Atomicity cannot be
guaranteed AFAICS.
Because TMsubcommit (i.e. checkpoint) is much more efficient for
shred/delete-doc than logging all new data, XQuery uses separate mechanisms
for updates vs. shredding/deleting, and actually forbids transactions that
do both.
The fact that somehow the logger now performs subcommits (since when?)
surprises me. But lacking any documentation of the logger, I am really at a
loss how to proceed. Can you provide the functionality that we need?
So:
- I do not want to write in the WAL (log_start/log_end) for a document
shred/delete. I just want TMsubcommit.
- After TMsubcommit, we do want to log update queries in the WAL. For
this, I understand that the bats need to be somehow registered in the WAL.
During recovery actually the WAL should not worry about how to get the
bats. All bats for which updates have been logged, are known to have been
successfully TMsubcommit-ed previously (otherwise the shred would have failed). So
these bats are always there. And if they aren't present during recovery, it
means that the document had been deleted already (bat names are unique and
never re-appear in XQuery) -- thus such deltas can be ignored.
Maybe we should not register the XQuery bats at all in the logger (and the
logger should be changed so that it copes with that). Or there should be a form
of "light" registering that does not cause any lrefcounts. This seems to be
the problem here.
I feel a bit frustrated as we appear to have been talking about this for two
years, and things end up not working no matter how much talking is done.
Besides, I really wonder about my original question regarding atomicity and
the logger: how does SQL achieve atomicity if transactions require *both* a
successful WAL commit write *and* a TMsubcommit?
Peter
(I know I am nagging a bit, but writing documentation really would help to
avoid such questions/mysteries)
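The recovery rule requested above (logged deltas for BATs that no longer exist can simply be ignored) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the logger's code: `replay_wal`, the dictionary of BATs and the BAT names are all hypothetical stand-ins.

```python
def replay_wal(wal_records, committed_bats):
    """Apply logged deltas to BATs that still exist and skip the rest.

    wal_records: list of (bat_name, value) pairs in log order.
    committed_bats: dict mapping bat_name to a list of values; it stands in
    for the BATs that an earlier subcommit made persistent.
    Returns the names of the BATs whose deltas were ignored.
    """
    skipped = []
    for bat_name, value in wal_records:
        bat = committed_bats.get(bat_name)
        if bat is None:
            # The BAT is gone, so its document was deleted after this update
            # was logged. Names are never reused, so the delta is obsolete.
            skipped.append(bat_name)
            continue
        bat.append(value)
    return skipped


# Hypothetical state: document 7 still exists, document 5 has been deleted.
bats = {"doc_7_pre": [1, 2]}
wal = [("doc_7_pre", 3), ("doc_5_pre", 9), ("doc_7_pre", 4)]
skipped = replay_wal(wal, bats)
print(bats, skipped)
```

The rule depends on the property stated above that BAT names are unique and never re-appear; if names could be reused, a stale delta might be applied to an unrelated BAT.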
----------------------------------------------------------------------
Comment By: Niels Nes (nielsnes)
Date: 2008-06-06 08:20
Message:
Logged In: YES
user_id=43556
Originator: NO
Indeed, it's incorrect that pf 'commits' the xquery_catalog etc. bats. But
besides that, it should also not do the subcommits on the to-be-deleted bats
(or any other snapshot, to be precise). This is all done by the logger.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-06-06 00:47
Message:
Logged In: YES
user_id=572415
Originator: NO
Well, as my checkin messages already suggested, my recent logger-related
checkins do not fix this bug.
However, I guess/hope I managed to get close to the actual problem.
Thanks to Niels' hints, my recent changes helped to get the log-flush
triggered as intended in pathfinder.
However, even during the flush (i.e., restart of the logger), the
"left-over" refcounts are not reduced.
As far as I can see, the reason is the fact that pathfinder "messes" with
the logger's internal BATs (xquery_catalog, xquery_seqs, xquery_snapshots),
in particular xquery_catalog:
The logger uses/exploits the delta mechanism of not yet committed changes
to BATs to keep track of which BATs have been added to or removed from the
logger's control via logger_add_bat() & logger_del_bat().
Pathfinder "messes" with this by committing also these BATs during its own
commit of shredded document BATs, obviously "destroying" the deltas.
Once the log-flush/logger-restart is triggered, the logger's bm_commit()
does/can not see the deleted BATs anymore and hence cannot reduce their
refcounts.
Given this, I tried the obvious(?): keep pathfinder from committing the
logger BATs via the following patch:
========
--- pathfinder/runtime/pathfinder.mx 5 Jun 2008 21:18:51 -0000 1.416.2.6
+++ pathfinder/runtime/pathfinder.mx 5 Jun 2008 22:37:04 -0000
@@ -623,9 +623,6 @@
 {
 if (count(commitBAT) = 0) return false;
 commitBAT := commitBAT.access(BAT_WRITE);
- commitBAT.append("xquery_catalog");
- commitBAT.append("xquery_seqs");
- commitBAT.append("xquery_snapshots");
 commitBAT.append("collection_name");
 commitBAT.append("collection_size");
 commitBAT.append("doc_name");
========
Now, bm_commit() sees the delta and does its work reducing the refcount
from 1 to 0 in its first loop ("/* remove the destroyed bats */").
However, bm_commit() finally calls bm_subcommit() still with the same
catalog BAT.
bm_subcommit() itself then does also loop over the same delta of deleted
BUNs in the catalog BAT ("/* first loop over deleted then over current and
new */"),
which results in several
"!WARNING: BBPname: range error <bat_id>"
(one for each deleted BAT),
triggered by calling BBPname() (and thus BBPcheck()) on BATs with refcount
0 ...
Here, I'm (at least for now?) at the end of my "expertise".
Any help by the real logger and/or pathfinder expert(s) is now required
and highly appreciated!
Thank you very much in advance!
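The suspected mechanism described above (a foreign commit of the catalog BAT wipes the delta that the logger relies on to find deleted BATs) can be illustrated with a toy model. This is a sketch under assumptions only: the `Catalog` class, `flush` and `run` are hypothetical simplifications and do not reproduce the actual bm_commit() logic or the BBP reference counting.

```python
class Catalog:
    """Toy stand-in for a BAT that tracks uncommitted insert/delete deltas."""

    def __init__(self):
        self.committed = set()
        self.inserted = set()
        self.deleted = set()

    def add(self, bat_id):
        self.inserted.add(bat_id)

    def remove(self, bat_id):
        self.deleted.add(bat_id)

    def commit(self):
        # Fold the deltas into the committed state and forget them.
        self.committed = (self.committed | self.inserted) - self.deleted
        self.inserted.clear()
        self.deleted.clear()


def flush(catalog, refcounts):
    """Toy logger flush: release every BAT found in the delete delta."""
    for bat_id in sorted(catalog.deleted):
        refcounts[bat_id] -= 1
    catalog.commit()


def run(foreign_commit):
    catalog, refcounts = Catalog(), {42: 1}
    catalog.add(42)
    catalog.commit()      # BAT 42 is registered and committed
    catalog.remove(42)    # document deleted: removal exists only as a delta
    if foreign_commit:
        catalog.commit()  # another component commits the catalog first
    flush(catalog, refcounts)
    return refcounts[42]


leaked = run(foreign_commit=True)     # delta was wiped, reference survives
released = run(foreign_commit=False)  # delta was visible, reference dropped
print(leaked, released)
```

In this model the reference is released only when the delete delta is still present at flush time, which matches the observation that removing the three append() calls lets bm_commit() reduce the refcount from 1 to 0.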
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-06-04 02:16
Message:
Logged In: YES
user_id=572415
Originator: NO
See also
http://sourceforge.net/mailarchive/forum.php?thread_name=20080603233242.GA31340%40cwi.nl&forum_name=monetdb-developers
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-05-29 18:15
Message:
Logged In: YES
user_id=572415
Originator: NO
The previous release (Feb2008: MonetDB 4.22.0 + Pathfinder 0.22.0) shows
the same behavior.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-05-29 18:09
Message:
Logged In: YES
user_id=572415
Originator: NO
first analysis seems to show that after a "pf:del-doc()", the BATs holding
the document are (correctly) turned from "persistent" into "transient", and
their "lrefcnt" is reduced from "2" to "1", but (most probably due to the
"lrefcnt" of "1"), they stay "around" until this one reference to them is
released, e.g., by stopping (and restarting) the server --- need to check,
why/where this one lref comes from ...
----------------------------------------------------------------------
Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-05-29 15:28
Message:
Logged In: YES
user_id=1856546
Originator: NO
Before adding the doc:
232K MonetDB4
12K MonetDB4/xquery_logs
8.0K MonetDB4/xquery_logs/demo
216K MonetDB4/dbfarm
212K MonetDB4/dbfarm/demo
204K MonetDB4/dbfarm/demo/bat
4.0K MonetDB4/dbfarm/demo/bat/LEFTOVERS
8.0K MonetDB4/dbfarm/demo/bat/BACKUP
4.0K MonetDB4/dbfarm/demo/bat/03
44K MonetDB4/dbfarm/demo/bat/01
4.0K MonetDB4/dbfarm/demo/bat/05
4.0K MonetDB4/dbfarm/demo/bat/04
After adding the doc:
366M MonetDB4
12K MonetDB4/xquery_logs
8.0K MonetDB4/xquery_logs/demo
366M MonetDB4/dbfarm
366M MonetDB4/dbfarm/demo
366M MonetDB4/dbfarm/demo/bat
4.0K MonetDB4/dbfarm/demo/bat/LEFTOVERS
8.0K MonetDB4/dbfarm/demo/bat/BACKUP
4.0K MonetDB4/dbfarm/demo/bat/03
44K MonetDB4/dbfarm/demo/bat/01
179M MonetDB4/dbfarm/demo/bat/05
188M MonetDB4/dbfarm/demo/bat/04
After deleting the doc:
265M MonetDB4
12K MonetDB4/xquery_logs
8.0K MonetDB4/xquery_logs/demo
265M MonetDB4/dbfarm
265M MonetDB4/dbfarm/demo
265M MonetDB4/dbfarm/demo/bat
4.0K MonetDB4/dbfarm/demo/bat/LEFTOVERS
8.0K MonetDB4/dbfarm/demo/bat/BACKUP
4.0K MonetDB4/dbfarm/demo/bat/03
44K MonetDB4/dbfarm/demo/bat/01
77M MonetDB4/dbfarm/demo/bat/05
188M MonetDB4/dbfarm/demo/bat/04
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-05-29 12:12
Message:
Logged In: YES
user_id=572415
Originator: NO
could you please also report the results of
`find .../MonetDB4 -type d | xargs du -sh`
?
Thanks!
----------------------------------------------------------------------
Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-05-29 12:01
Message:
Logged In: YES
user_id=1856546
Originator: NO
Hi, I reproduced this bug, and I am attaching the results. This is not a
query compile-time problem but a runtime one. However, because the mps and
algebra back-ends use a slightly different "play docmgm tape", I did the
experiments with both. The problem is there for both back-ends, and the disk
is cleaned only when the mserver *starts* again (and not when it is killed).
The document I used is around 280MB.
I am re-assigning this bug to Stefan:) (or Peter?)
File Added: leftovers_after_del.report
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-05-28 12:40
Message:
Logged In: YES
user_id=572415
Originator: NO
Moreover, does /tmp/dbfolder/ only contain the dbfarm or also the log
directory?
How does the size of the log directory evolve/change in the above example?
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-05-28 12:38
Message:
Logged In: YES
user_id=572415
Originator: NO
Wouter,
could you please check
(1) which of the sub directories of /tmp/dbfolder/ contains the
left-overs, and
(2) whether stopping and re-starting Mserver does change (reduce) the size
of /tmp/dbfolder/ (and/or its subdirectories)
?
Thanks!
----------------------------------------------------------------------
Comment By: Wouter Alink (vzzzbx)
Date: 2008-05-28 11:50
Message:
Logged In: YES
user_id=621590
Originator: YES
(increased priority a little, as it seems a showstopper for us)
additional information:
- restarting mserver does clean up the data as expected
----------------------------------------------------------------------