RS> then the slot definitions in the slot table are removed, but the RS> actual slot _values_, which live in the blob table, are retained. Is RS> this intended behaviour?
yep, it is. it sort of sucks, i know. whether or not it becomes a problem depends on what kind of data you store in slots. blobs get reused (at least they should..), so, for example, all slots with value 4 will refer to same blob id, and you can consider bloated blob table as a cache of some kind in this case :). if you have constant inflow of unique data that becomes a problem, indeed. RS> avoid the downtime if at all possible. One thing that did occur to me RS> was that I could record the current maximum bid in the blob table, RS> then go through each tree table, including root, and also the slots RS> table, recording blob id references in each one. Then take the entire RS> set of bids from the blob table; anything present in the blob table RS> but not referred to in any of the trees or slots table, and with bid RS> less than or equal to the maximum bid when the procedure was started, RS> should be obsolete, and deleting it should be safe. Does this sound RS> reasonable, or am I missing something? yep, right in general, some details seem to be wrong. first, you do not need to go through _all_ tables, but only columns that store blob ids. there is some metainformation attached to btrees, key-type and value-type. if key-type is :object, blob ids are in qi column, if value-type is :object, value column contains blob ids. then there are performance considerations -- if you have lots of data, allocating temporary storage for all blob ids might be painful. if you run cleanup procedure more-or-less frequently, there will be more alive bids than dead ones, so you can optimize it ruling out alive bids early. for example, if you work mostly with objects, you can first find the difference between blob and slots tables (because most alive bids would be in slots): SELECT blob.bid FROM blob LEFT JOIN slots ON blob.bid = slots.value WHERE slots.value IS NULL (warning: totally untested). this would return all dead bids and bids that are referenced from other tables.then you can go through rest of tables, pulling blob ids from them and eliminating ones you see in that data set, in the end you end up with only dead ones. then, you do not actually need recordning maximum bid if you're working within a single transaction -- postgresql's transactional engine will do this for you. (it works in similar way, btw -- ignores rows that have txn id larger than current txn id). _______________________________________________ elephant-devel site list elephant-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/elephant-devel