Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-27 Thread Jim Fulton


On Sep 12, 2007, at 10:28 AM, Jim Fulton wrote:
...

  - checkbtrees.py
  - fstest.py


There's an fsrefs script that checks internal references I believe.


fsrefs.py shows loads of problems in both the data.fs and the
resources.fs, probably 200 entries per database, e.g.:

oid 0xD87110L BTrees._OOBTree.OOBucket
last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
refers to invalid objects:
oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: 'unknown'

...
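
When reading a report like this, it can help to turn the raw 8-byte oids into the hexadecimal form used on the "oid 0xD87110L" header line. A minimal sketch, assuming oids are 8-byte big-endian strings; the helper name oid_to_hex is made up for illustration and is not part of fsrefs:

```python
import struct

def oid_to_hex(oid):
    """Render an 8-byte big-endian oid as a hex string such as '0xb02b66'."""
    if len(oid) != 8:
        raise ValueError("expected an 8-byte oid, got %d bytes" % len(oid))
    (value,) = struct.unpack(">Q", oid)
    return hex(value)

# One of the missing oids from the report above.
print(oid_to_hex(b"\x00\x00\x00\x00\x00\xb0+f"))
```

Having both forms side by side makes it easier to match a missing reference against the header line of another entry in the same report.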


  - How do I tell if something is a reference to another database?


I don't know how to do this with fsrefs.  I'm not 100% sure that  
fsrefs recognizes cross-database references.


I did a little looking at fsrefs.  It doesn't analyze the types of  
references. It just tries to load objects.  This approach, aside from  
being less informative than it should be, totally fails with multiple  
databases. Cross-database references will always be reported as  
missing by fsrefs.
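
A rough sketch of what analyzing the types of references could look like. It assumes the persistent-reference layouts described in the comments of ZODB's serialize module (a bare oid, an (oid, class) tuple, and list forms tagged 'w', 'n' and 'm'); the function name and the returned labels are invented for illustration, and the layouts should be checked against the ZODB version in use:

```python
def classify_reference(ref):
    """Return (kind, database_name, oid) for one persistent reference.

    kind is 'local', 'weak' or 'cross-database'; database_name is None
    for references that stay inside the current database.
    """
    if isinstance(ref, bytes):            # bare oid
        return ("local", None, ref)
    if isinstance(ref, tuple):            # (oid, class metadata)
        return ("local", None, ref[0])
    if isinstance(ref, list):
        if len(ref) == 1:                 # [oid]: old-style weak reference
            return ("weak", None, ref[0])
        ref_type, args = ref
        if ref_type == "w":               # (oid,) or (oid, database_name)
            database = args[1] if len(args) > 1 else None
            return ("weak", database, args[0])
        if ref_type == "n":               # (database_name, oid)
            return ("cross-database", args[0], args[1])
        if ref_type == "m":               # (database_name, oid, class metadata)
            return ("cross-database", args[0], args[1])
    raise ValueError("unrecognized reference: %r" % (ref,))

# The database name "resources" is only an example value.
print(classify_reference((b"\x00\x00\x00\x00\x00\xb0+f", None)))
print(classify_reference(["m", ("resources", b"\x00" * 8, None)]))
```

With a classification like this, a checker could look up local oids in the current file and report the rest as "remote, not checked" rather than as missing.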





I'll try to make some time in the next few days to look at this issue.


Man it's hard to make time ...



I'll look at fsrefs a bit more closely to:

  - make sure it understands cross-database references, and


It doesn't.

  - Make sure it reports whether missing references are local or  
remote.


Haha ;)

I'd like to decide what to do next based on this investigation.  In  
particular, I want to be sure if the problems you are having are  
actually due to cross-database reference issues.


I'll also look at writing a tool that might be able to recover lost
objects from backup databases.  The idea is that one tool would scan
a database for missing oids and save the list to files, separating
references to different databases.  Then another tool would read
this list and a list of old database files, scan the files for oids
in the list, and extract the records if they are found.


I spent some time on an analysis tool. See:

  http://svn.zope.org/zc.fsutil/branches/dev/

and especially:

  http://svn.zope.org/zc.fsutil/branches/dev/src/zc/fsutil/references.txt?view=auto


It will help you figure out if you have holes and separate
cross-database and local references.  You may have to work a little though.
The data structures produced will allow you to analyze broken
cross-database references in a way that should be fairly obvious. (Hint:
you'll have to generate data for each database and make sure that all
of the oids mentioned in the set of cross-database references are
actually present in the named databases.)
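
The check described in the hint can be sketched roughly as follows. This does not use the actual data structures produced by zc.fsutil; the dictionary and tuple shapes, the function name, and the small integer oids are all assumptions made for illustration:

```python
def broken_cross_references(present, cross_refs):
    """Return the cross-database references whose target is missing.

    present:    {database_name: set of oids found in that database}
    cross_refs: iterable of (source_db, source_oid, target_db, target_oid)
    """
    broken = []
    for source_db, source_oid, target_db, target_oid in cross_refs:
        if target_oid not in present.get(target_db, set()):
            broken.append((source_db, source_oid, target_db, target_oid))
    return broken

# Hypothetical data: small integers stand in for real 8-byte oids.
present = {
    "content": {1, 2, 3},
    "resources": {10, 11},
}
cross_refs = [
    ("content", 2, "resources", 10),   # target exists
    ("content", 3, "resources", 12),   # target missing: a hole
    ("resources", 11, "archive", 5),   # named database was never scanned
]
for ref in broken_cross_references(present, cross_refs):
    print(ref)
```

A reference into a database that was never scanned is reported as broken here, which is a reminder that data has to be generated for every database involved.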


A major challenge is handling large databases.  We have databases
with millions of objects and I kept having to trim the amount of data
analyzed to fit the data structures in memory.  It is interesting to
look at the evolution of the data structures over the last couple of
days as I tried to cope with scale.


The obvious next step is to store data in a database rather than  
memory.  This will slow things down, but will allow me to work with  
arbitrarily large databases and keep richer data structures.
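
To illustrate the trade-off only: the sketch below keeps the reference data in an on-disk table rather than in Python dictionaries. It uses the standard-library sqlite3 module purely as a stand-in, which is a swapped-in choice and not what the tool does; the table layout and function names are likewise invented, and oids are stored as integers:

```python
import sqlite3

def open_index(path=":memory:"):
    """Create the two tables; pass a file path to keep the index on disk."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS present (oid INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS refs (source INTEGER, target INTEGER)")
    return conn

def record(conn, oid, targets):
    """Note that oid exists and refers to each oid in targets."""
    conn.execute("INSERT OR IGNORE INTO present VALUES (?)", (oid,))
    conn.executemany("INSERT INTO refs VALUES (?, ?)",
                     [(oid, target) for target in targets])

def missing(conn):
    """Return sorted (target, source) pairs whose target was never recorded."""
    rows = conn.execute(
        "SELECT target, source FROM refs "
        "WHERE target NOT IN (SELECT oid FROM present) "
        "ORDER BY target, source")
    return rows.fetchall()

conn = open_index()          # in-memory here; use a filename for real data
record(conn, 1, [2, 3])
record(conn, 2, [4])         # oid 4 is never recorded as present
record(conn, 3, [])
print(missing(conn))
```

Because both the hole and the object that refers to it come back from one query, richer questions can be asked without holding everything in memory.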


Assuming that you still care about this (you've been quiet :), I  
suggest using this tool to find the holes. (You can also use it to  
find the objects that refer to the missing objects.)


Then, once you've found the missing oids, you should go to backups,  
open file storages on the backups and, if the oids are present, copy  
the pickles to the database under repair.  Something like:


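  import transaction

  # backup_storage and database_with_hole are open file storages;
  # oids is the list of missing oids that are present in the backup.
  pickles = [backup_storage.load(oid, '')[0] for oid in oids]
  t = transaction.begin()
  s = database_with_hole
  s.tpc_begin(t)
  # A serial of eight zero bytes marks each oid as new to this storage.
  for oid, p in zip(oids, pickles):
      s.store(oid, '\0'*8, p, '', t)
  s.tpc_vote(t)
  s.tpc_finish(t)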

If you don't have the data in backups, then you might be able to use
information about the objects referring to the missing objects to
repair the referring objects by hand by deleting the references to
missing objects.


Hope this helps.

Jim

--
Jim Fulton
Zope Corporation


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-12 Thread Jim Fulton


On Sep 11, 2007, at 10:27 AM, Alan Runyan wrote:


And, as you said in another note, the BTree folder actually lives in
the resources database.


Correct, the BTree is in /plone/resources/files to be exact.



Cross database references are inherently weak.  A reference from a
foreign database doesn't prevent an object from being treated as
garbage.  So, if the only reference to an object is from a foreign
database, then the object is considered garbage.  It doesn't sound
like this is what's affecting you.  The cross-database reference is
to the BTree.  It sounds like the internal references are within the
database.


Well.  Someone could have 'copy/pasted' a file from the content database
into the resources/files database.  That could have been one issue.


:(

BTW, I assume you mean cut/paste aka move.


  - checkbtrees.py
  - fstest.py


There's an fsrefs script that checks internal references I believe.


fsrefs.py shows loads of problems in both the data.fs and the
resources.fs, probably 200 entries per database, e.g.:

oid 0xD87110L BTrees._OOBTree.OOBucket
last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
refers to invalid objects:
oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: 'unknown'


Interesting. I wonder if these are actually cross-database references.


My questions are:

 - I imagine if there are 'invalid' references this is considered
   corruption or inconsistency?


I consider this inconsistency. The file structure is intact, but the  
data isn't what it should be.  Not that it matters to the end user  
what we call it.




  - How do I tell if something is a reference to another database?


I don't know how to do this with fsrefs.  I'm not 100% sure that  
fsrefs recognizes cross-database references.



  - Having these invalid references, is this common to  ZODB  
applications?


No.


Possibly, there's a backup that has data records for the missing  
OIDs.


Going to ask hosting company to pull up backups for the past few
weeks.  But how am I going to find this, other than by checking that
the folder lets me iterate over the items without throwing
POSKeyError?  Does that sound like a decent litmus test?


Well, there's also fsrefs.

I'll try to make some time in the next few days to look at this issue.

I'll look at fsrefs a bit more closely to:

  - make sure it understands cross-database references, and

  - Make sure it reports whether missing references are local or  
remote.


I'd like to decide what to do next based on this investigation.  In  
particular, I want to be sure if the problems you are having are  
actually due to cross-database reference issues.


I'll also look at writing a tool that might be able to recover lost
objects from backup databases.  The idea is that one tool would scan a
database for missing oids and save the list to files, separating
references to different databases.  Then another tool would read
this list and a list of old database files, scan the files for oids
in the list, and extract the records if they are found.
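
The second half of that idea, looking up a list of missing oids in a series of old database files, might be sketched like this. Backups are modeled as plain dictionaries mapping oid to record rather than as real file storages, and searching the newest backup first is an assumption of the sketch, not something decided above:

```python
def recover_from_backups(missing_oids, backups):
    """Look each missing oid up in the backups, newest first.

    backups: list of (name, {oid: record}) pairs ordered newest to oldest.
    Returns (found, still_missing) where found maps oid -> (name, record).
    """
    found = {}
    still_missing = set(missing_oids)
    for name, records in backups:
        for oid in sorted(still_missing):
            if oid in records:
                found[oid] = (name, records[oid])
        still_missing -= set(found)
        if not still_missing:
            break
    return found, still_missing

# Hypothetical backups; names and records are placeholders.
backups = [
    ("backup-newer", {1: b"record-1-new"}),
    ("backup-older", {1: b"record-1-old", 2: b"record-2-old"}),
]
found, still_missing = recover_from_backups([1, 2, 3], backups)
print(found)
print(still_missing)
```

Keeping the name of the backup next to each recovered record makes it possible to judge how stale a restored object is.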


I do suspect we need to do something about cross-database  
references.  My long-term plan is to:


- Add an option to file storages to skip garbage collection when  
packing.


- Add a multi-database garbage-collection protocol and tool

In the short term, it might be good to have a mechanism for limiting
which objects can have cross-database references to them, to limit the
chance of inadvertent cross-database references via move.  This would
need to be fleshed out though, which takes time.  Perhaps something
can be done at the Zope or Plone level in the code for moving objects
to make sure that objects aren't moved between databases.


Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]    Python Powered!
CTO                 (540) 361-1714              http://www.python.org
Zope Corporation    http://www.zope.com         http://www.zope.org





Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-11 Thread Alan Runyan
 And, as you said in another note, the BTree folder actually lives in
 the resources database.

Correct, the BTree is in /plone/resources/files to be exact.


 Cross database references are inherently weak.  A reference from a
 foreign database doesn't prevent an object from being treated as
 garbage.  So, if the only reference to an object is from a foreign
 database, then the object is considered garbage.  It doesn't sound
 like this is what's affecting you.  The cross-database reference is
 to the BTree.  It sounds like the internal references are within the
 database.

Well.  Someone could have 'copy/pasted' a file from the content database
into the resources/files database.  That could have been one issue.

- checkbtrees.py
- fstest.py

 There's an fsrefs script that checks internal references I believe.

fsrefs.py shows loads of problems in both the data.fs and the resources.fs,
probably 200 entries per database, e.g.:

oid 0xD87110L BTrees._OOBTree.OOBucket
last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
refers to invalid objects:
oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: 'unknown'

My questions are:

 - I imagine if there are 'invalid' references this is considered corruption
   or inconsistency?

  - How do I tell if something is a reference to another database?

  - Having these invalid references, is this common to  ZODB applications?

 Possibly, there's a backup that has data records for the missing OIDs.

Going to ask hosting company to pull up backups for the past few weeks.
But how am I going to find this, other than by checking that the folder
lets me iterate over the items without throwing POSKeyError?  Does that
sound like a decent litmus test?

-- 
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856


Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-11 Thread Dieter Maurer
Alan Runyan wrote at 2007-9-11 09:27 -0500:
 ...
oid 0xD87110L BTrees._OOBTree.OOBucket
last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
refers to invalid objects:
oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing: 'unknown'

Looks as if the OOBucket has lost quite a few value links (as at most
a single one of these references can be the link to the next bucket).

My questions are:

 - I imagine if there are 'invalid' references this is considered corruption
   or inconsistency?

It depends on your preferences.

 ...
  - Having these invalid references, is this common to  ZODB applications?

No.

At least not for ZODB applications that do not use inter database
references.

 Possibly, there's a backup that has data records for the missing OIDs.

Going to ask hosting company to pull up backups for the past few weeks.
But how am I going to find this, other than by checking that the folder
lets me iterate over the items without throwing POSKeyError?  Does that
sound like a decent litmus test?

You can also run fsrefs on it. If you do not get "missing ..." lines,
then the backup does not have your POSKeyError (but may lack quite
a few newer modifications).



-- 
Dieter


Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-10 Thread Jim Fulton


On Sep 10, 2007, at 10:34 AM, Alan Runyan wrote:


Hi guys.

It seems that one of our customers has a corrupted BTree. I would love
for someone to provide some insight on how we can recover the data.

we have two databases: 1 for resources and 1 for 'content'.  resources
contain lots of very big files.

The system is configured to have a mount point at /plone/resources, which
is a subclass of BTreeFolder using an OOBTree as its internal data structure.


Does the BTree folder live in the content database or the resources
database?


Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]    Python Powered!
CTO                 (540) 361-1714              http://www.python.org
Zope Corporation    http://www.zope.com         http://www.zope.org





Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-10 Thread Dieter Maurer
Alan Runyan wrote at 2007-9-10 09:34 -0500:
 ...
While debugging this I had a conversation with sidnei about mounted
databases.  He recalled that if you're using a mounted database you
should not pack.  If for some reason your mounted database had a cross
reference to another database and somehow you had a dangling reference
to the other database it would cause POSKeyError.

BTrees are actually directed acyclic graphs (DAGs) with two node types:
tree (internal node) and bucket (leaf).

Besides its children, a tree contains a link to its leftmost
bucket. Besides its keys/values, a bucket contains a link to
the next bucket.

When you iterate over keys or values, the leftmost bucket
is accessed via the root's leftmost bucket link and then
all buckets are visited via the next bucket links.
Your description seems to indicate that you have lost a
next bucket link.

If you are lucky, then the tree access structure (the children links
of the tree nodes) is still intact -- or if not, is at least
partially intact. Then, you will be able to recover large parts
of your tree.
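
The effect of a lost next-bucket link can be shown with a toy model. This is not the BTrees implementation: the Bucket class, the dictionary acting as storage, and the KeyError standing in for POSKeyError are all simplifications chosen for illustration:

```python
class Bucket:
    """Leaf node: sorted keys plus the name of the next bucket (or None)."""
    def __init__(self, keys, next_name=None):
        self.keys = keys
        self.next_name = next_name

def iter_keys(store, first_name):
    """Yield keys by following next-bucket links, as key iteration does."""
    name = first_name
    while name is not None:
        bucket = store[name]      # KeyError here plays the role of POSKeyError
        for key in bucket.keys:
            yield key
        name = bucket.next_name

# "b1" links to "b2", but the record for "b2" has been lost from the
# store, so "b3" cannot be reached through the chain either.
store = {
    "b1": Bucket([1, 2, 3], next_name="b2"),
    "b3": Bucket([7, 8, 9]),
}

seen = []
try:
    for key in iter_keys(store, "b1"):
        seen.append(key)
except KeyError as missing:
    print("iteration stopped after", seen, "missing record:", missing)
```

Everything after the broken link is unreachable by iteration even though the later bucket itself is intact, which is why the tree's child links matter for recovery.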


You have two options:

  * Reconstruct the tree from its pickles.
    This is the way the checking of BTrees works.

  * Determine the last key (LK) before you get the POSKeyError;
    then use the tree structure to access the next available
    key. You may need to try ever larger values above LK
    to skip a potentially damaged part of the tree.


I would start with the second approach and switch to the first one
when it becomes too tedious.
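
The idea behind the second approach, reaching later keys through the tree's child links instead of the bucket chain, can be sketched on a toy model. The Tree and Bucket classes, the dictionary acting as storage, and the KeyError standing in for POSKeyError are assumptions for illustration and not the real BTrees classes:

```python
class Bucket:
    """Leaf node holding sorted keys."""
    def __init__(self, keys):
        self.keys = keys

class Tree:
    """Internal node: names of its children in key order."""
    def __init__(self, children):
        self.children = children

def salvage(store, name, recovered, lost):
    """Depth-first walk over child links, skipping nodes that fail to load."""
    try:
        node = store[name]            # KeyError stands in for POSKeyError
    except KeyError:
        lost.append(name)
        return
    if isinstance(node, Bucket):
        recovered.extend(node.keys)
    else:
        for child in node.children:
            salvage(store, child, recovered, lost)

store = {
    "root": Tree(["t1", "t2"]),
    "t1": Tree(["b1", "b2"]),
    "t2": Tree(["b3"]),
    "b1": Bucket([1, 2, 3]),
    # the record for "b2" is missing
    "b3": Bucket([7, 8, 9]),
}
recovered, lost = [], []
salvage(store, "root", recovered, lost)
print(recovered, lost)
```

On a real OOBTree the comparable move would likely be a range search starting above LK, for example through the min argument of keys(), with larger starting values tried until the lookup no longer lands in the damaged bucket.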



-- 
Dieter


Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-10 Thread Jim Fulton


On Sep 10, 2007, at 10:34 AM, Alan Runyan wrote:


Hi guys.

It seems that one of our customers has a corrupted BTree. I would love
for someone to provide some insight on how we can recover the data.

we have two databases: 1 for resources and 1 for 'content'.  resources
contain lots of very big files.

The system is configured to have a mount point at /plone/resources, which
is a subclass of BTreeFolder using an OOBTree as its internal data structure.


And, as you said in another note, the BTree folder actually lives in
the resources database.



Any time I iterate over the keys I get POSKeyError.  Any time I iterate
over the values, the same.  If I run BTree.check() on the data
structure's tree attribute (the OOBTree itself) I get a POSKeyError.

Running utils.checkbtrees doesn't say this BTree has a problem.

While debugging this I had a conversation with sidnei about mounted
databases.  He recalled that if you're using a mounted database you
should not pack.  If for some reason your mounted database had a cross
reference to another database and somehow you had a dangling reference
to the other database it would cause POSKeyError.


Cross database references are inherently weak.  A reference from a  
foreign database doesn't prevent an object from being treated as  
garbage.  So, if the only reference to an object is from a foreign  
database, then the object is considered garbage.  It doesn't sound  
like this is what's affecting you.  The cross-database reference is  
to the BTree.  It sounds like the internal references are within the
database.
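
Why a reference from a foreign database offers no protection can be seen in a toy model of per-database garbage collection. This is not FileStorage's pack code; the dictionary layout, the object names, and the function names are invented for illustration:

```python
def reachable(root, refs):
    """Return the oids reachable from root via same-database references."""
    seen, todo = set(), [root]
    while todo:
        oid = todo.pop()
        if oid not in seen:
            seen.add(oid)
            todo.extend(refs.get(oid, ()))
    return seen

def garbage_on_pack(database):
    """Objects a pack with garbage collection would consider unreferenced."""
    keep = reachable(database["root"], database["refs"])
    return set(database["objects"]) - keep

resources = {
    "root": "r0",
    "objects": {"r0", "folder", "file_a", "file_b"},
    # file_b is referenced only from the content database, which this
    # per-database collection never looks at.
    "refs": {"r0": ["folder"], "folder": ["file_a"]},
}
print(sorted(garbage_on_pack(resources)))
```

Since the collector only walks references stored in the database being packed, the foreign reference to file_b is invisible to it and the object is treated as garbage.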




Is there any other ways of testing consistency of FileStorage  
other than:

  - checkbtrees.py
  - fstest.py


There's an fsrefs script that checks internal references I believe.


And any ideas how I can salvage the data? This BTree, of course, had
the most valuable data.


Possibly, there's a backup that has data records for the missing OIDs.

Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]    Python Powered!
CTO                 (540) 361-1714              http://www.python.org
Zope Corporation    http://www.zope.com         http://www.zope.org


