Re: [h2] Database corruption on 1.3.176: any suggestions on how to avoid?

Thomas Mueller Fri, 03 Jul 2015 05:48:13 -0700

Hi,

> Is there any way to know for sure a database is consistent?


The statement "script to <fileName>" will detect most corruptions. It will
not detect corruptions in just the secondary indexes, but those are quite
rare.
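
A minimal sketch of how such a check might be automated from Java over JDBC, assuming the H2 driver is on the classpath. The class name ConsistencyCheck, its method names, and the command-line arguments are hypothetical and only serve as illustration. A failing SCRIPT TO is treated as a sign of possible corruption; a successful run does not rule out damage limited to secondary indexes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class ConsistencyCheck {

    // Builds the SCRIPT TO statement; single quotes in the file name are
    // doubled so the name stays a valid SQL string literal.
    public static String scriptStatement(String fileName) {
        return "SCRIPT TO '" + fileName.replace("'", "''") + "'";
    }

    // Runs SCRIPT TO on an open connection. Returns false if the statement
    // throws an SQLException, which may indicate corruption (or simply an
    // unwritable target file, so the message should be inspected).
    public static boolean scriptSucceeds(Connection conn, String fileName) {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(scriptStatement(fileName));
            return true;
        } catch (SQLException e) {
            System.err.println("SCRIPT TO failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 4) {
            System.err.println("usage: ConsistencyCheck <jdbcUrl> <user> <password> <scriptFile>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            boolean ok = scriptSucceeds(conn, args[3]);
            System.out.println(ok ? "SCRIPT TO completed" : "possible corruption");
        }
    }
}
```

The check reads every row of every table, so on a multi-gigabyte database it can take a while and is probably best run on a copy or during idle time.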

> We have a (new) corrupted database from a machine that isn't suffering
> from unexpected reboots

Even though my main priority is now to get the MVStore stable (which
hopefully will fully solve the corruption problem), I would still be
interested to understand why you have that many corruptions. My guess is
that your use case is slightly different than what others do. I have a list
of questions I have used before (you have answered some of those questions
already):

- What is your database URL?

- Did you use LOG=0 or LOG=1? Did you read the FAQ about it?

- Did the system ever run out of disk space?

- Could you send the full stack trace of the exception including message
text?

- Did you use SHUTDOWN DEFRAG or the database setting DEFRAG_ALWAYS with H2
version 1.3.159 or older?

- How many connections does your application use concurrently?

- Do you use temporary tables?

- With which version of H2 was this database created?
    You can find it out using:
    select * from information_schema.settings where name='CREATE_BUILD'
    or have a look in the SQL script created by the recover tool.

- Did the application run out of memory (once, or multiple times)?

- Do you use any settings or special features (for example cache settings,
    two phase commit, linked tables)?

- Do you use any H2-specific system properties?

- Is the application multi-threaded?

- What operating system, file system, and virtual machine
    (java -version) do you use?

- How did you start the Java process (java -Xmx... and so on)?

- Is it (or was it at some point) a networked file system?

- How big is the database (file sizes)?

- How much heap memory does the Java process have?

- Is the database usually closed normally, or is the process terminated
    forcefully or the computer switched off?

- Is it possible to reproduce this problem using a fresh database
    (sometimes, or always)?

- Are there any other exceptions (maybe in the .trace.db file)?
    Could you send them please?

- Do you still have any .trace.db files, and if yes could you send them?

- Could you send the .h2.db file where this exception occurs?
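
Several of the questions above (Java version, operating system, heap size, file sizes) can be answered by the application itself. The following is a rough sketch using only the Java standard library; the class name EnvironmentReport and the report keys are made up for illustration, and the database file paths are expected as arguments.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvironmentReport {

    // Collects basic facts about the JVM, the operating system, and the
    // given database files. Keys are kept in insertion order.
    public static Map<String, String> collect(File... databaseFiles) {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("java.version", System.getProperty("java.version"));
        report.put("java.vm.name", System.getProperty("java.vm.name"));
        report.put("os.name", System.getProperty("os.name"));
        report.put("os.version", System.getProperty("os.version"));
        // Upper bound of the heap the JVM will try to use (see -Xmx).
        report.put("maxHeapBytes", Long.toString(Runtime.getRuntime().maxMemory()));
        for (File f : databaseFiles) {
            // File.length() returns 0 if the file does not exist.
            report.put("size:" + f.getName(), Long.toString(f.length()));
        }
        return report;
    }

    public static void main(String[] args) {
        File[] files = new File[args.length];
        for (int i = 0; i < args.length; i++) {
            files[i] = new File(args[i]);
        }
        collect(files).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The file system type and the H2 build that created the database are not covered by this sketch; the latter can be read with the CREATE_BUILD query mentioned in the list.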

Regards,
Thomas



On Friday, July 3, 2015, Rob Van Dyck <[email protected]> wrote:

> Hi Thomas,
>
> Thanx for your initial reply. We would really like to get to the bottom of
> this and find out what's wrong.
>
> I still have two questions:
>
>    1. Is there any way to know for sure a database is consistent? Eg. By
>    running Recover with -trace and -transactionLog? Will this check all
>    internal indexes etc? It is important for us to be able to identify whether
>    a database was corrupted in some way (note that that is different from
>    knowing that there is no corruption left after an export (using Recover)
>    and an import into a new DB).
>    2. We have a (new) corrupted database from a machine that isn't
>    suffering from unexpected reboots (no critical errors according to the
>    Windows logs) and didn't experience an out-of-memory (that we know of)...
>    and we are looking at external factors: eg. Maybe the shadow copy (Windows
>    Restore Checkpoint) has interfered somehow with the DB, or maybe Windows
>    Restore has restored a corrupted snapshot, etc etc ... If you have any
>    ideas that we might want to check out, they are very welcome. It is a
>    customer with multiple computers, and this particular computer has had a
>    corrupted database 4 times in about two months. One of his other computers
>    had a corruption once, all other computers haven't had corruptions while
>    they are used by the same staff randomly. The DB is 8 GB in size.
>
> Kind regards,
> Rob.
>
> Op zaterdag 27 juni 2015 13:30:34 UTC+2 schreef Thomas Mueller:
>>
>> Hi,
>>
>> I'm sorry that the risk of corruption is that big. I'm not sure what the
>> problem could be. In the past, people did report corruptions now and then,
>> but not at such a high rate as you have.
>>
>> I would not move to the MVStore yet, as there are known problems in case
>> of power failure (in case of re-ordered writes). I'm working on that right
>> now. There is also a known problem with corruption after out-of-memory,
>> which is fixed in the trunk but not released yet.
>>
>> I would probably use the old storage format ("mv_store=false" in the
>> database URL). Whether you use the very latest 1.4.x version or 1.3.176
>> will probably not make a big difference.
>>
>> I would consider creating online backups regularly, but I'm not sure if
>> that's feasible in your case.
>>
>> Regards,
>> Thomas
>>
>>
>>
>> On Friday, June 26, 2015, Rob Van Dyck <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I work for a small company using (the latest stable) H2 in our software.
>>> Our client base is starting to grow (+-100 installations on client
>>> computers, most have DB's multiple GBs in size) and we are starting to run
>>> into more problems with broken (and sometimes worse: unrepairable) H2 DBs.
>>> Our clients use lots of different OSes (all Windows/Mac OS X) on normal
>>> commodity hardware. To give you an estimate about the failure rate: we have
>>> had about 10 broken DBs in the last 6 months.
>>>
>>> We currently use an embedded persistent database with default connection
>>> properties: "jdbc:h2:file:" + h2Path + ";IFEXISTS=TRUE" after which we set
>>> autocommit to false. There is only one thread connected with the DB and the
>>> database was created using the latest stable H2 version.
>>>
>>> We know for sure a few instances happened a limited time after our
>>> software ran into an out-of-memory situation. We also suspect some happened
>>> after an OS-level crash which caused the computer to reboot without having
>>> a chance to shut down properly (e.g., power failure or the user pressing the
>>> reset button).
>>>
>>> The data is privacy sensitive, so we are reluctant to provide it to you
>>> unless that is the only option.
>>>
>>> We were hoping you might be able to hint us a little bit on what we
>>> might do to avoid these issues?
>>>
>>> 1. We are converting our embedded persistent H2 DB to a (tcp)server
>>> started by a different process, hoping that OOMs in our software won't
>>> corrupt H2, since the H2 process can then shut down cleanly. Do you think
>>> this might help for OOMs?
>>> 2. We are wondering whether we are missing certain properties to set on
>>> the connection? We looked at UNDO_LOG and LOG, but the default settings are
>>> already the 'safest'.
>>> 3. We are using the latest stable version 1.3.176 (and use the default
>>> of its 'storage engines' called B-tree (?). I.e., we don't use MVStore).
>>> Should we consider moving to the beta version? Could that possibly have
>>> more protection against these types of failure?
>>> 4. We know some instances of corruption happened in a virtualized
>>> environment (where the guest OS 'crashed'). We tried to reproduce this by
>>> running a Windows 8 guest on a Linux host, where we tried to reset and
>>> shutdown our application multiple times (10) while it was performing heavy
>>> database updates. We could not reproduce the issue.
>>> 5. One of the issues is that we cannot reliably detect issues. At one
>>> time we ran the H2 recovery tool which gave us no errors so we continued
>>> using the existing DB, but immediately afterwards this resulted in H2
>>> complaining about corruption. Is this possible (does the recovery tool
>>> check all kind of errors? Or does it skip, e.g., index pages)? Is there a
>>> way to know for sure that there is no corruption?
>>> 6. We have tried on some occasions to run the recovery tool and
>>> re-import the corrupted database, but at least on one occasion this gave us
>>> errors so we were unable to restore the data. Unfortunately we do not have
>>> the error output anymore.
>>> 7. The next time this happens, is there anything that we should check
>>> (e.g., the trace file)?
>>>
>>> I'll include some of the stacktraces, maybe this can give you an
>>> indication of what might have gone wrong.
>>>
>>> Thanx for your answers and/or tips.
>>>
>>> Kind regards,
>>> Rob.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "H2 Database" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/h2-database.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

