[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-18 Thread Dan Kennedy
On 11/18/2015 03:37 PM, T?r?k Edwin wrote:
> On 11/17/2015 12:11 AM, Deon Brewis wrote:
>> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't 
>> usable in any way shape or form. It will inevitably lead to catalog 
>> corruption if you hard-reboot OSX, even without the database or application 
>> open.
>>
> I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7 
> and Debian Jessie, but after 24-48h of continous writes to the DB I always 
> get a corruption:
> SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
> SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks WHERE 
> revision_id=:revision_id]
> Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed: 
> (code 0xb: database disk image is malformed) database disk image is malformed
>
> After this 'pragma integrity check' says:
> row 90814 missing from index sqlite_autoindex_revision_ops_1
> wrong # of entries in index sqlite_autoindex_revision_ops_1
> row 1046646 missing from index idx_revmap
> row 1046646 missing from index sqlite_autoindex_revision_blocks_1
> wrong # of entries in index idx_revmap
> wrong # of entries in index sqlite_autoindex_revision_blocks_1
>
> There are not reboots involved, just multiple processes accessing a WAL DB. 
> Without mmap I/O I've never seen corrupted DBs in our application.

As of yesterday, SQLite uses a read-only mapping in mmap mode. The db 
file is written using plain old write(), just as in non-mmap mode:

   http://sqlite.org/src/info/67c5d3c646c8198c

It would be interesting to know if this clears the problem in your 
environment.

Dan.



[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-18 Thread Török Edwin
On 11/17/2015 12:11 AM, Deon Brewis wrote:
> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't 
> usable in any way shape or form. It will inevitably lead to catalog 
> corruption if you hard-reboot OSX, even without the database or application 
> open.
> 

I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7 and 
Debian Jessie, but after 24-48h of continous writes to the DB I always get a 
corruption:
SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks WHERE 
revision_id=:revision_id]
Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed: 
(code 0xb: database disk image is malformed) database disk image is malformed

After this 'pragma integrity check' says:
row 90814 missing from index sqlite_autoindex_revision_ops_1
wrong # of entries in index sqlite_autoindex_revision_ops_1
row 1046646 missing from index idx_revmap
row 1046646 missing from index sqlite_autoindex_revision_blocks_1
wrong # of entries in index idx_revmap
wrong # of entries in index sqlite_autoindex_revision_blocks_1

There are not reboots involved, just multiple processes accessing a WAL DB. 
Without mmap I/O I've never seen corrupted DBs in our application.

The corruption doesn't lead to any data loss, apparently only the indexes were 
corrupted beacuse I was able to recover the DBs with .clone+setting journal 
mode again.

However I haven't had the time yet to write a standalone testcase and attempt 
to reproduce this with SQLite alone, so I can't rule out that this isn't a 
memory corruption bug in our application, or one of the other libraries that we 
link with.

I notice that SQLite doesn't use msync, and msync(2) says 'Without use of this 
call there is no guarantee that changes are written back before munmap(2) is 
called'.
Is f(data)sync enough to ensure changed mmap pages are written to the disk?

-- 
Edwin T?r?k | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com


[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-18 Thread Howard Chu
Dan Kennedy wrote:
> On 11/18/2015 03:37 PM, T?r?k Edwin wrote:
>> On 11/17/2015 12:11 AM, Deon Brewis wrote:
>>> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't
>>> usable in any way shape or form. It will inevitably lead to catalog
>>> corruption if you hard-reboot OSX, even without the database or application
>>> open.
>>>
>> I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7
>> and Debian Jessie, but after 24-48h of continous writes to the DB I always
>> get a corruption:
>> SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
>> SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks
>> WHERE revision_id=:revision_id]
>> Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed:
>> (code 0xb: database disk image is malformed) database disk image is malformed
>>
>> After this 'pragma integrity check' says:
>> row 90814 missing from index sqlite_autoindex_revision_ops_1
>> wrong # of entries in index sqlite_autoindex_revision_ops_1
>> row 1046646 missing from index idx_revmap
>> row 1046646 missing from index sqlite_autoindex_revision_blocks_1
>> wrong # of entries in index idx_revmap
>> wrong # of entries in index sqlite_autoindex_revision_blocks_1
>>
>> There are not reboots involved, just multiple processes accessing a WAL DB.
>> Without mmap I/O I've never seen corrupted DBs in our application.
>
> As of yesterday, SQLite uses a read-only mapping in mmap mode. The db file is
> written using plain old write(), just as in non-mmap mode:

That's the safest way to use mmap, but keep in mind that this requires a 
unified buffer cache and systems like OpenBSD still don't have that, so this 
approach will cause corruptions on systems like that.
>
>http://sqlite.org/src/info/67c5d3c646c8198c
>
> It would be interesting to know if this clears the problem in your 
> environment.


-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-18 Thread Richard Hipp
On 11/18/15, Howard Chu  wrote:
> Dan Kennedy wrote:
>>
>> As of yesterday, SQLite uses a read-only mapping in mmap mode. The db file
>> is
>> written using plain old write(), just as in non-mmap mode:
>
> That's the safest way to use mmap, but keep in mind that this requires a
> unified buffer cache and systems like OpenBSD still don't have that, so this
> approach will cause corruptions on systems like that.

Thanks for the warning, Howard.

We independently discovered this back a few years ago and thus disable the mmap
capability for OpenBSD and QNX.  See
https://www.sqlite.org/src/artifact/1b8c1b37f0?ln=742-748 for the
code.  Those are the only modern platforms that we know of that give
problems.  If you are aware of others, please let us know.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-16 Thread Deon Brewis
We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't 
usable in any way shape or form. It will inevitably lead to catalog corruption 
if you hard-reboot OSX, even without the database or application open.

We've tried FULLSYNC and F_FULLFSYNC, but it makes no difference.

Repro steps:
a) Run our application and write stuff to the database
b) Close our app & wait for the .wal file to disappear AND the app to disappear 
from activity monitor
c) Wait another 2 minutes
d) Copy your SQLITE db file to a network share
e) Hard reboot OSX
f) After the reboot, copy the same SQLITE db to a second network share
g) File compare (d) vs. (f)

Observe: The files in (c) and (e) are virtually never identical. NOTE: There is 
no application or SQLITE is involved here. The app is closed. It (should have) 
checkpointed and flushed properly (FULLSYNC + FULLFSYNC). However, the MAC 
doesn't seem to write the database to disk. So once you hard-reboot you don't 
have the same file anymore. SQLITE detects this as catalog corruption around 
25% of the time, but a file compare shows differences pretty much 100% of the 
time. Not just benign differences in unused pages - the header is more often 
than not different as well.

Without the Hard reboot our database always survives a close. You can 
gracefully shutdown the application, pkill it, force terminate, crash it, soft 
reboot - it all survives. However, once you hard reboot OSX - even AFTER the 
app is closed - it has a very high probability of corrupting our database.  
I've seen a worse case scenario where the hard reboot followed an app graceful 
shutdown by 12 hours, and it still corrupted the database.

This is so easy to reproduce I'm not sure why this isn't reported as a 
large-scale problem? It also only reproduces on OSX (both El Capitan and 
Mavericks) - Android, iOS, PC all work fine.


Anyway, it's not a big deal for us to set mmap_size to 0 to work around this.

The big problem I have however is the .wal.  We use SQLITE from multiple 
threads, and as such it's using shared memory to read/write the .wal. However, 
if persisting memory mapped files on OSX is so unreliable, then how can the 
.wal be expected to survive a hard reboot... So I'd also like to have a way to 
not use memory mapped I/O for .wal files, but I don't think there is a way 
unless I change the architecture of my app to have single-threaded access to 
SQLITE? Or is there another way?

- Deon



[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-16 Thread Richard Hipp
On 11/16/15, Deon Brewis  wrote:
> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't
> usable in any way shape or form. It will inevitably lead to catalog
> corruption if you hard-reboot OSX, even without the database or application
> open.
>
> We've tried FULLSYNC and F_FULLFSYNC, but it makes no difference.
>
> Repro steps:
> a) Run our application and write stuff to the database
> b) Close our app & wait for the .wal file to disappear AND the app to
> disappear from activity monitor
> c) Wait another 2 minutes
> d) Copy your SQLITE db file to a network share
> e) Hard reboot OSX
> f) After the reboot, copy the same SQLITE db to a second network share
> g) File compare (d) vs. (f)
>
> Observe: The files in (c) and (e) are virtually never identical. NOTE: There
> is no application or SQLITE is involved here. The app is closed. It (should
> have) checkpointed and flushed properly (FULLSYNC + FULLFSYNC). However, the
> MAC doesn't seem to write the database to disk. So once you hard-reboot you
> don't have the same file anymore. SQLITE detects this as catalog corruption
> around 25% of the time, but a file compare shows differences pretty much
> 100% of the time. Not just benign differences in unused pages - the header
> is more often than not different as well.
>
> Without the Hard reboot our database always survives a close. You can
> gracefully shutdown the application, pkill it, force terminate, crash it,
> soft reboot - it all survives. However, once you hard reboot OSX - even
> AFTER the app is closed - it has a very high probability of corrupting our
> database.  I've seen a worse case scenario where the hard reboot followed an
> app graceful shutdown by 12 hours, and it still corrupted the database.
>
> This is so easy to reproduce I'm not sure why this isn't reported as a
> large-scale problem? It also only reproduces on OSX (both El Capitan and
> Mavericks) - Android, iOS, PC all work fine.
>
>
> Anyway, it's not a big deal for us to set mmap_size to 0 to work around
> this.
>
> The big problem I have however is the .wal.  We use SQLITE from multiple
> threads, and as such it's using shared memory to read/write the .wal.

No.  It uses shared memory for the ".shm" file, which is only a
performance optimization and is not used for recovery.  The ".wal"
file is written using write() or pwrite().


> However, if persisting memory mapped files on OSX is so unreliable, then how
> can the .wal be expected to survive a hard reboot... So I'd also like to
> have a way to not use memory mapped I/O for .wal files, but I don't think
> there is a way unless I change the architecture of my app to have
> single-threaded access to SQLITE? Or is there another way?

-- 
D. Richard Hipp
drh at sqlite.org