Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-15 Thread Richard Hipp
On 9/15/16, Martin Raiber  wrote:
>
> The program opens the database file with fd =
> open("/path/to/database/file", ...) and then closes it with close(fd)
> using the OS file api. The close() clears the posix file locks of the
> process in the database file (that is all posix file locks of all open
> connections in the process).

Yep. That's why we say that Posix locks are broken by design.
https://www.sqlite.org/src/artifact/be9ca0f90?ln=968

It's pretty easy to tell which parts of unix were developed by Dennis
Richie or Bill Joy and which parts came out of a committee.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-15 Thread Martin Raiber
On 16.09.2016 00:53 Simon Slavin wrote:
> On 15 Sep 2016, at 11:38pm, Martin Raiber  wrote:
>
>> There are two instances in the program where the sqlite database file is
>> opened
>> and closed outside of sqlite3 (to backup the database file and to sync
>> it before
>> checkpointing). This clears away the posix locks on the database files.
>> This does
>> not cause problems unless another process accesses the database file.
>> For instance
>> it deletes the wal file while it is still in use.
> Do you mean that the process makes an SQLite call which deletes the WAL file, 
> or that it deletes the WAL file using a file-handling call ?
>
> It should be impossible for a SQLite call to delete a WAL file while it's in 
> use.  The only times I've seen this done are when a SQLite database was open 
> by two different computers, one accessing it on a local disk and the other 
> accessing it across a network using a SMB share.  This, naturally, messes up 
> multi-access filehandling.

The program opens the database file with fd =
open("/path/to/database/file", ...) and then closes it with close(fd)
using the OS file api. The close() clears the posix file locks of the
process in the database file (that is all posix file locks of all open
connections in the process). The sqlite command line tool is able to get
an exclusive lock on quitting, checkpoints and deletes the wal file
which later causes the IO errors.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-15 Thread Simon Slavin

On 15 Sep 2016, at 11:38pm, Martin Raiber  wrote:

> There are two instances in the program where the sqlite database file is
> opened
> and closed outside of sqlite3 (to backup the database file and to sync
> it before
> checkpointing). This clears away the posix locks on the database files.
> This does
> not cause problems unless another process accesses the database file.
> For instance
> it deletes the wal file while it is still in use.

Do you mean that the process makes an SQLite call which deletes the WAL file, 
or that it deletes the WAL file using a file-handling call ?

It should be impossible for a SQLite call to delete a WAL file while it's in 
use.  The only times I've seen this done are when a SQLite database was open by 
two different computers, one accessing it on a local disk and the other 
accessing it across a network using a SMB share.  This, naturally, messes up 
multi-access filehandling.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-15 Thread Martin Raiber
After getting some additional information, namely that the users are
also using the
sqlite3 command line tool to read data from the database, I think I
found out how
this issue occurs.

There are two instances in the program where the sqlite database file is
opened
and closed outside of sqlite3 (to backup the database file and to sync
it before
checkpointing). This clears away the posix locks on the database files.
This does
not cause problems unless another process accesses the database file.
For instance
it deletes the wal file while it is still in use. This causes the error
messages below.

Solution seems to be to not close the database file after opening it and
to reuse
the file handle (on Linux).

On 14.09.2016 13:05 Martin Raiber wrote:
> Hi,
>
> there have been three reports by users using my software of SQLite
> 3.12.0 returning SQLITE_IOERR and logging a SQLITE_IOERR_SHORT_READ
> (522). Specifically:
>
> 2016-09-12 04:37:04: WARNING: SQLite: disk I/O error errorcode: 522
> 2016-09-12 04:37:04: ERROR: Error preparing Query [PRAGMA cache_size =
> -2048]: disk I/O error
>
> One instance was on FreeBSD where I thought it could be caused by ZFS.
> The other two instances are on Linux now. On FreeBSD the issue was
> "fixed" by repeating the prepare after it failed with an IO-error.
>
> One user has captured an strace. I cannot actually see the short read,
> though: https://forums.urbackup.org/t/urbackup-crashing/2402/8
>
> Environment:
>
> * Databases are in WAL journal mode
> * synchronous=NORMAL
> * wal_autocheckpoint is OFF. Checkpointing is done in a separate thread
> with PRAGMA wal_checkpoint(PASSIVE) and wal_checkpoint(TRUNCATE) if the
> WAL file is bigger than a certain size
>
> Thanks for any help!
>
> Regards,
> Martin
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-14 Thread Simon Slavin

On 14 Sep 2016, at 1:09pm, Martin Raiber  wrote:

> I'd expect corruptions to affect sqlite3_step as well and earlier. This
> IO error only occurs for sqlite3_prepare_v2.

Because you have posted an error which can result from a corrupted database, it 
is definitely worth doing an integrity_check (the full one, not the quick one) 
just to dismiss this as a possible cause.

There were a lot of changes from 3.7 to 3.12 and it's possible that one of them 
causes SQLite to read a piece of the database file it previously didn't need to.

I assume from your answer that you are not using any PRAGMAs in your program.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-14 Thread Martin Raiber
On 14.09.2016 13:26 Simon Slavin wrote:
> Check the hard disk format for format errors (fsck).
>
> Run "PRAGMA integrity_check" on the database file.
>
> Is the database file on a disk inside the computer running the SQLite calls, 
> or is it accessed across a network ?
>
> Do you use any PRAGMAs in your program ?
Seems to be on a local ext4 file system on some kind of hardware RAID.
Integrity check is done nightly (only quick check) and does not seem to
find any issues.

I'd expect corruptions to affect sqlite3_step as well and earlier. This
IO error only occurs for sqlite3_prepare_v2. Additionally the user
reports the problem starting to occur when using SQLite 3.12.0 and no
such issues with 3.7.17. When using 3.7.17 it also did not checkpoint
the WAL file in a separate thread (that is wal_autocheckpoint was on).

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-14 Thread Martin Raiber
On 14.09.2016 13:31 Dan Kennedy wrote:
> On 09/14/2016 06:05 PM, Martin Raiber wrote:
>> there have been three reports by users using my software of SQLite
>> 3.12.0 returning SQLITE_IOERR and logging a SQLITE_IOERR_SHORT_READ
>> (522). Specifically:
>>
>> 2016-09-12 04:37:04: WARNING: SQLite: disk I/O error errorcode: 522
>> 2016-09-12 04:37:04: ERROR: Error preparing Query [PRAGMA cache_size =
>> -2048]: disk I/O error
>>
>> One instance was on FreeBSD where I thought it could be caused by ZFS.
>> The other two instances are on Linux now. On FreeBSD the issue was
>> "fixed" by repeating the prepare after it failed with an IO-error.
>>
>> One user has captured an strace. I cannot actually see the short read,
>> though: https://forums.urbackup.org/t/urbackup-crashing/2402/8
>
> Can't see the write() calls used to write the "WARNING" or "ERROR"
> messages either. Should we expect to?

Yes, hope the user manages to capture a proper one. Sorry.

>
> This trace might be an unrelated crash.
>
> I think strace data would be helpful though.
>
> Dan.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-14 Thread Dan Kennedy

On 09/14/2016 06:05 PM, Martin Raiber wrote:

Hi,

there have been three reports by users using my software of SQLite
3.12.0 returning SQLITE_IOERR and logging a SQLITE_IOERR_SHORT_READ
(522). Specifically:

2016-09-12 04:37:04: WARNING: SQLite: disk I/O error errorcode: 522
2016-09-12 04:37:04: ERROR: Error preparing Query [PRAGMA cache_size =
-2048]: disk I/O error

One instance was on FreeBSD where I thought it could be caused by ZFS.
The other two instances are on Linux now. On FreeBSD the issue was
"fixed" by repeating the prepare after it failed with an IO-error.

One user has captured an strace. I cannot actually see the short read,
though: https://forums.urbackup.org/t/urbackup-crashing/2402/8


Can't see the write() calls used to write the "WARNING" or "ERROR" 
messages either. Should we expect to?


This trace might be an unrelated crash.

I think strace data would be helpful though.

Dan.







Environment:

* Databases are in WAL journal mode
* synchronous=NORMAL
* wal_autocheckpoint is OFF. Checkpointing is done in a separate thread
with PRAGMA wal_checkpoint(PASSIVE) and wal_checkpoint(TRUNCATE) if the
WAL file is bigger than a certain size

Thanks for any help!

Regards,
Martin



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Random IOERR_SHORT_READ in sqlite3_prepare_v2

2016-09-14 Thread Simon Slavin

On 14 Sep 2016, at 12:05pm, Martin Raiber  wrote:

> there have been three reports by users using my software of SQLite
> 3.12.0 returning SQLITE_IOERR and logging a SQLITE_IOERR_SHORT_READ
> (522).

Check the hard disk format for format errors (fsck).

Run "PRAGMA integrity_check" on the database file.

Is the database file on a disk inside the computer running the SQLite calls, or 
is it accessed across a network ?

Do you use any PRAGMAs in your program ?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users