Andy,

I haven't opened an issue yet, because I don't know how to reproduce the
problem. Unfortunately, there seems to be no way of knowing when the
corruption was introduced in this case. There was only one JVM running, so
let's talk about

"a previous crash (with no journal restore later - new file type in 
0.9.X)"

I did not run the test myself, but it is possible that the JVM process was
killed at some point. If so, I know that countless queries ran fine
before the exception, but corruption can of course be subtle. Under what
circumstances would a journal restore fail? And if it did, would that not be
logged, or even stop TDB from starting? Would it be possible to have some
sort of safety mechanism that indicates when a journal restore was
unsuccessful?

Simon




From:
Andy Seaborne <[email protected]>
To:
[email protected], 
Date:
01/28/2013 11:44 AM
Subject:
Re: records not strictly increasing



On 28/01/13 15:59, Simon Helsen wrote:
> Hi guys,
>
> in one of our regular test runs, a multi-threaded test barfed once (but
> then not again and we have never seen it before even though we run these
> tests regularly). I am not sure if we accidentally bumped into a true TDB
> bug or whether we are doing something unsafe on our side. The exception
> occurred inside a read transaction while iterating over a ResultSet. You
> can assume that other threads were also in read and/or write transactions.
>
> I have no idea how to produce a test case to replicate this, so my
> starting question will be if anyone can give a broad explanation of the
> meaning of the exception. Our tests were running Jena 2.7.4. I may open a
> Jira issue as well.
>
> thanks
>
> Simon
>
> Caused by: com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator:
> records not strictly increasing:
> 00000000000000af000000000003e9f90000000000250aaa000000000024ea3d //
> 00000000000000af000000000003e93700000000002792790000000000277240
>          at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:124)
>          at org.openjena.atlas.iterator.Iter$4.hasNext(Iter.java:295)

Simon,

If it is not reproducible then there isn't anything that can be done.

The exception detects a database that is already bad; it is not raised at
the point in time when the corruption happens.  I suggest that when it
happens you preserve the bad database and see what else might be broken in it.
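
To illustrate what "records not strictly increasing" means for the two records in the trace above, here is a minimal sketch. It is not TDB's code: the class and method names are hypothetical, and it assumes records are compared byte by byte as unsigned values. Under that assumption, the second record sorts before the first (they first differ at ...e9f9 versus ...e937), which is the kind of ordering violation the iterator reports.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Jena/TDB.
public class RecordOrderCheck {
    // Decode a hex string (as printed in the exception message) into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Compare two keys byte by byte, treating each byte as unsigned
    // (assumed ordering; TDB's real comparison may differ in detail).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Index of the first record that is not strictly greater than its
    // predecessor, or -1 if the whole sequence is strictly increasing.
    static int firstOutOfOrder(List<byte[]> records) {
        for (int i = 1; i < records.size(); i++) {
            if (compareUnsigned(records.get(i - 1), records.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The two records from the exception message, in the order reported.
        byte[] prev = fromHex("00000000000000af000000000003e9f90000000000250aaa000000000024ea3d");
        byte[] next = fromHex("00000000000000af000000000003e93700000000002792790000000000277240");
        System.out.println("first out-of-order index: "
                + firstOutOfOrder(Arrays.asList(prev, next)));
    }
}
```

A check like this only shows that the stored order is already wrong when it is read; it says nothing about when the bad record was written.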

It is unlikely to be due to concurrency in the same JVM - that could not
cause this in 0.8.X either - and it leads to different errors.

Either a previous crash (with no journal restore later - new file type
in 0.9.X), or access from two JVMs, are the only two possibilities that
occur to me.

                 Andy


