On 28/10/11 12:54, Paolo Castagna wrote:
Hi Andy

Andy Seaborne wrote:
On 25/10/11 10:21, Paolo Castagna wrote:
Hi,
I know that this isn't helpful in finding the root cause of the problem.

Could you at least provide a description of the setup:

0/ Is this something that has started happening or something that has
always happened in your testing?

1/ How frequently does it happen?
Every update? 1 in 100?

We only recently started using TDB 0.9.0-incubating-SNAPSHOT on all the
replicas. It's not frequent and only 1 of the 3 replicas experiences it.

Does only one machine ever see errors (i.e. if there is an error, it is always on a particular machine), or, when there is an error, does 1 replica show it and the other 2 not, with the failing replica changing from one time to the next?

Do the replicas see the same pattern of reads, or (wild guess) does the crashing node see an overlapping read, or has it not committed all previous writers, whereas the others have? (All this is log file stuff.)


442 previous write transactions were successful.

Has this happened once, or has it happened several times?
Is it reproducible in the weak sense that playing those 443 updates again sometimes causes the problem?



2/ How much data is there in a store?

Not big.

In triples?



3/ How big and how frequent are the updates?
Ditto reads.

The update we were submitting when we saw the exception wasn't big
(but not tiny): 13492 triples.

That store performed 442 write transactions previously, without
problems. It failed when we submitted the 443rd write transaction.

At that point in time we were submitting many updates, sequentially
one after the other, and continuously (i.e. we were replaying old
updates from a key-value store).

A couple of other nodes, running exactly the same code, did not
experience any problem. The difference might be in the reads.
There might have been reads during the updates.


4/ How are the updates being done?

We still serialize writes and we run write transactions via the usual
begin, try { ... commit } catch { abort } pattern.

So API, not SPARQL Update?

Which calls?
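
For reference, the "begin, try { ... commit } catch { abort }" pattern with serialized writers mentioned under 4/ might look roughly like the sketch below. The thread does not say which calls were actually used, so this is only an illustration: `Store`, `MemoryStore` and `applyUpdate` are hypothetical names invented here and are not the TDB API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class WriteTxnSketch {

    /** Hypothetical transactional store interface; NOT the TDB API. */
    interface Store {
        void begin();
        void add(String triple);
        void commit();
        void abort();
    }

    /** Toy in-memory store, for illustration only. */
    static class MemoryStore implements Store {
        final List<String> committed = new ArrayList<>();
        private List<String> pending = null;

        public void begin() { pending = new ArrayList<>(); }

        public void add(String triple) {
            if (triple == null) throw new IllegalArgumentException("null triple");
            pending.add(triple);
        }

        // Pending triples become visible only on commit.
        public void commit() { committed.addAll(pending); pending = null; }

        // Abort discards everything added since begin().
        public void abort() { pending = null; }
    }

    // A single lock serializes all write transactions.
    private static final ReentrantLock WRITE_LOCK = new ReentrantLock();

    /** Applies one update as one write transaction; returns true if it committed. */
    static boolean applyUpdate(Store store, List<String> triples) {
        WRITE_LOCK.lock();
        try {
            store.begin();
            try {
                for (String t : triples) {
                    store.add(t);
                }
                store.commit();
                return true;
            } catch (RuntimeException e) {
                // Any failure inside the transaction rolls it back.
                store.abort();
                return false;
            }
        } finally {
            WRITE_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        MemoryStore store = new MemoryStore();
        boolean first = applyUpdate(store, Arrays.asList("s1 p1 o1", "s2 p2 o2"));
        boolean second = applyUpdate(store, Arrays.asList("s3 p3 o3", null));
        System.out.println(first + " " + second + " " + store.committed.size());
    }
}
```

In this sketch a failed update leaves the committed data untouched, which is the property the pattern is meant to give; whether readers overlap with the writer is a separate question that the sketch does not model.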


5/ Which version did you build (date is most helpful, svn rev is OK)?

http://oss.talisplatform.com/content/repositories/talis-releases/org/apache/jena/jena-tdb/0.9.0-incubating-TALIS-RC1/

And the date of the copy of the src code?
Other components?

I'm not going to dig through archives to see if I can work out these things (and the details probably aren't there anyway).

(all these questions are ones that might help pin the problem down)


... instead of this, I'd like to use something from here:
https://repository.apache.org/content/repositories/staging/org/apache/jena/

So you said.

        Andy


Paolo
