[
https://issues.apache.org/jira/browse/DERBY-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293709#comment-13293709
]
Dag H. Wanvik edited comment on DERBY-4269 at 6/13/12 3:30 PM:
---------------------------------------------------------------
Yes, that was my conclusion too. The usage of property lists is extremely
convoluted in the Derby boot process, and it was very hard to understand what's
going on. And since this goes on in a forked VM, using a debugger is tricky
without affecting the timing... I replaced all usages of java.util.Properties
with a specialized version (DerbyProperties), so I could track all allocations
and modifications of properties. Adding to the problem is the fact that
Properties can have default Properties lists recursively, and this is being
used. In addition Derby adds two layers of specializations:
UpdateServiceProperties and DoubleProperties.
Mostly, property lists are passed into constructors without any cloning of them
so the number of live aliases is high.
It turns out the layering at the time of the NPE in the boot is like this:
DoubleProperties#write instance of
UpdateServiceProperties#defaults instance of
Properties#defaults instance of
Properties (no default) <----- race on this one
The bottom one is the one that's being changed under our feet in these lines in
EmbedConnection#bootDatabase:
:
// clear these values as some modules hang onto
// the properties set corresponding to service.properties
// and they shouldn't be interested in these JDBC attributes.
info.clear();
It is initially allocated in EmbedConnection#filterProperties:
private Properties filterProperties(Properties inputSet) {
Properties limited = new org.apache.derby.iapi.util.DerbyProperties();
:
assigning it to "info".
The thread that is doing this is the thread that initially started the slave
operation (cf. URL "startSlave=true"). This thread hangs until the master has
been started and replication is underway[1]. When the master comes online this
thread the connect returns with XRE08 ("REPLICATION_SLAVE_STARTED_OK").
The failover happens in this test almost immediately, and the
SlaveDatabaseBootThread then starts moving again (it i waiting for more log
records to replay or failover).
Unfortunately, in the current scenario, the thread doing "startSlave=true" is
still active, normally it would have finished long before
SlaveDatabaseBootThread starts moving again.
The Property default at the 4th level above would normally be empty, since its
being cleared by the "startSlave=true" thread in the "info.clear". Bit since
it's lagging behind, it only does this "clear" when the SlaveDatabaseBootThread
has started its boot, leading to the contention.
[1] In EmbedConnection.bootDatabase, this corresponds to the line:
// try to start the service if it doesn't already exist
if (!Monitor.startPersistentService(dbname, info)) {
was (Author: dagw):
Yes, that was my conclusion too. The usage of property lists is extremely
convoluted in the Derby boot process, and it was very hard to understand what's
going on. And since this goes on in a forked VM, using a debugger is tricky
without affecting the timing... I replaced all usages of java.util.Properties
with a specialized version (DerbyProperties), so I could track all allocations
and modifications of properties. Adding to the problem is the fact that
Properties can have default Properties lists recursively, and this is being
used. In addition Derby adds two layers of specializations:
UpdateServiceProperties and DoubleProperties.
Mostly, property lists are passed into constructors without any cloning of them
so the number of live aliases is high.
It turns out the layering at the time of the NPE in the boot is like this:
DoubleProperties#write instance of
UpdateServiceProperties#defaults instance of
Properties#defaults instance of
Properties (no default) <----- race on this one
The bottom one is the one that's being changed under our feet in these lines in
EmbedConnection#bootDatabase:
:
// clear these values as some modules hang onto
// the properties set corresponding to service.properties
// and they shouldn't be interested in these JDBC attributes.
info.clear();
It is initially allocated in EmbedConnection#filterProperties:
private Properties filterProperties(Properties inputSet) {
Properties limited = new org.apache.derby.iapi.util.DerbyProperties();
:
assigning it to "info".
The thread that is doing this is the thread that initially started the slave
operation (cf. URL "startSlave=true"). This thread hangs until the master has
been started and replication is underway. When the master comes online this
thread the connect returns with XRE08 ("REPLICATION_SLAVE_STARTED_OK").
The failover happens in this test almost immediately, and the
SlaveDatabaseBootThread then starts moving again (it i waiting for more log
records to replay or failover).
Unfortunately, in the current scenario, the thread doing "startSlave=true" is
still active, normally it would have finished long before
SlaveDatabaseBootThread starts moving again.
The Property default at the 4th level above would normally be empty, since its
being cleared by the "startSlave=true" thread in the "info.clear". Bit since
it's lagging behind, it only does this "clear" when the SlaveDatabaseBootThread
has started its boot, leading to the contention.
> Failover did not succeed in 2 min.: testReplication_Local_3_p6_autocommit_OK
> ----------------------------------------------------------------------------
>
> Key: DERBY-4269
> URL: https://issues.apache.org/jira/browse/DERBY-4269
> Project: Derby
> Issue Type: Bug
> Components: Replication
> Affects Versions: 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0,
> 10.6.1.0, 10.6.2.1, 10.7.1.1, 10.8.1.2, 10.8.2.2
> Environment: OS:
> Microsoft© Windows VistaT Ultimate - 6.0.6001 Service Pack 1 - WindowsNT 0 6
> JVM:
> Sun Microsystems Inc.
> java version "1.4.2_16"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_16-b05)
> Java HotSpot(TM) Client VM (build 1.4.2_16-b05 mixed mode 32-bit)
> Reporter: Ole Solberg
> Assignee: Dag H. Wanvik
> Labels: derby_triage10_5_2
> Attachments: 4269-client-jstack.txt, 4269-master.txt,
> 4269-slave-jstack-before-failover.txt, 4269-slave.txt, DERBY-4269.diff,
> DERBY-4269.stat, DERBY-4269b.diff, db_master-derby.log,
> db_slave-derby.log.gz, derby-4269-typo.diff
>
>
> Failover did not succeed.
> 2)
> testReplication_Local_3_p6_autocommit_OK(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p6)junit.framework.AssertionFailedError:
> Failover did not succeed.
> at
> org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.connectPing(ReplicationRun.java:270)
> at
> org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p6.derby_3896(ReplicationRun_Local_3_p6.java:200)
> at
> org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p6.testReplication_Local_3_p6_autocommit_OK(ReplicationRun_Local_3_p6.java:86)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at
> org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:106)
> at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
> at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
> at junit.extensions.TestSetup.run(TestSetup.java:25)
> See
> http://dbtg.thresher.com/derby/test/Daily/jvm1.4/testing/testlog/vista-64/782274-suitesAll_diff.txt
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira