[ 
https://issues.apache.org/jira/browse/DERBY-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286484#comment-13286484
 ] 

Knut Anders Hatlen commented on DERBY-5358:
-------------------------------------------

I think I found the problem that's causing this. There's a race condition in 
TableDescriptor.getHeapConglomerateId():

                /* If we've already cached the heap conglomerate number, then
                 * simply return it.
                 */
                if (heapConglomNumber != -1)
                {
                        return heapConglomNumber;
                }

... (find the heap conglomerate in the list of conglomerates) ...

                heapConglomNumber = cd.getConglomerateNumber();

                return heapConglomNumber;

I instrumented this class and found that it never set heapConglomNumber to 
4,294,967,295, but the method still sometimes returned that value.

The problem is that heapConglomNumber is a non-volatile long, and the Java spec 
doesn't guarantee that reads/writes of such values are atomic. A write may be 
performed as two separate 32-bit writes.
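
One possible way to close this race, shown only as a sketch: declare the cached 
field volatile, since the Java Language Specification (section 17.7) requires 
reads and writes of volatile long values to be atomic. The class name 
CachedConglomerateId and the realConglomNumber field are hypothetical stand-ins 
for TableDescriptor and its conglomerate lookup, and this is not necessarily 
the fix that ends up in Derby.

```java
// Hypothetical stand-in for the caching logic; not the actual Derby class.
class CachedConglomerateId {
    // volatile makes reads and writes of this long atomic (JLS 17.7)
    // and makes a completed write visible to later reads in other threads.
    private volatile long heapConglomNumber = -1;

    // Assumption: stands in for the search through the list of conglomerates.
    private final long realConglomNumber;

    CachedConglomerateId(long realConglomNumber) {
        this.realConglomNumber = realConglomNumber;
    }

    long getHeapConglomerateId() {
        // Read the field once, so the value checked is the value returned.
        long cached = heapConglomNumber;
        if (cached != -1) {
            return cached;
        }
        // The real lookup would happen here.
        long found = realConglomNumber;
        heapConglomNumber = found;
        return found;
    }

    public static void main(String[] args) {
        CachedConglomerateId td = new CachedConglomerateId(344624L);
        System.out.println(td.getHeapConglomerateId()); // first call, does the lookup
        System.out.println(td.getHeapConglomerateId()); // second call, uses the cache
    }
}
```

Two threads may still both perform the lookup, but they compute the same 
value, so the only cost is some duplicated work. Synchronizing the method 
would be an alternative if the lookup must run only once.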

So what seems to happen is:

- Two threads (T1 and T2) call getHeapConglomerateId() on the same 
TableDescriptor at about the same time, and no other calls to 
getHeapConglomerateId() have been made on that object before, so 
heapConglomNumber is initially -1.

- T1 goes ahead finding the real conglomerate number and writing it to 
heapConglomNumber.

- At the same time, T2 reads heapConglomNumber in order to check if it's 
already cached. However, since T1's write was not atomic, T2 only sees half of 
it. That's enough to make it see that the cached conglomerate number is not -1, 
so it concludes that it can use the cached value, but the number it sees is not 
the right one.

If T2 happens to see only the most significant half of the conglomerate number 
written by T1, that half will probably be all zeros (because it's not very 
likely that more than 4 billion conglomerates have been created). The bits in 
the least significant half will in that case be all ones (because the initial 
value is -1, which is all ones in two's complement). The returned value will 
therefore be 0x00000000ffffffff == 4,294,967,295, as seen in the error in the 
bug description.

I've also seen variants where the returned number is a negative one. That 
happens if T2 instead sees the least significant half of the correct 
conglomerate number, and the most significant half of the initial value -1. For 
example, if the conglomerate number is 344624, the error message will say: The 
conglomerate (-4 294 622 672) requested does not exist.
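
The two torn values described above can be reproduced with plain bit 
arithmetic. The sketch below does not trigger the race itself; it only combines 
32-bit halves of the initial value -1 and the conglomerate number 344624 from 
the example. The class name TornLongDemo and the helper mix are made up for 
illustration.

```java
class TornLongDemo {
    // Take the upper 32 bits from one long and the lower 32 bits from another.
    static long mix(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFFFFFF00000000L) | (lowFrom & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        long initial = -1L;     // value of heapConglomNumber before caching
        long written = 344624L; // conglomerate number being written by T1

        // T2 sees the new most significant half and the old least significant half.
        System.out.println(mix(written, initial)); // 4294967295

        // T2 sees the old most significant half and the new least significant half.
        System.out.println(mix(initial, written)); // -4294622672
    }
}
```

Neither result equals -1, so both pass the "already cached" check even though 
neither is a conglomerate number that was ever written.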
                
> SYSCS_COMPRESS_TABLE failed with conglomerate not found exception
> -----------------------------------------------------------------
>
>                 Key: DERBY-5358
>                 URL: https://issues.apache.org/jira/browse/DERBY-5358
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.1.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>              Labels: derby_triage10_9
>
> When running the D4275.java repro attached to DERBY-4275 (with the patch 
> invalidate-during-invalidation.diff as well as the fix for DERBY-5161 to 
> prevent the select thread from failing) in four parallel processes on the 
> same machine, one of the processes failed with the following stack trace:
> java.sql.SQLException: The exception 'java.sql.SQLException: The conglomerate 
> (4,294,967,295) requested does not exist.' was thrown while evaluating an 
> expression.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:98)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142)
>         at org.apache.derby.impl.jdbc.Util.seeNextException(Util.java:278)
>         at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:407)
>         at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
>         at 
> org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
>         at 
> org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
>         at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1334)
>         at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
>         at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(EmbedPreparedStatement.java:1341)
>         at D4275.main(D4275.java:52)
> Caused by: java.sql.SQLException: The exception 'java.sql.SQLException: The 
> conglomerate (4,294,967,295) requested does not exist.' was thrown while 
> evaluating an expression.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
>         ... 10 more
> Caused by: java.sql.SQLException: The conglomerate (4,294,967,295) requested 
> does not exist.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:122)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:71)
>         at 
> org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:256)
>         at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:400)
>         at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:348)
>         at 
> org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2290)
>         at 
> org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:82)
>         at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1334)
>         at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
>         at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(EmbedPreparedStatement.java:308)
>         at 
> org.apache.derby.catalog.SystemProcedures.SYSCS_COMPRESS_TABLE(SystemProcedures.java:792)
>         at 
> org.apache.derby.exe.acd381409ax0131x72b6x8e11x0000037164a81.g0(Unknown 
> Source)
>         at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.derby.impl.services.reflect.ReflectMethod.invoke(ReflectMethod.java:46)
>         at 
> org.apache.derby.impl.sql.execute.CallStatementResultSet.open(CallStatementResultSet.java:75)
>         at 
> org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:448)
>         at 
> org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
>         at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
>         ... 3 more
> Caused by: ERROR XSAI2: The conglomerate (4,294,967,295) requested does not 
> exist.
>         at 
> org.apache.derby.iapi.error.StandardException.newException(StandardException.java:278)
>         at 
> org.apache.derby.impl.store.access.RAMAccessManager.getFactoryFromConglomId(RAMAccessManager.java:382)
>         at 
> org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:482)
>         at 
> org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:394)
>         at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1308)
>         at 
> org.apache.derby.impl.sql.execute.DDLConstantAction.lockTableForDDL(DDLConstantAction.java:252)
>         at 
> org.apache.derby.impl.sql.execute.AlterTableConstantAction.executeConstantActionBody(AlterTableConstantAction.java:364)
>         at 
> org.apache.derby.impl.sql.execute.AlterTableConstantAction.executeConstantAction(AlterTableConstantAction.java:275)
>         at 
> org.apache.derby.impl.sql.execute.MiscResultSet.open(MiscResultSet.java:61)
>         at 
> org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:448)
>         at 
> org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:319)
>         at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
>         ... 15 more
> Test stopped after 9342310 ms
> The conglomerate number 4,294,967,295 looks suspicious, as it's equal to 
> 2^32-1. Perhaps it's hitting some internal limit on the number of 
> conglomerates? The test case used the in-memory back-end.


