[
https://issues.apache.org/jira/browse/DERBY-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rick Hillegas updated DERBY-5234:
---------------------------------
Attachment: derby-5234-01-aa-emptyAllocPage.diff
Attaching derby-5234-01-aa-emptyAllocPage.diff. These small changes make the
repro run correctly. Regression tests pass cleanly on this patch.
I have stumbled across at least 3 separate problems in the compression code.
However, that may simply mean that I don't understand the code. The 3 problems
are:
1) A boundary checking error which causes an allocation extent to think that it
still has pages, even though those pages have been released to the operating
system. This is what causes the repro to fail.
2) A confusion about whether a variable represents a bit position or a page
number. This causes the code not to recognize that all of the pages in an
extent have been released. Fixing this check does not change any user-visible
behavior, but I think it is a step in the right direction.
3) The inability of the compression code to release pages held by the first
allocation page. I don't understand this problem yet. Before looking into this
one, I need advice about whether I am heading in the right direction.
More information about these 3 problems follows:
-------------------
Concerning (1), the boundary check which causes the repro to fail:
In AllocExtent.compressPages(), the new_highest_page argument can be -1. This
happens if all of the pages in the extent turn out to be empty. However, if
new_highest_page is -1, then the code does not fall into the block at line 577;
that's the code which actually marks the pages as released. The value of
new_highest_page was calculated by AllocExtent.compress(). The name
new_highest_page is confusing: it holds a bit position, not a page number, and
when it is -1, it is a flag indicating that every page in the extent is empty.
AllocExtent.compress() returns new_highest_page + 1, triggering its caller to
fall into a block at line 1074 in AllocPage.compress(); that block releases
pages to the operating system. That is how we end up in the situation that the
pages are actually released but AllocExtent still thinks they are allocated.
That, in turn, is what tricks a later INSERT into trying to write onto a
non-existent page.
The fix is to make the code fall into the block at 577 if new_highest_page is
-1.
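To make the failure mode concrete, here is a small, self-contained sketch of the
pattern (this is not the Derby source; the class, method, and field names are
invented for illustration). The point is the guard: when -1 is used as the
"every page in the extent is empty" flag, a guard that only accepts non-negative
values skips the release bookkeeping entirely:

    // Illustration only -- a toy model, not AllocExtent.
    import java.util.BitSet;

    class ToyExtent {
        private final BitSet allocated = new BitSet();

        ToyExtent(int pageCount) { allocated.set(0, pageCount); }

        // Old behavior: only shrink the bit map when the argument is a real
        // bit position, so -1 ("everything is empty") is silently ignored.
        void compressPagesOld(int newHighestBit) {
            if (newHighestBit >= 0) {
                allocated.clear(newHighestBit + 1, allocated.length());
            }
        }

        // Fixed behavior: let -1 fall into the block, so every bit is cleared
        // and the bookkeeping matches the pages actually returned to the OS.
        void compressPagesFixed(int newHighestBit) {
            if (newHighestBit >= -1) {
                allocated.clear(newHighestBit + 1, allocated.length());
            }
        }

        public static void main(String[] args) {
            ToyExtent stale = new ToyExtent(200);
            stale.compressPagesOld(-1);
            System.out.println("old guard, bits still set: " + stale.allocated.cardinality());   // 200

            ToyExtent fixed = new ToyExtent(200);
            fixed.compressPagesFixed(-1);
            System.out.println("fixed guard, bits still set: " + fixed.allocated.cardinality()); // 0
        }
    }

In the toy model the stale bit map is just a wrong count; in Derby it is what
later convinces an INSERT that a truncated page is still available.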
-------------------
Concerning (2), the confusion about whether AllocExtent.compress() returns a
bit position or a page number:
At line 1080 in AllocPage.compress(), the code compares a bit position to a
page number. Bit positions are small integers, e.g., in the range 0-200. Page
numbers are potentially much larger integers in, say, the range 12000-12200. This
mismatched comparison at line 1080 prevents AllocPage.compress() from recognizing
that all of the pages in the extent have been released.
I have renamed last_valid_page to last_valid_page_bit to clarify that this is a
bit position, not a page number. And I have changed the check at 1080 to
compare the bit position to another bit position. This comparison deserves the
attention of someone who knows this code better than I do. Is this the right
comparison?
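As a concrete illustration of the unit mismatch (invented numbers in the ranges
mentioned above; this is not the real comparison at line 1080, just the general
failure mode of comparing values in different units):

    // Illustration only. Bit positions are extent-relative and tiny; page numbers
    // are absolute and large. Comparing one against the other gives the same
    // answer whether or not the extent is empty, so the "all pages released"
    // case cannot be detected that way.
    class BitVsPageNumber {
        public static void main(String[] args) {
            long extentStartPage  = 12000;   // first page number covered by the extent
            long callerPageNumber = 12050;   // a page NUMBER supplied by the caller

            int lastValidBitStillLive = 150; // extent still holds live pages
            int lastValidBitAllFreed  = -1;  // sentinel: every page was released

            // Bit vs page number: both cases compare the same way, by magnitude alone.
            System.out.println(lastValidBitStillLive < callerPageNumber);  // true
            System.out.println(lastValidBitAllFreed  < callerPageNumber);  // true

            // Same-unit comparison: convert the bit position to a page number
            // first, and the two cases become distinguishable again.
            System.out.println(extentStartPage + lastValidBitStillLive < callerPageNumber); // false
            System.out.println(lastValidBitAllFreed < 0);                                   // true: extent is empty
        }
    }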
In a follow-on cleanup issue, it might make sense to rename variables in the
allocation code to clarify which values are bit positions and which are page
numbers. That may expose other questionable code in this area.
-------------------
Concerning (3), the inability of the compression code to release empty pages
managed by the first allocation page:
I had hoped that the change for (2) would cause the compress to release more
space. But it didn't. The compress only releases the pages managed by the
second (last) allocation page. All of the pages managed by the first allocation
page are also empty, but they are not released. This seems wrong to me. I would
expect the file to shrink back to its initial size.
Before pursuing this follow-on issue, I would like advice about whether I am
headed in the right direction. Should the compress shrink the file back to its
initial size? Or should SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP',
'OPERATIONS', 0, 0, 1) just release empty pages managed by the second and
higher allocation pages?
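For reference, the shape of the failing sequence is roughly the following (a
minimal sketch, not the attached DbCompressErrorTester.java; the connection URL,
table definition, and row counts are placeholders):

    // Sketch only: grow APP.OPERATIONS, empty it, run in-place compression with
    // truncate_end = 1, then insert again. Before the fix for (1), that last
    // insert can be directed to a page the extent still claims to own but that
    // was truncated from the file, surfacing as ERROR XSDG0 / EOFException.
    import java.sql.*;

    public class CompressThenInsert {
        public static void main(String[] args) throws SQLException {
            try (Connection conn =
                     DriverManager.getConnection("jdbc:derby:testdb;create=true")) {
                try (Statement s = conn.createStatement()) {
                    s.execute("CREATE TABLE APP.OPERATIONS (ID INT, PAYLOAD VARCHAR(200))");
                    try (PreparedStatement ins = conn.prepareStatement(
                             "INSERT INTO APP.OPERATIONS VALUES (?, ?)")) {
                        for (int i = 0; i < 100000; i++) {       // grow across several allocation pages
                            ins.setInt(1, i);
                            ins.setString(2, "filler text to fatten the rows");
                            ins.executeUpdate();
                        }
                    }
                    s.execute("DELETE FROM APP.OPERATIONS");     // every page becomes free
                }

                // Release empty pages at the end of the container (truncate_end = 1).
                try (CallableStatement cs = conn.prepareCall(
                         "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)")) {
                    cs.setString(1, "APP");
                    cs.setString(2, "OPERATIONS");
                    cs.setShort(3, (short) 0);   // purge rows
                    cs.setShort(4, (short) 0);   // defragment rows
                    cs.setShort(5, (short) 1);   // truncate end
                    cs.execute();
                }

                // The insert that fails in the repro before the patch.
                try (Statement s = conn.createStatement()) {
                    s.execute("INSERT INTO APP.OPERATIONS VALUES (1, 'after compress')");
                }
            }
        }
    }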
-------------------
Touches the following files:
M java/engine/org/apache/derby/impl/store/raw/data/AllocExtent.java
Fix for (1).
----------
M java/engine/org/apache/derby/impl/store/raw/data/AllocPage.java
Clarification for (2).
> Unable to insert data into table. Failed due to "ERROR XSDG0: Page
> Page(51919,Container(0, 1104)) could not be read from disk."
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-5234
> URL: https://issues.apache.org/jira/browse/DERBY-5234
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.5.3.0
> Environment: HP-UX 11iv2 in production environment with JDK1.6;
> Solaris 5/10 in test environment with JDK 1.6
> Reporter: Varma R
> Priority: Critical
> Labels: ERROR, XSDG0, apache, corruption, data, derby,
> derby_triage10_9
> Attachments: 5234_alloc.out, 5234_page_10219.out, 5234_summary.out,
> DataFileReader_Output.zip, DbCompressErrorTester.java,
> derby-5234-01-aa-emptyAllocPage.diff, log191.dat, log85.dat
>
>
> One of the Derby database tables "gets corrupted"/"indicates connection not
> available" while processing inserts from the Java client application, as shown
> in the trace, and the only way to recover from this error is to rebuild the DB
> by deleting the data and creating the tables again. This happens once in a
> while (thrice in a span of two months), and the Java application (run on
> multiple servers), which updates the database, processes around 100 million
> transactions per hour in total, with each transaction resulting in 4-5 updates
> to the DB.
> There are eight tables in the derby database.
> TABLE NAME ROWS COUNT (at time of corruption)
> ---------------------------------------------------------------------------------
> KPI.KPI_MERGEIN; 362917
> KPI.KPI_IN; 422508
> KPI.KPI_DROPPED; 53667
> KPI.KPI_ERROR1; 0
> KPI.KPI_ERROR2; 2686
> KPI.KPI_ERRORMERGE; 0
> KPI.KPI_MERGEOUT; 362669
> KPI.KPI_OUT; 125873
> The derby database has been started with the following parameters
> CMD="java -Dderby.system.home=$DERBY_OPTS -Dderby.locks.monitor=true
> -Dderby.locks.deadlockTrace=true -Dderby.locks.escalationThreshold=50000
> -Dderby.locks.waitTimeout=
> -1 -Dderby.storage.pageCacheSize=100000 -Xms512M -Xmx3072M -XX:NewSize=256M
> -classpath $DERBY_CLASSPATH org.apache.derby.drda.NetworkServerControl start
> -h $KPIDERBYHOST -p $DERBY_KPI_PORT"
> The corrupted database tar (filesystem) from the live environment was moved to
> a test system (Solaris), and a few checks were run on the corrupted DB as part
> of the analysis (the DB does start fine).
> Inserting a row into any table except KPI.KPI_MERGEIN succeeds. But when a new
> row is inserted into the KPI.KPI_MERGEIN table using the command line tool, it
> throws the error message below (the same message that appeared in the live
> environment):
> ij> INSERT INTO KPI.KPI_MERGEIN (A0_TXN_ID, A1_NE_ID, A2_CHU_IP_ADDR,
> A3_BATCH_DATE,A5_CODE) VALUES (-1, 'BMTDE', '192.2.1.3', 231456879, 'KSD');
> ERROR 08006: A network protocol error was encountered and the connection has
> been terminated: the requested command encountered an unarchitected and
> implementation-specific condition for which there was no architected message
> and the derby.log file shows the error stack trace below:
> ERROR XSDG0: Page Page(51919,Container(0, 1104)) could not be read from disk.
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.CachedPage.readPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>     at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.FileContainer.initPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.FileContainer.newPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.BaseContainer.addPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.BaseContainerHandle.addPage(Unknown Source)
>     at org.apache.derby.impl.store.access.heap.HeapController.doInsert(Unknown Source)
>     at org.apache.derby.impl.store.access.heap.HeapController.insertAndFetchLocation(Unknown Source)
>     at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(Unknown Source)
>     at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown Source)
>     at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
>     at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
>     at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown Source)
>     at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLIMM(Unknown Source)
>     at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
>     at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> Caused by: java.io.EOFException: Reached end of file while attempting to read a whole page.
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>     ... 20 more
> ============= begin nested exception, level (1) ===========
> java.io.EOFException: Reached end of file while attempting to read a whole page.
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.CachedPage.readPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>     at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.FileContainer.initPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.FileContainer.newPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.BaseContainer.addPage(Unknown Source)
>     at org.apache.derby.impl.store.raw.data.BaseContainerHandle.addPage(Unknown Source)
>     at org.apache.derby.impl.store.access.heap.HeapController.doInsert(Unknown Source)
>     at org.apache.derby.impl.store.access.heap.HeapController.insertAndFetchLocation(Unknown Source)
>     at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(Unknown Source)
>     at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown Source)
>     at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
>     at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
>     at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown Source)
>     at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLIMM(Unknown Source)
>     at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
>     at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> ============= end nested exception, level (1) ===========
> 2011-05-16 10:37:21.392 GMT:
> Shutting down instance a816c00e-012f-f85f-7892-ffff874c3ff6
> ----------------------------------------------------------------
> Cleanup action completed
> The problem is only with the INSERT statement. When I try a SELECT statement on
> the KPI.KPI_MERGEIN table, it works fine. The database file system size (in
> seg0) is 1.3 GB.
> Can anyone help me identify why this error is thrown for this one table alone?
> Would upgrading to a new version help?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira