[ 
https://issues.apache.org/jira/browse/DERBY-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588521#comment-16588521
 ] 

Ralf Schubert edited comment on DERBY-7007 at 8/22/18 8:18 AM:
---------------------------------------------------------------

Thanks for your help [~bryanpendleton].
We tried replacing IBM Java 7.1-4.15 with Oracle Java 8. The same problems occur.

Additionally, we ran strace on the files. It looks like the application 
performs some kind of write access on the file that breaks, although we 
have no write SQL commands in the application. Analysis of this is still ongoing. 
(And the file is still identical to the original file afterwards!)

Furthermore, we have now tested another set of servers, which is nearly 
identical to the problematic ones, even using the same IBM Java, Tomcat and 
library versions. It works fine on these servers. We are still trying to 
figure out the minimal differences between these servers, but this issue is 
really strange.
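
The "still identical" check mentioned above (and the SHA1 comparison listed in the issue description) can be reproduced with a small standalone program. The following is only a minimal sketch: the class name FileDigestCompare and its methods are hypothetical helpers, not part of Derby or of our application, and it assumes both copies of the file are readable by the user running it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of Derby. */
class FileDigestCompare {

    /** Returns the SHA-1 digest of the file content as lowercase hex. */
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        // Stream the file so large container files are not loaded into memory.
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if both files have the same SHA-1 digest. */
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return sha1Hex(a).equals(sha1Hex(b));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java FileDigestCompare <original> <suspect>");
            return;
        }
        Path original = Paths.get(args[0]);
        Path suspect = Paths.get(args[1]);
        System.out.println(sha1Hex(original) + "  " + original);
        System.out.println(sha1Hex(suspect) + "  " + suspect);
        System.out.println(sameContent(original, suspect) ? "IDENTICAL" : "DIFFERENT");
    }
}
```

Run it with the original file and the file Derby complained about as the two arguments. Matching digests only show that the bytes on disk are unchanged; they say nothing about the state of the file descriptor inside the running JVM.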



> Random IOException: Bad file descriptor on new server platform
> --------------------------------------------------------------
>
>                 Key: DERBY-7007
>                 URL: https://issues.apache.org/jira/browse/DERBY-7007
>             Project: Derby
>          Issue Type: Bug
>          Components: Miscellaneous
>    Affects Versions: 10.12.1.1
>         Environment: Linux: SUSE Linux Enterprise Server for SAP Applications 12 SP3  (x86_64)
> Kernel: 4.4.126-94.22-default #1 SMP Wed Apr 11 07:45:03 UTC 2018 (9649989) x86_64 x86_64 x86_64 GNU/Linux
> Filesystem: /dev/mapper/appsvg-lvapps on /opt/apps type ext3 (rw,relatime,data=ordered)
> Java: IBM 7.1-4.15
> Tomcat: 7.0.85
>            Reporter: Ralf Schubert
>            Priority: Blocker
>
> Our customer is migrating to a new server platform. We have several 
> applications running on their old server platform right now, and they run 
> well so far. On the new platform, however, random Derby errors occur 
> reproducibly, which we and the customer have been analysing for several 
> months. The deeper we dig, the more clueless we are, and it looks more and 
> more like a Derby bug.
> We would be pleased if somebody could look into this and tell us whether 
> this is a bug in Derby, or suggest what else could cause Derby to behave 
> like this.
> h2. Situation
> We have one application which includes several embedded Derby databases. 
> After the server starts, the application behaves normally for a few 
> minutes. Then one of the Derby DBs (accessed by Java Hibernate using Derby 
> embedded mode) first shows an error like this on a random Derby file (the 
> files vary each time):
> {code:java}
> Local derby log (/home/tomcat_i36/derby.log):
>  
> ------------  Begin Shutdown Error Stack -------------
> ERROR XSDG3: Meta-data for Container(0, 33904) could not be accessed to clean /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat
>         at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.cleanEntry(Unknown Source)
>         at org.apache.derby.impl.services.cache.BackgroundCleaner.performWork(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:809)
> Caused by: java.io.IOException: Bad file descriptor
>         at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:65)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:754)
>         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:739)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.getEmbryonicPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
>         ... 8 more
> ============= begin nested exception, level (1) ===========
> java.io.IOException: Bad file descriptor
>         at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:65)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:754)
>         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:739)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readFull(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage0(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.getEmbryonicPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.cleanEntry(Unknown Source)
>         at org.apache.derby.impl.services.cache.BackgroundCleaner.performWork(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:809)
> ============= end nested exception, level (1) ===========
> ------------  End Shutdown Error Stack ------------{code}
> After this happens, the DB behaves erratically, throwing random errors (e.g. 
> reporting that a column is missing in a table although it is there, or that 
> the DB is corrupt).
> Hint: We only have READ access to those databases within the application. 
> We do not write any data to them.
> It only happens to one single DB, but this is the most complex one in the 
> application. Restarting the server makes it WORK again for some minutes!
> We deploy the exact same WAR file to the old and the new platform for testing.
> h2. Already analysed
> We have already tried several things and performed several analysis steps:
>  # Turning off the antivirus solution (Trend Micro Deep Security Agent) did 
> not help.
>  # Exchanging the servers of the new server platform with another set of 
> servers with the same setup did not help.
>  # Comparing a SHA1 hash of the "corrupt" files with the original files 
> showed that the files are IDENTICAL.
>  # Copying the "corrupt" DB to another system and testing it there works as 
> expected, without issues.
>  # Running an integrity check on the DB shows no problems.
>  # Checking the file permissions on the problematic servers shows no problems:
> {code:java}
> # ls -l /opt/apps/tomcat/i36/webapps/XXXX/database/XX/seg0/c8470.dat
> -rw-r--r-- 1 tomcat_i36 tomcat 16384 Aug 21 09:32 /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat
>  
> # file /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat
> /opt/apps/tomcat/i36/webapps/XXXXX/database/XX/seg0/c8470.dat: data{code}
>  # Checking whether any Linux limits (e.g. open files limit) were reached: 
> nothing found.
>  # Checking for a corrupt file system: ext3 is used on the old and the new 
> platform, no hint of corrupt files found.
>  # Upgrading Derby from 10.11.1.1 to 10.12.1.1 did not fix the issue.
> h2. The server environments
> h3. OLD environment (working well)
> {code:java}
> Linux: SUSE Linux Enterprise Server 11 SP4  (s390x)
> Kernel: 3.0.101-91-default #1 SMP Mon Dec 12 13:06:13 UTC 2016 (544b9d1) s390x s390x s390x GNU/Linux
> Filesystem: /dev/mapper/appsvg-lvapps on /opt/apps type ext3 (rw,acl,user_xattr)
> Java: IBM 7.1-4.1
> Tomcat: 7.0.70{code}
> h3. NEW environment (not working)
> {code:java}
> Linux: SUSE Linux Enterprise Server for SAP Applications 12 SP3  (x86_64)
> Kernel: 4.4.126-94.22-default #1 SMP Wed Apr 11 07:45:03 UTC 2018 (9649989) x86_64 x86_64 x86_64 GNU/Linux
> Filesystem: /dev/mapper/appsvg-lvapps on /opt/apps type ext3 (rw,relatime,data=ordered)
> Java: IBM 7.1-4.15
> Tomcat: 7.0.85{code}
> Our customer has to migrate the server platforms very soon so we would be 
> very glad if someone could assist us in checking and resolving this.
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
