Bugs item #673249, was opened at 2003-01-23 18:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=673249&group_id=22866

Category: JBossServer
Group: v3.2
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Matt Cleveland (groovesoftware)
Assigned to: David Jencks (d_jencks)
Summary: 3.2RC1 Oracle XA Problem

Initial Comment:
I'm running into a problem with Oracle XA in 3.2RC1.  
I'm running Oracle 9.2.0.1.  I know there have been a 
bunch of problems with the Oracle XA driver and I know 
some of them are supposed to be fixed in 3.2RC1 but I 
think this is yet another Oracle problem.
 
I have a really simple test.  I have a client that starts up 
N threads.  Each thread calls an EJB.  The EJB gets an 
Oracle connection (from an XA pool) and inserts a 
record into the database and then closes the 
connection and returns.  This all works fine under lower 
load, but the log file shows the stack trace below 
occasionally under heavy load.  In some cases I then 
start getting "ORA-01591: lock held by in-doubt 
distributed transaction" on Oracle calls after the error.
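
For readers trying to reproduce this, a load driver of the kind described above could be sketched roughly as follows. This is an illustration only: `LoadDriver`, `run`, and its parameters are hypothetical names rather than the actual test client, and the remote EJB call is stood in for by a plain `Callable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical load driver; not the client used in this report. */
final class LoadDriver {

    /**
     * Invokes {@code call} from {@code threads} threads, {@code callsPerThread}
     * times each, and returns how many invocations threw an exception.
     */
    static int run(int threads, int callsPerThread, Callable<?> call) {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            workers.add(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    try {
                        call.call();               // stands in for the remote EJB call
                    } catch (Exception e) {
                        failures.incrementAndGet(); // count what the client actually sees
                    }
                }
                return null;
            });
        }
        try {
            pool.invokeAll(workers);               // blocks until every worker has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }
}
```

Counting failures on the client side makes it easy to compare what the client sees against the warnings in the server log.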

The client is not receiving this error.  In fact it is only 
reported as a warning.  Still it's pretty scary to see
these flying by in the log file.  It leaves you wondering if 
the transaction committed or rolled back.  From the 
stack trace I believe that the transaction rolled back and 
this is still an Oracle concurrency bug, but
if that's not the case I wish the log message told me 
that.
 
I've tried with and without TrackConnectionByTx.  My 
oracle-xa-ds.xml is pasted below the stack trace.

2003-01-21 21:42:09,141 WARN  [org.jboss.tm.TransactionImpl] XAException: tx=TransactionImpl:XidImpl [FormatId=257, GlobalId=malt//1809, BranchQual=] errorCode=XAER_RMERR
oracle.jdbc.xa.OracleXAException
        at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1157)
        at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:590)
        at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:140)
        at org.jboss.tm.TransactionImpl.commitResources(TransactionImpl.java:1420)
        at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:349)
        at org.jboss.ejb.plugins.TxInterceptorCMT.endTransaction(TxInterceptorCMT.java:361)
        at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:247)
        at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:101)
        at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:130)
        at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:204)
        at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:265)
        at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:154)
        at org.jboss.ejb.StatelessSessionContainer.invoke(StatelessSessionContainer.java:303)
        at org.jboss.ejb.Container.invoke(Container.java:680)
        at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:549)
        at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
        at java.lang.reflect.Method.invoke(Native Method)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:236)
        at sun.rmi.transport.Transport$1.run(Transport.java:147)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:143)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
        at java.lang.Thread.run(Thread.java:479)

oracle-xa-ds
------------------
<?xml version="1.0" encoding="UTF-8"?>

<datasources>
  <xa-datasource>
    <jndi-name>XaOracleDS</jndi-name>
    <track-connection-by-tx>true</track-connection-by-tx>

    <managedconnectionfactory-class>org.jboss.resource.adapter.jdbc.xa.oracle.XAOracleManagedConnectionFactory</managedconnectionfactory-class>

    <!--xa-datasource-class>oracle.jdbc.xa.client.OracleXADataSource</xa-datasource-class-->

    <xa-datasource-property name="URL">jdbc:oracle:thin:@server:port:sid</xa-datasource-property>
    <xa-datasource-property name="User">scott</xa-datasource-property>
    <xa-datasource-property name="Password">tiger</xa-datasource-property>

    <min-pool-size>0</min-pool-size>
    <max-pool-size>50</max-pool-size>
    <blocking-timeout-millis>20000</blocking-timeout-millis>
    <idle-timeout-minutes>15</idle-timeout-minutes>
  </xa-datasource>
</datasources>

Thanks,
Matt Cleveland


----------------------------------------------------------------------

>Comment By: David Jencks (d_jencks)
Date: 2003-02-16 19:58

Message:
Logged In: YES 
user_id=60525

Fixed in 3.0.x, 3.2, and 4 cvs.

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 20:11

Message:
Logged In: YES 
user_id=85088

So maybe it's solved then, but I can't go back and confirm.  
We're satisfied with letting it ride and seeing if it comes up again 
in our load testing.  We may put some effort into reproducing it 
later.

----------------------------------------------------------------------

Comment By: David Jencks (d_jencks)
Date: 2003-01-28 20:08

Message:
Logged In: YES 
user_id=60525

You should have been seeing a TransactionRolledbackException or (if you were in-VM 
using a local interface) a TransactionRolledbackLocalException.  The EJB container is 
supposed to do a thorough job of insulating you from dealing with low-level 
XAExceptions.

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 19:47

Message:
Logged In: YES 
user_id=85088

David, as far as I can see your test is the same, but I do not 
have a very good understanding of when/how the Oracle 
exception was failing compared to how your test fails.  What 
exception will the client see?  I was seeing an exception in 
the client, but it wasn't the RMERR exception so I didn't 
consider it to be the same problem, but hadn't had time to 
investigate further.  I don't recall now what exception I was 
seeing in the client.  I didn't correlate it with the RMERR 
because it seemed to occur too long after the RMERR to be 
the same exception.  I was seeing the exception in the client 
after several RMERR exceptions scrolled by in the server 
log.  But thinking about it now with the log files scrolling by 
and with network and IO lag etc. the timing could have been 
off and they could have been the same, but I never looked 
very closely because they were different exceptions and 
different messages.  I have been expecting an RMERR to 
appear on the client.

----------------------------------------------------------------------

Comment By: David Jencks (d_jencks)
Date: 2003-01-28 19:33

Message:
Logged In: YES 
user_id=60525

I implemented pluggable XAException handling.  It doesn't break anything, but I don't 
know that it really works either.  If you (Matt or Igor) have any ideas on how to test 
it (for Oracle), please do so.

I have no idea what to do about the lack of exception propagation, since the testcase I 
wrote IS propagating the exception correctly.  Matt, can you see any real difference 
between the testcase that works and your code that doesn't?  Without more ideas I 
will have to wait until I can get Oracle installed and try to reproduce the problem by 
limiting the number of sessions.

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 19:25

Message:
Logged In: YES 
user_id=85088

Increasing max sessions and max processes seems to have 
fixed the problem.  I am running with 100 threads for quite 
some time now and I have not encountered the error.

Igor, my apologies for making you work so hard on 
something that turns out to be Oracle configuration.  Is there 
any way to get this error reported so that users will be able to 
self-diagnose this?  I like your idea of pluggable exception 
formatters.  Perhaps XAOracleManagedConnectionFactory 
could intercept the exception, do some better logging and 
rethrow it, or maybe that wouldn't work.  I'm not too familiar 
with the code.

David, this leaves the problem with the exception being 
propagated to the client in an unknown state.  Now that the 
problem is fixed I doubt I can get the database parameters 
changed back which means I can't reproduce it anymore.  
Any ideas on how to proceed?

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 17:33

Message:
Logged In: YES 
user_id=85088

Using "select count(*) from v$session" I have seen the test 
fail with session counts in the high 120 range, but I have 
confirmed that the number of sessions can get at least as 
high as 145.  Of course I can't confirm that the number of 
sessions didn't spike in between my queries to check it 
(there are other users on the system), but I was frantically 
requerying every couple of seconds and the number seemed 
to move in a predictable, incremental manner.

One odd thing: while I was running the test, someone else 
received a "max processes exceeded" error when running 
SQL*Plus.  Perhaps the error being reported is not exactly 
correct and the problem has something to do with this.  I will 
investigate this angle to the best of my ability.

I'm still working on that init.ora file.

----------------------------------------------------------------------

Comment By: Igor Fedorenko (igorfie)
Date: 2003-01-28 16:33

Message:
Logged In: YES 
user_id=232950

That's weird. You can monitor the total number of Oracle sessions 
using "select count(*) from v$session", but keep in mind that 
Oracle starts a number of internal sessions (I do not know if 
this number can change or not).

Oracle used to limit the number of distributed transactions, but 
this limit was removed in 9.2 as far as I know. Check the 
documentation at http://technet.oracle.com.

Also, can you send me the contents of your init.ora file?  I want 
to compare it with mine.


----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 16:10

Message:
Logged In: YES 
user_id=85088

I can rule out the simple matter of exceeding max sessions 
and I can also rule out JBoss leaking sessions.  Here's how I 
know.

I run with a pool size of 64 connections.  If I run with a client 
program executing 55 threads and monitor the number of 
sessions in SQL*Plus then as the test runs the number of 
sessions eventually reaches 56, that's 55 for JBoss and 1 for 
SQL*Plus.  I eventually get the exception.

Again running with a pool size of 64 connections.  If I run with 
a client program executing 50 threads and monitor the 
number of sessions in SQL*Plus then as the test runs the 
number of sessions eventually reaches 51, that's 50 for JBoss 
and 1 for SQL*Plus.  I also eventually get the exception in 
this case, but the number of sessions never exceeds 51.

So, I know my max sessions is at least 56 but I get the error 
with only 51 open.  So, JBoss is not leaking sessions, but I 
am not hitting max sessions either.  I can't rule out an Oracle 
bug where it thinks it hit max sessions but didn't, but I can't 
prove it either.

Any ideas?

Is there any other variable besides SESSIONS that may be 
involved?  Is there perhaps a limit on XA transactions or 
something like that?

I know you're not Oracle support, and I appreciate your help.  
I just want to completely rule out a problem on the JBoss end 
before going to Oracle and I also want to have the correct 
information to report to Oracle if I do need to go to them.

Thanks,
Matt Cleveland

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 15:29

Message:
Logged In: YES 
user_id=85088

I will confirm, but I *THINK* I have enough sessions.  I have 
seen a different error in the past when running out of 
sessions.  I will do some testing today and see what I can 
find.

----------------------------------------------------------------------

Comment By: Igor Fedorenko (igorfie)
Date: 2003-01-28 15:17

Message:
Logged In: YES 
user_id=232950

Correction -- number of SCOTT's sessions before running the 
test is supposed to be zero. Sorry for the confusion ;-)

----------------------------------------------------------------------

Comment By: Igor Fedorenko (igorfie)
Date: 2003-01-28 14:45

Message:
Logged In: YES 
user_id=232950

Matt,

You are running out of oracle database connections. Here is 
what "oracle error: 18" means:

oahu:$ oerr ora 18
00018, 00000, "maximum number of sessions exceeded"
// *Cause:  All session state objects are in use.
// *Action: Increase the value of the SESSIONS initialization parameter.


If you believe that JBoss is leaking sessions please provide a 
test case that shows this problem. Possible test procedure 
would look like

1. Make sure nobody else is using the same database 
schema SCOTT
2. Using SQL*Plus connect to Oracle as SYSTEM and 
execute "select count(*) from v$session where 
username='SCOTT'" to get initial number of sessions (it's 
about 9 sessions in idle database)
3. Start JBoss, run your test
4. Using SQL*Plus connect to Oracle as SYSTEM and 
execute "select count(*) from v$session where 
username='SCOTT'" to get number of sessions after running 
the test

A bug exists if the difference between the number of sessions 
before and after the test is greater than the maximum number of 
connections in the pool.
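
The before/after comparison above could be automated along these lines. This is only a sketch under assumptions: `SessionLeakCheck` and its method names are made up for illustration, and `countSessions` assumes a JDBC connection opened as a user allowed to read v$session (for example SYSTEM).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper for the session-leak procedure; not part of JBoss. */
final class SessionLeakCheck {

    /** Counts the v$session rows owned by one database user. */
    static int countSessions(Connection admin, String username) throws SQLException {
        String sql = "select count(*) from v$session where username = ?";
        try (PreparedStatement ps = admin.prepareStatement(sql)) {
            ps.setString(1, username);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();               // count(*) always yields exactly one row
                return rs.getInt(1);
            }
        }
    }

    /** Criterion from the procedure: more new sessions than the pool can hold. */
    static boolean leakSuspected(int before, int after, int maxPoolSize) {
        return (after - before) > maxPoolSize;
    }
}
```

With a pool capped at 50 connections, going from 0 to 50 SCOTT sessions would not be flagged, while going from 0 to 51 would.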

Hope this helps.

On a related topic. It'd be nice to allow pluggable 
XAException formatters (for vendor specific error messages).
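
As a rough idea of what such a pluggable formatter might look like, here is a minimal sketch. The interface and class names are hypothetical and are not taken from the JBoss code base; only `javax.transaction.xa.XAException`, its public `errorCode` field, and its `XAER_*` constants are standard. The vendor-specific part is passed in as a function, since how a driver exposes its own error code is driver-dependent.

```java
import java.util.function.ToIntFunction;
import javax.transaction.xa.XAException;

/** Hypothetical plug-in point for turning an XAException into a log message. */
interface XAExceptionFormatter {
    String format(XAException e);
}

/** Vendor-neutral default: prints the standard error code and its constant name. */
final class DefaultXAExceptionFormatter implements XAExceptionFormatter {

    static String codeName(int code) {
        switch (code) {
            case XAException.XAER_ASYNC:   return "XAER_ASYNC";
            case XAException.XAER_RMERR:   return "XAER_RMERR";
            case XAException.XAER_NOTA:    return "XAER_NOTA";
            case XAException.XAER_INVAL:   return "XAER_INVAL";
            case XAException.XAER_PROTO:   return "XAER_PROTO";
            case XAException.XAER_RMFAIL:  return "XAER_RMFAIL";
            case XAException.XAER_DUPID:   return "XAER_DUPID";
            case XAException.XAER_OUTSIDE: return "XAER_OUTSIDE";
            default:                       return "UNKNOWN";
        }
    }

    @Override
    public String format(XAException e) {
        return "xa error: " + e.errorCode + " (" + codeName(e.errorCode) + ")";
    }
}

/** Adds a vendor error code obtained through a caller-supplied extractor. */
final class VendorXAExceptionFormatter implements XAExceptionFormatter {
    private final XAExceptionFormatter base = new DefaultXAExceptionFormatter();
    private final String vendor;
    private final ToIntFunction<XAException> vendorCode;

    VendorXAExceptionFormatter(String vendor, ToIntFunction<XAException> vendorCode) {
        this.vendor = vendor;
        this.vendorCode = vendorCode;
    }

    @Override
    public String format(XAException e) {
        return base.format(e) + "; " + vendor + " error: " + vendorCode.applyAsInt(e);
    }
}
```

A datasource configuration could then name the formatter class to use, so that a message such as ORA-00018 shows up next to the generic XAER_RMERR.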

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 00:12

Message:
Logged In: YES 
user_id=85088

Per Igor Fedorenko, I tested with a special 
jboss-transaction.jar that outputs Oracle-specific debug 
messages.  When re-running my test I received the following 
error report in the server log file.

2003-01-27 23:44:59,030 WARN  [org.jboss.tm.TransactionImpl] xa error: -3 (A resource manager error has occured in the transaction branch.); oracle error: 18; oracle sql error: 0;
oracle.jdbc.xa.OracleXAException
        at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1157)
        at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:590)
        at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:140)
        at org.jboss.tm.TransactionImpl.commitResources(TransactionImpl.java:1473)
        at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:352)
        at org.jboss.ejb.plugins.TxInterceptorCMT.endTransaction(TxInterceptorCMT.java:361)
        at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:247)
        at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:101)
        at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:130)
        at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:204)
        at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:265)
        at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:154)
        at org.jboss.ejb.StatelessSessionContainer.invoke(StatelessSessionContainer.java:303)
        at org.jboss.ejb.Container.invoke(Container.java:680)
        at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:549)
        at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
        at java.lang.reflect.Method.invoke(Native Method)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:236)
        at sun.rmi.transport.Transport$1.run(Transport.java:147)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:143)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
        at java.lang.Thread.run(Thread.java:479)
2003-01-27 23:44:59,035 WARN  [org.jboss.tm.TransactionImpl] XAException: tx=TransactionImpl:XidImpl [FormatId=257, GlobalId=redhook.synxis.com//511, BranchQual=] errorCode=XAER_RMERR
oracle.jdbc.xa.OracleXAException
        at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1157)
        at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:590)
        at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:140)
        at org.jboss.tm.TransactionImpl.commitResources(TransactionImpl.java:1473)
        at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:352)
        at org.jboss.ejb.plugins.TxInterceptorCMT.endTransaction(TxInterceptorCMT.java:361)
        at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:247)
        at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:101)
        at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:130)
        at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:204)
        at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:265)
        at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:154)
        at org.jboss.ejb.StatelessSessionContainer.invoke(StatelessSessionContainer.java:303)
        at org.jboss.ejb.Container.invoke(Container.java:680)
        at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:549)
        at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
        at java.lang.reflect.Method.invoke(Native Method)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:236)
        at sun.rmi.transport.Transport$1.run(Transport.java:147)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:143)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
        at java.lang.Thread.run(Thread.java:479)


----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-27 16:35

Message:
Logged In: YES 
user_id=85088

Oops, I spoke too soon.  The error is NOT being propagated 
to the client using the latest CVS for 3.2.  Looks like we 
need to look further.  In my test I received multiple RMERR 
exceptions in the server log file, but none were reported to 
the test client.

----------------------------------------------------------------------

Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-27 16:25

Message:
Logged In: YES 
user_id=85088

The problem with the error being propagated to the client is 
fixed.

I'm not convinced of your answer regarding the RMERR.  
First of all, the in-doubt tx error comes after the RMERR, 
which makes sense.  If JBoss failed to commit or rollback 
any transaction for any reason then it would become in-doubt 
because Oracle would not know whether it should be 
committed or rolled back, right?  Second, this RMERR 
exception looks very much like the type of exceptions you 
will get using JBoss with Oracle if you turn off 
TrackConnectionByTX or do not use the 
XAOracleManagedConnectionFactory.  Now, I'm not saying 
it's not an Oracle oddity or a behavior that differs from other 
XA drivers, but those are the kinds of things that 
TrackConnectionByTX and 
XAOracleManagedConnectionFactory are designed to fix.  I'm 
hoping someone can do the same with this one or at least 
rule out the possibility of doing the same.

Just to keep this bug report up to date with some activity in 
the dev list, here are the details of how to reproduce the bug.

> Ok, it took a while, but I can confirm that your test produces the error on
> JBoss 3.2 from CVS with clustering turned off.  Two things you might be
> missing are 1) increasing the thread count in the client to 100 makes it
> more likely to happen more quickly and 2) the test client does not receive
> the error.  The error ONLY shows up in the server log file (and stdout).
>
> We are using Oracle 9.2.0.1.0.  The JDBC driver version is 9.2.0.0.0 as
> reported in the manifest.
>
> Just to make sure I'm not missing something here are all the boring details
> of what I did.
>
> 1. Got the latest from CVS
> 2. ./build.sh clobber
> 3. built JBoss with integrated Tomcat 4.1.18
> 4. Tweaked TestBean as follows to make it work in my build environment.
> None of these changes should matter to the test.
>       - changed bean name from test/Test to Test
>       - changed the view-type to remote because our build doesn't do
> <localinterface> for xdoclet
>       - changed the data source name.  Yours was XAOracleDS and mine is
> XaOracleDS
>       - changed the name of your remote interface to TestRemoteIF to match
> our naming conventions
> 5. made corresponding changes to TestMtClient and increased the number of
> threads to 100
> 6. built into an EAR
> 7. added my oracle-xa-ds.xml to the default configuration
> 8. turned on Pad in the XidFactory for the transaction manager in the
> default configuration
> 9. deployed my EAR to the default configuration
> 10. started the default configuration
> 11. ran TestMtClient long enough to get the error.  The error shows up in
> the server log file and stdout.


----------------------------------------------------------------------

Comment By: David Jencks (d_jencks)
Date: 2003-01-27 05:53

Message:
Logged In: YES 
user_id=60525

I've fixed the problem with no error showing up to the client in Branch_3_2 cvs.  
Please check that the error is being propagated as you expect to the client.

I think the original RMERR may well be an Oracle problem, since the stack trace 
indicates that one-phase commit is being called.  In this case any in-doubt transaction 
can be in doubt only because Oracle has lost track of its own internal state.  (At 
least, since JBoss is not calling prepare, I can't see how JBoss has anything to do 
with an in-doubt tx.)
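
To illustrate the distinction being drawn here, the following is a simplified sketch of how a transaction manager might choose between one-phase and two-phase commit. It is not the JBoss TransactionImpl code; `CommitSketch` and `RecordingResource` are hypothetical names, and error handling and read-only votes are left out. Only the `javax.transaction.xa` interfaces are standard.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Simplified commit logic for illustration only. */
final class CommitSketch {

    static void commit(Xid xid, List<? extends XAResource> resources) throws XAException {
        if (resources.size() == 1) {
            // One-phase optimisation: prepare is never called, so the
            // transaction manager never asks the resource to hold a
            // prepared branch while it decides the outcome.
            resources.get(0).commit(xid, true);
            return;
        }
        for (XAResource r : resources) {
            r.prepare(xid);            // phase 1: every resource votes
        }
        for (XAResource r : resources) {
            r.commit(xid, false);      // phase 2: commit the prepared branches
        }
    }
}

/** Test double that records the calls it receives. */
final class RecordingResource implements XAResource {
    final List<String> calls = new ArrayList<>();

    public void commit(Xid xid, boolean onePhase) { calls.add("commit(onePhase=" + onePhase + ")"); }
    public int prepare(Xid xid) { calls.add("prepare"); return XA_OK; }
    public void rollback(Xid xid) { calls.add("rollback"); }
    public void start(Xid xid, int flags) { }
    public void end(Xid xid, int flags) { }
    public void forget(Xid xid) { }
    public Xid[] recover(int flag) { return new Xid[0]; }
    public boolean isSameRM(XAResource other) { return other == this; }
    public int getTransactionTimeout() { return 0; }
    public boolean setTransactionTimeout(int seconds) { return false; }
}
```

With a single enlisted resource, as in the stack trace, the recorded calls contain only a one-phase commit and no prepare.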

You can check the error propagation by running this test:

cd testsuite
./build.sh one-test -Dtest=org.jboss.test.jca.test.XAExceptionUnitTestCase


Please report back your results.  If they are satisfactory I will port to 3.0 and 4 if necessary.

----------------------------------------------------------------------
