OK, it seems that it didn't last long; I've got the Java heap issue again:

# java -version
Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
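For what it's worth, "Could not reserve enough space for object heap" usually means the requested heap cannot be committed rather than that RAM is exhausted. A hedged sketch for checking the commit-related entries the JVM is up against; the /proc/meminfo numbers below are invented sample data, and note that inside OpenVZ some entries (e.g. Committed_AS, as discussed later in this thread) may be missing entirely:

```shell
# Sketch: look at what the kernel will let us commit vs. what is already
# committed. Sample numbers are made up; on a node read /proc/meminfo.
cat > /tmp/meminfo.sample <<'EOF'
MemTotal:       25165824 kB
MemFree:        20447232 kB
CommitLimit:    12582912 kB
Committed_AS:   11534336 kB
EOF
# Print the two commit-related entries (if present at all)
awk '/^(CommitLimit|Committed_AS):/ {print $1, $2, "kB"}' /tmp/meminfo.sample
```

If Committed_AS is close to CommitLimit, a 2 GiB -Xmx reservation can fail even while `free` shows plenty of memory.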
Still no crash in Trafodion after those 30 minutes.

On Fri, Sep 18, 2015 at 8:06 PM, Radu Marias <[email protected]> wrote:

> I think I managed to fix the java -version heap issue with:
> *export JAVA_TOOL_OPTIONS="-Xms128m -Xmx2048m"*
> Processes that specify explicit values for Xms and Xmx will override
> JAVA_TOOL_OPTIONS, as described here:
> http://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts
>
> Trafodion processes will take these values if they don't specify others; I
> see lines like this when Trafodion starts:
> Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
>
> But in HammerDB, when creating stored procedures, I get this error:
> Error in Virtual User 1: Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
>
> Fixed the HammerDB issue by manually creating the SPJ and indexes, but at
> the first run with 5 users I got the crash from TRAFODION-1492. After
> restarting Trafodion, it managed to run with 5 users for about 30 minutes.
> Will do more tests on Monday.
>
> On Fri, Sep 18, 2015 at 6:05 PM, Suresh Subbiah <[email protected]> wrote:
>
>> Thank you.
>>
>> TRAFODION-1492 can be used to track both the esp crash and the udrserv
>> crash. The fixes will be in different areas, but that does not matter.
>>
>> I don't know much about containers or OpenVZ. Maybe others will know. Hope
>> to have a fix ready soon for the udrserv crash problem, in case container
>> settings cannot be changed.
>>
>> The general idea suggested by Selva is that we introduce env variables with
>> min and max JVM heap size settings for the udrserv process (just like we
>> have today for executor processes). Udrserv does have the idea of reading
>> from a configuration file, so we could use that approach if that is
>> preferable. Either way there should be some way to start a udrserv process
>> with a smaller heap soon.
>>
>> Thanks
>> Suresh
>>
>> On Fri, Sep 18, 2015 at 8:17 AM, Radu Marias <[email protected]> wrote:
>>
>> > The nodes are in OpenVZ containers and I noticed this:
>> >
>> > # cat /proc/user_beancounters
>> > uid   resource      held      maxheld   barrier   limit     failcnt
>> > *     privvmpages   6202505   9436485   9437184   9437184   1573*
>> >
>> > I assume this could be related to the java -version issue. Trying to see
>> > if I can fix this; we are limited on what can be set from inside the
>> > container.
>> >
>> > # cat /proc/user_beancounters
>> > Version: 2.5
>> > uid        resource       held       maxheld    barrier              limit                failcnt
>> > 10045785:  kmemsize       111794747  999153664  9223372036854775807  9223372036854775807  0
>> >            lockedpages    7970       7970       6291456              6291456              0
>> >            *privvmpages   6202505    9436485    9437184              9437184              1573*
>> >            shmpages       34617      36553      9223372036854775807  9223372036854775807  0
>> >            dummy          0          0          9223372036854775807  9223372036854775807  0
>> >            numproc        952        1299       30000                30000                0
>> >            physpages      1214672    6291456    6291456              6291456              0
>> >            vmguarpages    0          0          6291456              6291456              0
>> >            oomguarpages   1096587    2121834    6291456              6291456              0
>> >            numtcpsock     226        457        30000                30000                0
>> >            numflock       5          16         1000                 1100                 0
>> >            numpty         4          6          512                  512                  0
>> >            numsiginfo     1          69         1024                 1024                 0
>> >            tcpsndbuf      5637456    17822864   9223372036854775807  9223372036854775807  0
>> >            tcprcvbuf      6061504    13730792   9223372036854775807  9223372036854775807  0
>> >            othersockbuf   46240      1268016    9223372036854775807  9223372036854775807  0
>> >            dgramrcvbuf    0          436104     9223372036854775807  9223372036854775807  0
>> >            numothersock   89         134        30000                30000                0
>> >            dcachesize     61381173   935378121  9223372036854775807  9223372036854775807  0
>> >            numfile        7852       11005      250000               250000               0
>> >            dummy          0          0          9223372036854775807  9223372036854775807  0
>> >            dummy          0          0          9223372036854775807  9223372036854775807  0
>> >            dummy          0          0          9223372036854775807  9223372036854775807  0
>> >            numiptent      38         38         1000                 1000                 0
>> >
>> > On Fri, Sep 18, 2015 at 3:34 PM, Radu Marias <[email protected]> wrote:
>> >
>> > > With 1 user no crash occurs, but on the node on which HammerDB is
>> > > started I noticed this from time to time:
>> > >
>> > > $ java -version
>> > > Error occurred during initialization of VM
>> > > Unable to allocate 199232KB bitmaps for parallel garbage collection for
>> > > the requested 6375424KB heap.
>> > > Error: Could not create the Java Virtual Machine.
>> > > Error: A fatal exception has occurred. Program will exit.
>> > >
>> > > $ free -h
>> > >              total       used       free     shared    buffers     cached
>> > > Mem:           24G       4.7G        19G       132M         0B       314M
>> > > -/+ buffers/cache:       4.4G        19G
>> > > Swap:           0B         0B         0B
>> > >
>> > > On Fri, Sep 18, 2015 at 2:21 PM, Radu Marias <[email protected]> wrote:
>> > >
>> > >> $ $JAVA_HOME/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize
>> > >>     uintx ErgoHeapSizeLimit           = 0            {product}
>> > >>     uintx HeapSizePerGCThread         = 87241520     {product}
>> > >>     uintx InitialHeapSize            := 402653184    {product}
>> > >>     uintx LargePageHeapSizeThreshold  = 134217728    {product}
>> > >>     uintx MaxHeapSize                := 6442450944   {product}
>> > >> java version "1.7.0_67"
>> > >> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
>> > >> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>> > >>
>> > >> On Fri, Sep 18, 2015 at 2:20 PM, Radu Marias <[email protected]> wrote:
>> > >>
>> > >>> I logged this issue several days ago; is this OK?
>> > >>> https://issues.apache.org/jira/browse/TRAFODION-1492
>> > >>>
>> > >>> Will try with one user and let you know.
>> > >>>
>> > >>> On Fri, Sep 18, 2015 at 7:05 AM, Suresh Subbiah <[email protected]> wrote:
>> > >>>
>> > >>>> Hi
>> > >>>>
>> > >>>> How many virtual users are being used? If it is more than one, could
>> > >>>> we please try the case with 1 user first.
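The privvmpages row quoted above (failcnt 1573) is the smoking gun for the container refusing allocations. A hedged sketch that pulls that row out of /proc/user_beancounters and converts pages to GiB (4 KiB pages assumed; the sample line is copied from the output above):

```shell
# Sketch: extract the privvmpages row and report held/limit in GiB plus
# the failure count. Assumes 4 KiB pages. Sample line copied from the
# thread; on a node read /proc/user_beancounters itself.
cat > /tmp/beancounters.sample <<'EOF'
privvmpages 6202505 9436485 9437184 9437184 1573
EOF
awk '$1 == "privvmpages" {
    printf "held=%.1fGiB limit=%.1fGiB failcnt=%d\n",
           $2 * 4096 / 2^30, $5 * 4096 / 2^30, $6
}' /tmp/beancounters.sample
# → held=23.7GiB limit=36.0GiB failcnt=1573
```

Any nonzero (and growing) failcnt means the kernel has been denying allocations, which is consistent with the JVM intermittently failing to reserve its heap.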
>> > >>>>
>> > >>>> When the crash happens next time, could we please try
>> > >>>> sqps | grep esp | wc -l
>> > >>>>
>> > >>>> If this number is large, we know a lot of esp processes are being
>> > >>>> started, which could consume memory.
>> > >>>> If this is the case, please insert this row into the defaults table
>> > >>>> from sqlci and then restart dcs (dcsstop followed by dcsstart):
>> > >>>> insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM', 'OFF',
>> > >>>> 'hammerdb testing') ;
>> > >>>> exit ;
>> > >>>>
>> > >>>> I will work on having the udr process create a JVM with a smaller
>> > >>>> initial heap size. If you have time and would like to do so, a JIRA
>> > >>>> you file will be helpful. Or I can file the JIRA and work on it. It
>> > >>>> will not take long to make this change.
>> > >>>>
>> > >>>> Thanks
>> > >>>> Suresh
>> > >>>>
>> > >>>> PS I found this command on Stack Overflow to determine the
>> > >>>> InitialHeapSize we get by default in this env:
>> > >>>>
>> > >>>> java -XX:+PrintFlagsFinal -version | grep HeapSize
>> > >>>>
>> > >>>> http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
>> > >>>>
>> > >>>> On Thu, Sep 17, 2015 at 10:32 AM, Radu Marias <[email protected]> wrote:
>> > >>>>
>> > >>>> > Did the steps mentioned above to ensure that the Trafodion
>> > >>>> > processes are free of the Java installation mixup.
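Suresh's esp head-count suggestion above can be exercised against canned output; a hedged sketch in which the process lines are invented stand-ins (the real sqps output format may differ):

```shell
# Sketch of `sqps | grep esp | wc -l` against canned output; the lines
# below are fabricated stand-ins for real sqps process listings.
cat > /tmp/sqps.sample <<'EOF'
[$Z000ABC] 001,00012345 tdm_arkesp
[$Z000ABD] 002,00012346 tdm_arkesp
[$Z000ABE] 000,00012347 tdm_arkdtm
EOF
grep esp /tmp/sqps.sample | wc -l
```

On a live instance the pipe is run as written in the email, `sqps | grep esp | wc -l`; a count far above the expected degree of parallelism suggests esp processes are piling up and eating memory.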
>> > >>>> > Also changed it so that HDP, Trafodion and HammerDB all use the
>> > >>>> > same JDK from */usr/jdk64/jdk1.7.0_67*:
>> > >>>> >
>> > >>>> > # java -version
>> > >>>> > java version "1.7.0_67"
>> > >>>> > Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
>> > >>>> > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>> > >>>> >
>> > >>>> > # echo $JAVA_HOME
>> > >>>> > /usr/jdk64/jdk1.7.0_67
>> > >>>> >
>> > >>>> > But when running HammerDB I again got a crash on 2 nodes. I
>> > >>>> > noticed that for about one minute before the crash I was getting
>> > >>>> > errors from *java -version*, and about 30 seconds after the crash
>> > >>>> > java -version worked again. So these issues might be related. I
>> > >>>> > haven't yet found the cause of the java -version issue or how to
>> > >>>> > fix it.
>> > >>>> >
>> > >>>> > # java -version
>> > >>>> > Error occurred during initialization of VM
>> > >>>> > Could not reserve enough space for object heap
>> > >>>> > Error: Could not create the Java Virtual Machine.
>> > >>>> > Error: A fatal exception has occurred.
Program will exit.
>> > >>>> >
>> > >>>> > # file core.5813
>> > >>>> > core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
>> > >>>> > from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3 188.138.61.175:48357 00004 000'
>> > >>>> >
>> > >>>> > #0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
>> > >>>> > #1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
>> > >>>> > #2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source file information unavailable", msg4=0x7fff11977ff0 "User routine being processed : TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type : JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "", dialOut=<value optimized out>, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:191
>> > >>>> > #3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java virtual machine aborted", file=<value optimized out>, line=<value optimized out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
>> > >>>> > #4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at ../langman/LmJavaHooks.cpp:54
>> > >>>> > #5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > >>>> > #6  0x00007f6922afedba in Universe::initialize_heap() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > >>>> > #7  0x00007f6922afff89 in universe_init() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > >>>> > #8  0x00007f692273d9f5 in init_globals() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > >>>> > #9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > >>>> > #10 0x00007f69227c5a34 in JNI_CreateJavaVM () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > >>>> > #11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value optimized out>, result=<value optimized out>, maxLMJava=<value optimized out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at ../langman/LmLangManagerJava.cpp:379
>> > >>>> > #12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava (this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value optimized out>, maxLMJava=1, userOptions=0x7f69239ba418, diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
>> > >>>> > #13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM (this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized out>) at ../udrserv/udrglobals.cpp:322
>> > >>>> > #14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040, msgStream=..., request=..., env=<value optimized out>) at ../udrserv/udrload.cpp:163
>> > >>>> > #15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040, msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
>> > >>>> > #16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:520
>> > >>>> > #17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:356
>> > >>>> >
>> > >>>> > On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <[email protected]> wrote:
>> > >>>> >
>> > >>>> > > Hi,
>> > >>>> > >
>> > >>>> > > I have added a wiki page that describes how to get a stack
>> > >>>> > > trace from a core file.
The page could do with some improvements on finding
>> > >>>> > > the core file, and maybe even doing more than getting the
>> > >>>> > > stack trace. For now it should make our troubleshooting cycle
>> > >>>> > > faster if the stack trace is included in the initial message
>> > >>>> > > itself.
>> > >>>> > >
>> > >>>> > > https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
>> > >>>> > >
>> > >>>> > > In this case, the last node does not seem to have gdb, so I
>> > >>>> > > could not see the trace there. I moved the core file to the
>> > >>>> > > first node, but then the trace looks like this. I assume this
>> > >>>> > > is because I moved the core file to a different node. I think
>> > >>>> > > Selva's suggestion is good to try. We may have had a few
>> > >>>> > > tdm_udrserv processes from before the time the java change
>> > >>>> > > was made.
>> > >>>> > >
>> > >>>> > > $ gdb tdm_udrserv core.49256
>> > >>>> > > #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
>> > >>>> > > #1  0x8857780a58ff2155 in ?? ()
>> > >>>> > > Cannot access memory at address 0x8857780a58ff2155
>> > >>>> > >
>> > >>>> > > The back trace we saw yesterday, when a udrserv process exited
>> > >>>> > > because the JVM could not be started, is used in the wiki page
>> > >>>> > > instead of this one. If you have time, a JIRA on this
>> > >>>> > > unexpected udrserv exit will also be valuable for the
>> > >>>> > > Trafodion team.
>> > >>>> > >
>> > >>>> > > Thanks
>> > >>>> > > Suresh
>> > >>>> > >
>> > >>>> > > On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <[email protected]> wrote:
>> > >>>> > >
>> > >>>> > > > Thanks for creating the JIRA TRAFODION-1492. The error is
>> > >>>> > > > similar to scenario 2. The process tdm_udrserv dumped core.
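The interactive gdb invocation above can also be scripted for the wiki-page workflow; a hedged sketch using gdb's standard `-batch`/`-ex` options (the binary and core names are taken from the thread; both must sit on the node that produced the core, matching the running build). The `echo` makes this a dry run so it only prints the command; drop it to actually invoke gdb:

```shell
# Sketch: batch-mode gdb for pulling a backtrace from a core file on the
# node that produced it. The echo prints the command instead of running
# it; remove the echo on a node that has gdb installed.
get_backtrace() {
    binary="$1"; corefile="$2"
    echo gdb -batch -ex "bt" "$binary" "$corefile"
}
get_backtrace tdm_udrserv core.49256
# → gdb -batch -ex bt tdm_udrserv core.49256
```

Running it on a different node than the one that dumped the core can produce the garbage frames shown above, since the loaded libraries no longer match.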
We will look into the core
>> > >>>> > > > file. In the meantime, can you please do the following:
>> > >>>> > > >
>> > >>>> > > > Bring the Trafodion instance down
>> > >>>> > > > echo $MY_SQROOT -- shows the Trafodion installation directory
>> > >>>> > > > Remove $MY_SQROOT/etc/ms.env from all nodes
>> > >>>> > > >
>> > >>>> > > > Start a new terminal session so that the new Java settings are in place
>> > >>>> > > > Log in as the Trafodion user
>> > >>>> > > > cd <trafodion_installation_directory>
>> > >>>> > > > . ./sqenv.sh (skip this if it is done automatically upon logon)
>> > >>>> > > > sqgen
>> > >>>> > > >
>> > >>>> > > > Exit and start a new terminal session
>> > >>>> > > > Restart the Trafodion instance and check if you are seeing
>> > >>>> > > > the issue with tdm_udrserv again. We wanted to ensure that
>> > >>>> > > > the Trafodion processes are free of the Java installation
>> > >>>> > > > mixup from your earlier message. We suspect that can cause
>> > >>>> > > > the tdm_udrserv process to dump core.
>> > >>>> > > >
>> > >>>> > > > Selva
>> > >>>> > > >
>> > >>>> > > > -----Original Message-----
>> > >>>> > > > From: Radu Marias [mailto:[email protected]]
>> > >>>> > > > Sent: Wednesday, September 16, 2015 5:40 AM
>> > >>>> > > > To: dev <[email protected]>
>> > >>>> > > > Subject: Re: odbc and/or hammerdb logs
>> > >>>> > > >
>> > >>>> > > > I'm seeing this in the HammerDB logs; I assume it is due to
>> > >>>> > > > the crash and some processes being stopped:
>> > >>>> > > >
>> > >>>> > > > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion
>> > >>>> > > > Database] SQL ERROR:*** ERROR[2034] $Z0106BZ:16: Operating
>> > >>>> > > > system error 201 while communicating with server process
>> > >>>> > > > $Z010LPE:23.
[2015-09-16 12:35:33]
>> > >>>> > > > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:***
>> > >>>> > > > ERROR[8904] SQL did not receive a reply from MXUDR, possibly
>> > >>>> > > > caused by internal errors when executing user-defined
>> > >>>> > > > routines. [2015-09-16 12:35:33]
>> > >>>> > > >
>> > >>>> > > > $ sqcheck
>> > >>>> > > > Checking if processes are up.
>> > >>>> > > > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
>> > >>>> > > >
>> > >>>> > > > The SQ environment is up!
>> > >>>> > > >
>> > >>>> > > > Process    Configured    Actual    Down
>> > >>>> > > > -------    ----------    ------    ----
>> > >>>> > > > DTM        5             5
>> > >>>> > > > RMS        10            10
>> > >>>> > > > MXOSRVR    20            20
>> > >>>> > > >
>> > >>>> > > > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <[email protected]> wrote:
>> > >>>> > > >
>> > >>>> > > > > I've restarted HDP and Trafodion, and now I managed to
>> > >>>> > > > > create the schema and stored procedures from HammerDB.
>> > >>>> > > > > But I'm getting failures and core dumps from Trafodion
>> > >>>> > > > > again while running virtual users.
>> > >>>> > > > > For some of the users I sometimes see this in the
>> > >>>> > > > > HammerDB logs:
>> > >>>> > > > > Vuser 5:Failed to execute payment
>> > >>>> > > > > Vuser 5:Failed to execute stock level
>> > >>>> > > > > Vuser 5:Failed to execute new order
>> > >>>> > > > >
>> > >>>> > > > > Core files are on our last node; feel free to examine
>> > >>>> > > > > them. The files were dumped while getting the HammerDB
>> > >>>> > > > > errors:
>> > >>>> > > > >
>> > >>>> > > > > *core.49256*
>> > >>>> > > > > *core.48633*
>> > >>>> > > > > *core.49290*
>> > >>>> > > > >
>> > >>>> > > > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <[email protected]> wrote:
>> > >>>> > > > >
>> > >>>> > > > >> *Scenario 1:*
>> > >>>> > > > >>
>> > >>>> > > > >> I've created this issue:
>> > >>>> > > > >> https://issues.apache.org/jira/browse/TRAFODION-1492
>> > >>>> > > > >> I think another fix was made related to *Committed_AS*
>> > >>>> > > > >> in *sql/cli/memmonitor.cpp*.
>> > >>>> > > > >>
>> > >>>> > > > >> This is a response from Narendra in a previous thread
>> > >>>> > > > >> where the issue was fixed to get Trafodion to start:
>> > >>>> > > > >>
>> > >>>> > > > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
>> > >>>> > > > >>> /proc/meminfo does not have the 'Committed_AS' entry,
>> > >>>> > > > >>> it will ignore it. Built it and put the binary:
>> > >>>> > > > >>> libcli.so on the veracity box (in the
>> > >>>> > > > >>> $MY_SQROOT/export/lib64 directory – on all the nodes).
>> > >>>> > > > >>> Restarted the env and 'sqlci' worked fine.
>> > >>>> > > > >>> Was able to 'initialize trafodion' and create a table.*
>> > >>>> > > > >>
>> > >>>> > > > >> *Scenario 2:*
>> > >>>> > > > >>
>> > >>>> > > > >> The *java -version* problem I recall we had only on the
>> > >>>> > > > >> other cluster, with CentOS 7; I didn't see it on this
>> > >>>> > > > >> one with CentOS 6.7. But a change I made in the latter
>> > >>>> > > > >> these days is installing Oracle *jdk 1.7.0_79* as the
>> > >>>> > > > >> default one, and that is where *JAVA_HOME* points to.
>> > >>>> > > > >> Before that, some nodes had *open-jdk* as the default
>> > >>>> > > > >> and others didn't have one, just the one installed by
>> > >>>> > > > >> path by *ambari* in */usr/jdk64/jdk1.7.0_67*, which was
>> > >>>> > > > >> not linked to JAVA_HOME or the *java* command by
>> > >>>> > > > >> *alternatives*.
>> > >>>> > > > >>
>> > >>>> > > > >> *Failures in HammerDB:*
>> > >>>> > > > >>
>> > >>>> > > > >> Attached is the *trafodion.dtm.log* from a node on which
>> > >>>> > > > >> I see a lot of lines like these, and I assume it is the
>> > >>>> > > > >> *transaction conflict* that you mentioned. I see these
>> > >>>> > > > >> lines on 4 out of 5 nodes:
>> > >>>> > > > >>
>> > >>>> > > > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
>> > >>>> > > > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
>> > >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
>> > >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
>> > >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
>> > >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
>> > >>>> > > > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
>> > >>>> > > > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
>> > >>>> > > > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
>> > >>>> > > > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
>> > >>>> > > > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
>> > >>>> > > > >>
>> > >>>> > > > >> What does *transaction conflict* mean in this case?
>> > >>>> > > > >>
>> > >>>> > > > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <[email protected]> wrote:
>> > >>>> > > > >>
>> > >>>> > > > >>> Hi Radu,
>> > >>>> > > > >>>
>> > >>>> > > > >>> Thanks for using Trafodion. With the help from Suresh,
>> > >>>> > > > >>> we looked at the core files in your cluster. We believe
>> > >>>> > > > >>> that there are two scenarios that are causing the
>> > >>>> > > > >>> Trafodion processes to dump core.
>> > >>>> > > > >>>
>> > >>>> > > > >>> Scenario 1:
>> > >>>> > > > >>> Core dumped by tdm_arkesp processes. The Trafodion
>> > >>>> > > > >>> engine has assumed that the entry
>> > >>>> > > > >>> /proc/meminfo/Committed_AS is available in all flavors
>> > >>>> > > > >>> of Linux. The absence of this entry is not handled
>> > >>>> > > > >>> correctly by the Trafodion tdm_arkesp process, and
>> > >>>> > > > >>> hence it dumped core. Please file a JIRA using this
>> > >>>> > > > >>> link
>> > >>>> > > > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>> > >>>> > > > >>> and choose "Apache Trafodion" as the project to report
>> > >>>> > > > >>> a bug against.
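On the transaction-conflict question above: as I understand it, RET_HASCONFLICT at prepareCommit means two concurrent transactions touched overlapping rows and one is rolled back at commit time, which is expected under TPC-C's hot district rows. To gauge how often it happens, the lines can simply be counted; a hedged sketch over lines copied from the log excerpt above:

```shell
# Sketch: count commit-time conflicts in trafodion.dtm.log. Sample lines
# copied from the thread; on a real node point grep at
# $MY_SQROOT/logs/trafodion.dtm.log instead.
cat > /tmp/dtm.log.sample <<'EOF'
2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
EOF
grep -c RET_HASCONFLICT /tmp/dtm.log.sample
# → 3
```

A handful per minute, as in the excerpt, looks like ordinary contention; a sudden spike would point at the workload rather than at the crashes.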
>> > >>>> > > > >>>
>> > >>>> > > > >>> Scenario 2:
>> > >>>> > > > >>> Core dumped by tdm_udrserv processes. From our
>> > >>>> > > > >>> analysis, this problem happened when the process
>> > >>>> > > > >>> attempted to create the JVM instance programmatically.
>> > >>>> > > > >>> A few days earlier, we observed a similar issue in
>> > >>>> > > > >>> your cluster when the java -version command was
>> > >>>> > > > >>> attempted. But java -version and
>> > >>>> > > > >>> $JAVA_HOME/bin/java -version work fine now.
>> > >>>> > > > >>> Was there any change made to the cluster recently to
>> > >>>> > > > >>> avoid the problem with the java -version command?
>> > >>>> > > > >>>
>> > >>>> > > > >>> Could you please delete all the core files in the
>> > >>>> > > > >>> sql/scripts directory, issue the command to invoke the
>> > >>>> > > > >>> SPJ, and check if it still dumps core. We can look at
>> > >>>> > > > >>> the core file if it happens again. Your solution to
>> > >>>> > > > >>> the java -version command issue would be helpful.
>> > >>>> > > > >>>
>> > >>>> > > > >>> For the failures with HammerDB, can you please send us
>> > >>>> > > > >>> the exact error message returned by the Trafodion
>> > >>>> > > > >>> engine to the application. This might help us narrow
>> > >>>> > > > >>> down the cause. You can also look at
>> > >>>> > > > >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any
>> > >>>> > > > >>> transaction conflict is causing this error.
>> > >>>> > > > >>>
>> > >>>> > > > >>> Selva
>> > >>>> > > > >>> -----Original Message-----
>> > >>>> > > > >>> From: Radu Marias [mailto:[email protected]]
>> > >>>> > > > >>> Sent: Tuesday, September 15, 2015 9:09 AM
>> > >>>> > > > >>> To: dev <[email protected]>
>> > >>>> > > > >>> Subject: Re: odbc and/or hammerdb logs
>> > >>>> > > > >>>
>> > >>>> > > > >>> Also noticed there are several core files from today
>> > >>>> > > > >>> in */home/trafodion/trafodion-20150828_0830/sql/scripts*.
>> > >>>> > > > >>> If needed, please provide a Gmail address so I can
>> > >>>> > > > >>> share them via Google Drive.
>> > >>>> > > > >>>
>> > >>>> > > > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <[email protected]> wrote:
>> > >>>> > > > >>>
>> > >>>> > > > >>> > Hi,
>> > >>>> > > > >>> >
>> > >>>> > > > >>> > I'm running HammerDB over Trafodion, and when
>> > >>>> > > > >>> > running virtual users I sometimes get errors like
>> > >>>> > > > >>> > this in the HammerDB logs:
>> > >>>> > > > >>> > *Vuser 1:Failed to execute payment*
>> > >>>> > > > >>> > *Vuser 1:Failed to execute new order*
>> > >>>> > > > >>> >
>> > >>>> > > > >>> > I'm using unixODBC and I tried to add these lines in
>> > >>>> > > > >>> > */etc/odbc.ini*, but the trace file is not created:
>> > >>>> > > > >>> > *[ODBC]*
>> > >>>> > > > >>> > *Trace = 1*
>> > >>>> > > > >>> > *TraceFile = /var/log/odbc_tracefile.log*
>> > >>>> > > > >>> >
>> > >>>> > > > >>> > Also tried with *Trace = yes* and *Trace = on*; I've
>> > >>>> > > > >>> > found multiple references for both.
>> > >>>> > > > >>> >
>> > >>>> > > > >>> > How can I see more logs to debug the issue? Can I
>> > >>>> > > > >>> > enable logs for all queries in Trafodion?
>> > >>>> > > > >>> >
>> > >>>> > > > >>> > --
>> > >>>> > > > >>> > And in the end, it's not the years in your life that
>> > >>>> > > > >>> > count. It's the life in your years.
--
And in the end, it's not the years in your life that count. It's the life in your years.
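One closing note on the unixODBC tracing attempt earlier in the thread: in unixODBC the driver-manager's global [ODBC] trace section is read from odbcinst.ini (locate the real path with `odbcinst -j`), not from odbc.ini, which may be why no trace file ever appeared. A hedged sketch, written to a scratch file here rather than the live config:

```shell
# Sketch: unixODBC driver-manager tracing lives in the [ODBC] section of
# odbcinst.ini (find its real path with `odbcinst -j`), not odbc.ini.
# Writing a scratch copy here; merge into the real file on a node.
cat > /tmp/odbcinst.ini.sample <<'EOF'
[ODBC]
Trace     = Yes
TraceFile = /var/log/odbc_tracefile.log
EOF
grep -c '^Trace' /tmp/odbcinst.ini.sample
# → 2
```

The TraceFile path must be writable by the process loading the driver manager; driver-manager tracing logs every ODBC call, so it should be switched off again after the failing HammerDB run is captured.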
