Thanks for creating the JIRA Trafodion-1492. The error is similar to scenario-2. The process tdm_udrserv dumped core. We will look into the core file. In the meantime, can you please do the following:

Bring the Trafodion instance down
echo $MY_SQROOT -- shows the Trafodion installation directory
Remove $MY_SQROOT/etc/ms.env from all nodes
Start a new terminal session so that the new Java settings are in place
Log in as a Trafodion user
cd <trafodion_installation_directory>
. ./sqenv.sh (skip this if it is done automatically upon logon)
sqgen
Exit and start a new terminal session
Restart the Trafodion instance and check whether you see the issue with tdm_udrserv again.

We want to make sure that the Trafodion processes are not affected by the Java installation mix-up described in your earlier message. We suspect that it can cause the tdm_udrserv process to dump core.

Selva

-----Original Message-----
From: Radu Marias [mailto:[email protected]]
Sent: Wednesday, September 16, 2015 5:40 AM
To: dev <[email protected]>
Subject: Re: odbc and/or hammerdb logs

I'm seeing this in the hammerdb logs. I assume it is due to the crash and that some processes are stopped:

Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
[Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904] SQL did not receive a reply from MXUDR, possibly caused by internal errors when executing user-defined routines. [2015-09-16 12:35:33]

$ sqcheck
Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.

The SQ environment is up!

Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             5               5
RMS             10              10
MXOSRVR         20              20

On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <[email protected]> wrote:

> I've restarted hdp and trafodion and now I managed to create the
> schema and stored procedures from hammerdb. But I'm still getting failures,
> and trafodion dumps core again, while running virtual users.
> For some of the users I sometimes see in the hammerdb logs:
> Vuser 5:Failed to execute payment
> Vuser 5:Failed to execute stock level
> Vuser 5:Failed to execute new order
>
> Core files are on our last node, feel free to examine them. The files
> were dumped while getting the hammerdb errors:
>
> *core.49256*
>
> *core.48633*
>
> *core.49290*
>
>
> On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <[email protected]> wrote:
>
>> *Scenario 1:*
>>
>> I've created this issue:
>> https://issues.apache.org/jira/browse/TRAFODION-1492
>> I think another fix was made related to *Committed_AS* in
>> *sql/cli/memmonitor.cpp*.
>>
>> This is a response from Narendra in a previous thread, where the issue
>> was fixed so that trafodion could start:
>>
>>> *I updated the code: sql/cli/memmonitor.cpp, so that if
>>> /proc/meminfo does not have the ‘Committed_AS’ entry, it will ignore
>>> it. Built it and put the binary: libcli.so on the veracity box (in
>>> the $MY_SQROOT/export/lib64 directory – on all the nodes). Restarted the
>>> env and ‘sqlci’ worked fine.
>>> Was able to ‘initialize trafodion’ and create a table.*
>>
>> *Scenario 2:*
>>
>> As I recall, we had the *java -version* problem only on the other cluster
>> with centos 7; I haven't seen it on this one with centos 6.7. But a
>> change I made recently on the latter is installing oracle *jdk
>> 1.7.0_79* as the default one, and that is where *JAVA_HOME* points. Before
>> that, some nodes had *open-jdk* as default and others had no default,
>> only the one installed by *ambari* at
>> */usr/jdk64/jdk1.7.0_67*, which was not linked to JAVA_HOME or the *java*
>> command by *alternatives*.
>>
>> *Failures in HammerDB:*
>>
>> Attached is the *trafodion.dtm.log* from a node on which I see a
>> lot of lines like these. I assume this is the *transaction conflict*
>> that you mentioned. I see these lines on 4 out of 5 nodes:
>>
>> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
>> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
>> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
>> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
>> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
>> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
>> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
>> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
>> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
>> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
>> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
>>
>> What does *transaction conflict* mean in this case?
>>
>> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
>> [email protected]> wrote:
>>
>>> Hi Radu,
>>>
>>> Thanks for using Trafodion. With the help from Suresh, we looked at
>>> the core files in your cluster. We believe that there are two
>>> scenarios that are causing the Trafodion processes to dump core.
>>>
>>> Scenario 1:
>>> Core dumped by tdm_arkesp processes. The Trafodion engine has assumed
>>> that the Committed_AS entry in /proc/meminfo is available in all flavors
>>> of Linux. The absence of this entry is not handled correctly by the
>>> trafodion tdm_arkesp process and hence it dumped core.
>>> Please file a JIRA using this link
>>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa and
>>> choose "Apache Trafodion" as the project to report a bug against.
>>>
>>> Scenario 2:
>>> Core dumped by tdm_udrserv processes. From our analysis, this
>>> problem happened when the process attempted to create the JVM
>>> instance programmatically. A few days earlier, we observed a
>>> similar issue in your cluster when the java -version command was
>>> attempted. But java -version or $JAVA_HOME/bin/java -version works
>>> fine now.
>>> Was there any change made to the cluster recently to avoid the
>>> problem with the java -version command?
>>>
>>> Please delete all the core files in the sql/scripts directory,
>>> issue the command to invoke the SPJ, and check if it still dumps
>>> core. We can look at the core file if it happens again. Knowing your
>>> solution to the java -version problem would be helpful.
>>>
>>> For the failures with HammerDB, can you please send us the exact
>>> error message returned by the Trafodion engine to the application?
>>> This might help us narrow down the cause. You can also look at
>>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
>>> conflict is causing this error.
>>>
>>> Selva
>>> -----Original Message-----
>>> From: Radu Marias [mailto:[email protected]]
>>> Sent: Tuesday, September 15, 2015 9:09 AM
>>> To: dev <[email protected]>
>>> Subject: Re: odbc and/or hammerdb logs
>>>
>>> Also noticed there are several core. files from today in
>>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed
>>> please provide a gmail address so I can share them via gdrive.
>>>
>>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <[email protected]>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > I'm running HammerDB over trafodion and when running virtual users
>>> > sometimes I get errors like this in the hammerdb logs:
>>> > *Vuser 1:Failed to execute payment*
>>> >
>>> > *Vuser 1:Failed to execute new order*
>>> >
>>> > I'm using unixODBC and I tried to add these lines in
>>> > */etc/odbc.ini* but the trace file is not created.
>>> > *[ODBC]*
>>> > *Trace = 1*
>>> > *TraceFile = /var/log/odbc_tracefile.log*
>>> >
>>> > Also tried with *Trace = yes* and *Trace = on*; I've found
>>> > multiple references for both.
>>> >
>>> > How can I see more logs to debug the issue? Can I enable logs for
>>> > all queries in trafodion?
>>> >
>>> > --
>>> > And in the end, it's not the years in your life that count. It's
>>> > the life in your years.
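
The check suggested in the thread, looking through $MY_SQROOT/logs/trafodion.dtm.log for transaction conflicts, could be sketched roughly as below. This is only an illustration: the line format is assumed from the RET_HASCONFLICT lines quoted in the thread, the function names (find_conflicts, conflicts_per_minute) are hypothetical, and the log may well report conflicts in other forms that this pattern would miss.

```python
import re
from collections import Counter

# Pattern assumed from the log lines quoted in the thread, e.g.
# 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
CONFLICT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO dtm\.HBaseTxClient: "
    r"Exit RET_HASCONFLICT\s+prepareCommit, txid: (?P<txid>\d+)"
)


def find_conflicts(lines):
    """Return (timestamp, txid) pairs for every RET_HASCONFLICT line (hypothetical helper)."""
    hits = []
    for line in lines:
        match = CONFLICT_RE.match(line.strip())
        if match:
            hits.append((match.group("ts"), int(match.group("txid"))))
    return hits


def conflicts_per_minute(hits):
    """Count conflicts per 'YYYY-MM-DD HH:MM' bucket (hypothetical helper)."""
    return Counter(ts[:16] for ts, _ in hits)


# Sample lines copied from the thread; a real run would read the log file instead.
SAMPLE = """\
2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
""".splitlines()

if __name__ == "__main__":
    hits = find_conflicts(SAMPLE)
    for ts, txid in hits:
        print(ts, txid)
    print(dict(conflicts_per_minute(hits)))
```

To run it against a real node, the SAMPLE list could be replaced with the lines of trafodion.dtm.log. Comparing the conflict timestamps with the times of the "Failed to execute" messages in the hammerdb logs may help tell conflict-related failures apart from the ones caused by the tdm_udrserv core dumps.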
