Thank you. This is a very useful find. Could you please explain where
JAVA_TOOL_OPTIONS is set? Will all user IDs see this variable?
In general I was under the impression that processes started through the
Trafodion monitor get their env settings from ms.env alone. For udrserv to
see this setting, we may need to add it to ms.env (and copy it to all nodes)
and restart Trafodion. Please do not do this until someone else confirms,
though, as I am not certain.
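For what it's worth, the underlying mechanics here are plain environment
inheritance: a child process only sees a variable its parent exported. A quick
shell illustration (not Trafodion-specific):

```shell
# Illustration: a variable set in a shell profile is only inherited by child
# processes once it is exported, which is why a per-user setting may never
# reach server processes started by the monitor.
unset JAVA_TOOL_OPTIONS                              # start from a clean slate
JAVA_TOOL_OPTIONS="-Xms128m -Xmx2048m"
before=$(sh -c 'printf "%s" "$JAVA_TOOL_OPTIONS"')   # empty: not exported yet
export JAVA_TOOL_OPTIONS
after=$(sh -c 'printf "%s" "$JAVA_TOOL_OPTIONS"')    # now inherited
echo "before=[$before] after=[$after]"               # before=[] after=[-Xms128m -Xmx2048m]
```

If the variable is only exported in an interactive user profile, processes
spawned by the Trafodion monitor would never see it, which is why adding it to
ms.env (copied to every node) is the plausible route.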

Thanks
Suresh


On Fri, Sep 18, 2015 at 12:09 PM, Radu Marias <[email protected]> wrote:

> OK, it seems that it didn't last long; I've got the Java heap issue again:
>
> # java -version
> Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
>
> Still no crash in trafodion after those 30 minutes.
>
> On Fri, Sep 18, 2015 at 8:06 PM, Radu Marias <[email protected]> wrote:
>
> > I think I managed to fix the java -version heap issue with:
> > *export JAVA_TOOL_OPTIONS="-Xms128m -Xmx2048m"*
> > Processes that specify explicit values for Xms and Xmx will override
> > JAVA_TOOL_OPTIONS, as described here:
> >
> http://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts
> >
> > Trafodion processes will take these values if they don't specify others; I
> > see lines like this when Trafodion starts:
> > Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
> >
> > But in HammerDB, when creating stored procedures, I get this error:
> > Error in Virtual User 1: Picked up JAVA_TOOL_OPTIONS: -Xms128m -Xmx2048m
> >
> > Fixed the HammerDB issue by manually creating the SPJ and indexes, but at
> > first run with 5 users I got the crash from TRAFODION-1492. After
> > restarting Trafodion I managed to run with 5 users for about 30 minutes.
> > Will do more tests on Monday.
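The precedence described above can be simulated in plain shell. The JVM
effectively prepends JAVA_TOOL_OPTIONS to the command-line options, and for a
repeated flag such as -Xmx the last occurrence wins (an illustrative
simulation, not an actual JVM invocation):

```shell
# Simulate how the JVM resolves a repeated -Xmx flag: JAVA_TOOL_OPTIONS is
# prepended to the command-line options, and the last -Xmx seen wins.
tool_opts="-Xms128m -Xmx2048m"   # from JAVA_TOOL_OPTIONS
cmd_opts="-Xmx6g"                # explicit option on the java command line
effective=""
for opt in $tool_opts $cmd_opts; do
  case $opt in
    -Xmx*) effective=$opt ;;     # later occurrences overwrite earlier ones
  esac
done
echo "effective max-heap flag: $effective"   # effective max-heap flag: -Xmx6g
```

So a Trafodion process that passes its own -Xmx would keep its explicit
setting, while everything else picks up the 2 GB cap.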
> >
> > On Fri, Sep 18, 2015 at 6:05 PM, Suresh Subbiah <
> > [email protected]> wrote:
> >
> >> Thank you.
> >>
> >> Trafodion-1492 can be used to track both the esp crash and the udrserv
> >> crash. The fixes will be in different areas, but that does not matter.
> >>
> >> I don't know much about containers or OpenVZ. Maybe others will know. I
> >> hope to have a fix ready soon for the udrserv crash problem, in case
> >> container settings cannot be changed.
> >>
> >> The general idea suggested by Selva is that we introduce env variables
> >> with min and max JVM heap size settings for the udrserv process (just
> >> like we have today for executor processes). Udrserv does have the idea
> >> of reading from a configuration file, so we could use that approach if
> >> that is preferable. Either way there should be some way to start a
> >> udrserv process with a smaller heap soon.
> >>
> >> Thanks
> >> Suresh
> >>
> >>
> >>
> >> On Fri, Sep 18, 2015 at 8:17 AM, Radu Marias <[email protected]>
> >> wrote:
> >>
> >> > The nodes are in OpenVZ containers and I noticed this:
> >> >
> >> > # cat /proc/user_beancounters
> >> >   uid  resource         held   maxheld   barrier     limit  failcnt
> >> > *      privvmpages   6202505   9436485   9437184   9437184     1573*
> >> >
> >> > I assume this could be related to the java -version issue. Trying to
> >> > see if I can fix this; we are limited in what can be set from inside
> >> > the container.
> >> >
> >> > # cat /proc/user_beancounters
> >> > Version: 2.5
> >> >        uid  resource          held    maxheld              barrier                limit  failcnt
> >> >  10045785:  kmemsize     111794747  999153664  9223372036854775807  9223372036854775807        0
> >> >             lockedpages       7970       7970              6291456              6291456        0
> >> > *           privvmpages    6202505    9436485              9437184              9437184     1573*
> >> >             shmpages         34617      36553  9223372036854775807  9223372036854775807        0
> >> >             dummy                0          0  9223372036854775807  9223372036854775807        0
> >> >             numproc            952       1299                30000                30000        0
> >> >             physpages      1214672    6291456              6291456              6291456        0
> >> >             vmguarpages          0          0              6291456              6291456        0
> >> >             oomguarpages   1096587    2121834              6291456              6291456        0
> >> >             numtcpsock         226        457                30000                30000        0
> >> >             numflock             5         16                 1000                 1100        0
> >> >             numpty               4          6                  512                  512        0
> >> >             numsiginfo           1         69                 1024                 1024        0
> >> >             tcpsndbuf      5637456   17822864  9223372036854775807  9223372036854775807        0
> >> >             tcprcvbuf      6061504   13730792  9223372036854775807  9223372036854775807        0
> >> >             othersockbuf     46240    1268016  9223372036854775807  9223372036854775807        0
> >> >             dgramrcvbuf          0     436104  9223372036854775807  9223372036854775807        0
> >> >             numothersock        89        134                30000                30000        0
> >> >             dcachesize    61381173  935378121  9223372036854775807  9223372036854775807        0
> >> >             numfile           7852      11005               250000               250000        0
> >> >             dummy                0          0  9223372036854775807  9223372036854775807        0
> >> >             dummy                0          0  9223372036854775807  9223372036854775807        0
> >> >             dummy                0          0  9223372036854775807  9223372036854775807        0
> >> >             numiptent           38         38                 1000                 1000        0
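One way to pull just the failure counter out of output like the above (the
sample line is inlined so the snippet is self-contained; on a live container
the input would be /proc/user_beancounters):

```shell
# Extract the failcnt column (last field) for privvmpages. A non-zero,
# growing failcnt means the container denied private-memory allocations,
# which lines up with the sporadic JVM "could not reserve heap" errors.
bc_line='privvmpages 6202505 9436485 9437184 9437184 1573'
echo "$bc_line" | awk '$1 == "privvmpages" { print "privvmpages failcnt:", $NF }'
```

On a node itself this would be
`awk '$1 == "privvmpages" { print $NF }' /proc/user_beancounters`
(root access may be required).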
> >> >
> >> >
> >> > On Fri, Sep 18, 2015 at 3:34 PM, Radu Marias <[email protected]>
> >> wrote:
> >> >
> >> > > With 1 user no crash occurs, but on the node on which hammerdb is
> >> > > started I noticed this from time to time:
> >> > >
> >> > > $ java -version
> >> > > Error occurred during initialization of VM
> >> > > Unable to allocate 199232KB bitmaps for parallel garbage collection
> >> > > for the requested 6375424KB heap.
> >> > > Error: Could not create the Java Virtual Machine.
> >> > > Error: A fatal exception has occurred. Program will exit.
> >> > >
> >> > > $ free -h
> >> > >              total       used       free     shared    buffers     cached
> >> > > Mem:           24G       4.7G        19G       132M         0B       314M
> >> > > -/+ buffers/cache:       4.4G        19G
> >> > > Swap:           0B         0B         0B
> >> > >
> >> > >
> >> > > On Fri, Sep 18, 2015 at 2:21 PM, Radu Marias <[email protected]>
> >> > wrote:
> >> > >
> >> > >> $ $JAVA_HOME/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize
> >> > >>     uintx ErgoHeapSizeLimit                         = 0           {product}
> >> > >>     uintx HeapSizePerGCThread                       = 87241520    {product}
> >> > >>     uintx InitialHeapSize                          := 402653184   {product}
> >> > >>     uintx LargePageHeapSizeThreshold                = 134217728   {product}
> >> > >>     uintx MaxHeapSize                              := 6442450944  {product}
> >> > >> java version "1.7.0_67"
> >> > >> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> >> > >> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> >> > >>
> >> > >>
> >> > >> On Fri, Sep 18, 2015 at 2:20 PM, Radu Marias <[email protected]
> >
> >> > >> wrote:
> >> > >>
> >> > >>> I logged this issue several days ago; is this OK?
> >> > >>> https://issues.apache.org/jira/browse/TRAFODION-1492
> >> > >>>
> >> > >>> Will try with one user and let you know.
> >> > >>>
> >> > >>> On Fri, Sep 18, 2015 at 7:05 AM, Suresh Subbiah <
> >> > >>> [email protected]> wrote:
> >> > >>>
> >> > >>>> Hi
> >> > >>>>
> >> > >>>> How many virtual users are being used? If it is more than one,
> >> > >>>> could we please try the case with 1 user first.
> >> > >>>>
> >> > >>>> When the crash happens next time could we please try
> >> > >>>> sqps | grep esp | wc -l
> >> > >>>>
> >> > >>>> If this number is large, we know a lot of esp processes are being
> >> > >>>> started, which could consume memory.
> >> > >>>> If this is the case, please insert this row into the defaults
> >> > >>>> table from sqlci and then restart DCS (dcsstop followed by
> >> > >>>> dcsstart):
> >> > >>>> insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM',
> >> > >>>> 'OFF', 'hammerdb testing') ;
> >> > >>>> exit ;
> >> > >>>>
> >> > >>>> I will work on having the udr process create a JVM with a smaller
> >> > >>>> initial heap size. If you have time and would like to do so, a
> >> > >>>> JIRA you file will be helpful. Or I can file the JIRA and work on
> >> > >>>> it. It will not take long to make this change.
> >> > >>>>
> >> > >>>> Thanks
> >> > >>>> Suresh
> >> > >>>>
> >> > >>>> PS I found this command on Stack Overflow to determine the
> >> > >>>> initial heap size we get by default in this env:
> >> > >>>>
> >> > >>>> java -XX:+PrintFlagsFinal -version | grep HeapSize
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> >
> >>
> http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
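As a sanity check on the numbers in this thread: on a 64-bit server-class
machine, HotSpot 7 defaults the max heap to roughly one quarter of physical
memory, and with the 24G reported by free that arithmetic reproduces the
MaxHeapSize printed by -XX:+PrintFlagsFinal (illustrative arithmetic only):

```shell
# 24 GB of physical memory, as reported by free on these nodes, in KB.
mem_total_kb=25165824
# HotSpot default on server-class machines: max heap ~= 1/4 of physical RAM.
default_max_heap_bytes=$(( mem_total_kb * 1024 / 4 ))
echo "$default_max_heap_bytes"   # 6442450944, matching MaxHeapSize above
```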
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> On Thu, Sep 17, 2015 at 10:32 AM, Radu Marias <
> >> [email protected]>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>> > I did the steps mentioned above to ensure that the trafodion
> >> > >>>> > processes are free of a JAVA installation mixup.
> >> > >>>> > Also changed things so that HDP, Trafodion and HammerDB use the
> >> > >>>> > same JDK from */usr/jdk64/jdk1.7.0_67*
> >> > >>>> >
> >> > >>>> > # java -version
> >> > >>>> > java version "1.7.0_67"
> >> > >>>> > Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> >> > >>>> > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> >> > >>>> >
> >> > >>>> > # echo $JAVA_HOME
> >> > >>>> > /usr/jdk64/jdk1.7.0_67
> >> > >>>> >
> >> > >>>> > But when running HammerDB I again got a crash on 2 nodes. I
> >> > >>>> > noticed that before the crash, for about one minute, I was
> >> > >>>> > getting errors from *java -version*, and about 30 seconds after
> >> > >>>> > the crash java -version worked again. So these issues might be
> >> > >>>> > related. Haven't yet found the problem or how to fix the java
> >> > >>>> > -version issue.
> >> > >>>> >
> >> > >>>> > # java -version
> >> > >>>> > Error occurred during initialization of VM
> >> > >>>> > Could not reserve enough space for object heap
> >> > >>>> > Error: Could not create the Java Virtual Machine.
> >> > >>>> > Error: A fatal exception has occurred. Program will exit
> >> > >>>> >
> >> > >>>> > # file core.5813
> >> > >>>> > core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3 188.138.61.175:48357 00004 000'
> >> > >>>> >
> >> > >>>> > #0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
> >> > >>>> > #1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
> >> > >>>> > #2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source file information unavailable", msg4=0x7fff11977ff0 "User routine being processed : TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type : JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "", dialOut=<value optimized out>, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:191
> >> > >>>> > #3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java virtual machine aborted", file=<value optimized out>, line=<value optimized out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
> >> > >>>> > #4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at ../langman/LmJavaHooks.cpp:54
> >> > >>>> > #5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >> > >>>> > #6  0x00007f6922afedba in Universe::initialize_heap() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >> > >>>> > #7  0x00007f6922afff89 in universe_init() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >> > >>>> > #8  0x00007f692273d9f5 in init_globals() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >> > >>>> > #9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >> > >>>> > #10 0x00007f69227c5a34 in JNI_CreateJavaVM () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >> > >>>> > #11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value optimized out>, result=<value optimized out>, maxLMJava=<value optimized out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at ../langman/LmLangManagerJava.cpp:379
> >> > >>>> > #12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava (this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value optimized out>, maxLMJava=1, userOptions=0x7f69239ba418, diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
> >> > >>>> > #13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM (this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized out>) at ../udrserv/udrglobals.cpp:322
> >> > >>>> > #14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040, msgStream=..., request=..., env=<value optimized out>) at ../udrserv/udrload.cpp:163
> >> > >>>> > #15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040, msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
> >> > >>>> > #16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:520
> >> > >>>> > #17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:356
> >> > >>>> >
> >> > >>>> > On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <
> >> > >>>> > [email protected]>
> >> > >>>> > wrote:
> >> > >>>> >
> >> > >>>> > > Hi,
> >> > >>>> > >
> >> > >>>> > > I have added a wiki page that describes how to get a stack
> >> > >>>> > > trace from a core file. The page could do with some
> >> > >>>> > > improvements on finding the core file, and maybe even on doing
> >> > >>>> > > more than getting the stack trace. For now it should make our
> >> > >>>> > > troubleshooting cycle faster if the stack trace is included in
> >> > >>>> > > the initial message itself.
> >> > >>>> > >
> >> > >>>> > >
> >> > >>>> >
> >> > >>>>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
> >> > >>>> > >
> >> > >>>> > > In this case, the last node does not seem to have gdb, so I
> >> > >>>> > > could not see the trace there. I moved the core file to the
> >> > >>>> > > first node, but then the trace looks like this. I assume this
> >> > >>>> > > is because I moved the core file to a different node. I think
> >> > >>>> > > Selva's suggestion is good to try. We may have had a few
> >> > >>>> > > tdm_udrserv processes from before the time the java change was
> >> > >>>> > > made.
> >> > >>>> > > $ gdb tdm_udrserv core.49256
> >> > >>>> > > #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
> >> > >>>> > > #1  0x8857780a58ff2155 in ?? ()
> >> > >>>> > > Cannot access memory at address 0x8857780a58ff2155
> >> > >>>> > >
> >> > >>>> > > The backtrace we saw yesterday, when a udrserv process exited
> >> > >>>> > > because the JVM could not be started, is used in the wiki page
> >> > >>>> > > instead of this one. If you have time, a JIRA on this
> >> > >>>> > > unexpected udrserv exit will also be valuable for the
> >> > >>>> > > Trafodion team.
> >> > >>>> > >
> >> > >>>> > > Thanks
> >> > >>>> > > Suresh
> >> > >>>> > >
> >> > >>>> > > On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
> >> > >>>> > > [email protected]> wrote:
> >> > >>>> > >
> >> > >>>> > > > Thanks for creating the JIRA Trafodion-1492. The error is
> >> > >>>> > > > similar to scenario 2. The process tdm_udrserv dumped core.
> >> > >>>> > > > We will look into the core file. In the meantime, can you
> >> > >>>> > > > please do the following:
> >> > >>>> > > >
> >> > >>>> > > > Bring the Trafodion instance down
> >> > >>>> > > > echo $MY_SQROOT -- shows Trafodion installation directory
> >> > >>>> > > > Remove $MY_SQROOT/etc/ms.env from all nodes
> >> > >>>> > > >
> >> > >>>> > > >
> >> > >>>> > > > Start a New Terminal Session so that new Java settings are
> in
> >> > >>>> place
> >> > >>>> > > > Login as a Trafodion user
> >> > >>>> > > > cd <trafodion_installation_directory>
> >> > >>>> > > > . ./sqenv.sh  (skip this if it is done automatically upon
> >> logon)
> >> > >>>> > > > sqgen
> >> > >>>> > > >
> >> > >>>> > > > Exit and Start a New Terminal Session
> >> > >>>> > > > Restart the Trafodion instance and check if you are seeing
> >> > >>>> > > > the issue with tdm_udrserv again. We wanted to ensure that
> >> > >>>> > > > the trafodion processes are free of the JAVA installation
> >> > >>>> > > > mixup from your earlier message. We suspect that can cause
> >> > >>>> > > > the tdm_udrserv process to dump core.
> >> > >>>> > > >
> >> > >>>> > > >
> >> > >>>> > > > Selva
> >> > >>>> > > >
> >> > >>>> > > > -----Original Message-----
> >> > >>>> > > > From: Radu Marias [mailto:[email protected]]
> >> > >>>> > > > Sent: Wednesday, September 16, 2015 5:40 AM
> >> > >>>> > > > To: dev <[email protected]>
> >> > >>>> > > > Subject: Re: odbc and/or hammerdb logs
> >> > >>>> > > >
> >> > >>>> > > > I'm seeing this in hammerdb logs; I assume it is due to the
> >> > >>>> > > > crash and some processes being stopped:
> >> > >>>> > > >
> >> > >>>> > > > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion
> >> > >>>> > > > Database] SQL ERROR:*** ERROR[2034] $Z0106BZ:16: Operating
> >> > >>>> > > > system error 201 while communicating with server process
> >> > >>>> > > > $Z010LPE:23. [2015-09-16 12:35:33]
> >> > >>>> > > > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:***
> >> > >>>> > > > ERROR[8904] SQL did not receive a reply from MXUDR, possibly
> >> > >>>> > > > caused by internal errors when executing user-defined
> >> > >>>> > > > routines. [2015-09-16 12:35:33]
> >> > >>>> > > >
> >> > >>>> > > > $ sqcheck
> >> > >>>> > > > Checking if processes are up.
> >> > >>>> > > > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
> >> > >>>> > > >
> >> > >>>> > > > The SQ environment is up!
> >> > >>>> > > >
> >> > >>>> > > > Process         Configured      Actual      Down
> >> > >>>> > > > -------         ----------      ------      ----
> >> > >>>> > > > DTM             5               5
> >> > >>>> > > > RMS             10              10
> >> > >>>> > > > MXOSRVR         20              20
> >> > >>>> > > >
> >> > >>>> > > > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <
> >> > >>>> [email protected]>
> >> > >>>> > > wrote:
> >> > >>>> > > >
> >> > >>>> > > > > I've restarted HDP and Trafodion and now I managed to
> >> > >>>> > > > > create the schema and stored procedures from HammerDB. But
> >> > >>>> > > > > I'm getting failures and core dumps again from Trafodion
> >> > >>>> > > > > while running virtual users. For some of the users I
> >> > >>>> > > > > sometimes see in hammerdb logs:
> >> > >>>> > > > > Vuser 5:Failed to execute payment
> >> > >>>> > > > > Vuser 5:Failed to execute stock level
> >> > >>>> > > > > Vuser 5:Failed to execute new order
> >> > >>>> > > > >
> >> > >>>> > > > > Core files are on our last node; feel free to examine
> >> > >>>> > > > > them. The files were dumped while getting hammerdb errors:
> >> > >>>> > > > >
> >> > >>>> > > > > *core.49256*
> >> > >>>> > > > >
> >> > >>>> > > > > *core.48633*
> >> > >>>> > > > >
> >> > >>>> > > > > *core.49290*
> >> > >>>> > > > >
> >> > >>>> > > > >
> >> > >>>> > > > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <
> >> > >>>> [email protected]>
> >> > >>>> > > > wrote:
> >> > >>>> > > > >
> >> > >>>> > > > >> *Scenario 1:*
> >> > >>>> > > > >>
> >> > >>>> > > > >> I've created this issue
> >> > >>>> > > > >> https://issues.apache.org/jira/browse/TRAFODION-1492
> >> > >>>> > > > >> I think another fix was made related to *Committed_AS* in
> >> > >>>> > > > >> *sql/cli/memmonitor.cpp*.
> >> > >>>> > > > >>
> >> > >>>> > > > >> This is a response from Narendra in a previous thread
> >> > >>>> > > > >> where the issue was fixed so Trafodion would start:
> >> > >>>> > > > >>
> >> > >>>> > > > >>
> >> > >>>> > > > >>>
> >> > >>>> > > > >>>
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> >> > >>>> > > > >>> /proc/meminfo does not have the ‘Committed_AS’ entry, it
> >> > >>>> > > > >>> will ignore it. Built it and put the binary libcli.so on
> >> > >>>> > > > >>> the veracity box (in the $MY_SQROOT/export/lib64
> >> > >>>> > > > >>> directory – on all the nodes). Restarted the env and
> >> > >>>> > > > >>> ‘sqlci’ worked fine.
> >> > >>>> > > > >>> Was able to ‘initialize trafodion’ and create a table.*
> >> > >>>> > > > >>
> >> > >>>> > > > >>
> >> > >>>> > > > >> *Scenario 2:*
> >> > >>>> > > > >>
> >> > >>>> > > > >> The *java -version* problem I recall we had only on the
> >> > >>>> > > > >> other cluster with CentOS 7; I didn't see it on this one
> >> > >>>> > > > >> with CentOS 6.7. But a change I made these days on the
> >> > >>>> > > > >> latter one is installing Oracle *jdk 1.7.0_79* as the
> >> > >>>> > > > >> default one, and it is where *JAVA_HOME* points to.
> >> > >>>> > > > >> Before that, some nodes had *open-jdk* as default and
> >> > >>>> > > > >> others didn't have one, just the one installed by
> >> > >>>> > > > >> *ambari* in */usr/jdk64/jdk1.7.0_67*, which was not
> >> > >>>> > > > >> linked to JAVA_HOME or the *java* command by
> >> > >>>> > > > >> *alternatives*.
> >> > >>>> > > > >>
> >> > >>>> > > > >> *Failures in HammerDB:*
> >> > >>>> > > > >>
> >> > >>>> > > > >> Attached is the *trafodion.dtm.log* from a node on which
> >> > >>>> > > > >> I see a lot of lines like these; I assume this is the
> >> > >>>> > > > >> *transaction conflict* that you mentioned. I see these
> >> > >>>> > > > >> lines on 4 out of 5 nodes:
> >> > >>>> > > > >>
> >> > >>>> > > > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
> >> > >>>> > > > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
> >> > >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
> >> > >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
> >> > >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
> >> > >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
> >> > >>>> > > > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
> >> > >>>> > > > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
> >> > >>>> > > > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
> >> > >>>> > > > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
> >> > >>>> > > > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
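A quick way to gauge how often these conflicts occur (sample lines inlined so
the snippet is self-contained; on a node the input would be
$MY_SQROOT/logs/trafodion.dtm.log):

```shell
# Count RET_HASCONFLICT prepareCommit lines in a dtm log excerpt.
log='2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309'
printf '%s\n' "$log" | grep -c RET_HASCONFLICT   # 3
```

On a live node, `grep -c RET_HASCONFLICT $MY_SQROOT/logs/trafodion.dtm.log`
would give the same tally over the whole log.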
> >> > >>>> > > > >>
> >> > >>>> > > > >> What does *transaction conflict* mean in this case?
> >> > >>>> > > > >>
> >> > >>>> > > > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
> >> > >>>> > > > >> [email protected]> wrote:
> >> > >>>> > > > >>
> >> > >>>> > > > >>> Hi Radu,
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> Thanks for using Trafodion. With help from Suresh, we
> >> > >>>> > > > >>> looked at the core files in your cluster. We believe
> >> > >>>> > > > >>> there are two scenarios that are causing the Trafodion
> >> > >>>> > > > >>> processes to dump core.
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> Scenario 1:
> >> > >>>> > > > >>> Core dumped by tdm_arkesp processes. The Trafodion
> >> > >>>> > > > >>> engine has assumed the Committed_AS entry in
> >> > >>>> > > > >>> /proc/meminfo is available in all flavors of Linux. The
> >> > >>>> > > > >>> absence of this entry is not handled correctly by the
> >> > >>>> > > > >>> tdm_arkesp process, and hence it dumped core. Please
> >> > >>>> > > > >>> file a JIRA using this link
> >> > >>>> > > > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
> >> > >>>> > > > >>> and choose "Apache Trafodion" as the project to report
> >> > >>>> > > > >>> a bug against.
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> Scenario 2:
> >> > >>>> > > > >>> Core dumped by tdm_udrserv processes. From our analysis,
> >> > >>>> > > > >>> this problem happened when the process attempted to
> >> > >>>> > > > >>> create the JVM instance programmatically. A few days
> >> > >>>> > > > >>> earlier, we observed a similar issue in your cluster
> >> > >>>> > > > >>> when the java -version command was attempted. But java
> >> > >>>> > > > >>> -version and $JAVA_HOME/bin/java -version work fine now.
> >> > >>>> > > > >>> Was there any change made to the cluster recently to
> >> > >>>> > > > >>> avoid the problem with the java -version command?
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> Please delete all the core files in the sql/scripts
> >> > >>>> > > > >>> directory and issue the command to invoke the SPJ, then
> >> > >>>> > > > >>> check if it still dumps core. We can look at the core
> >> > >>>> > > > >>> file if it happens again. Your solution to the java
> >> > >>>> > > > >>> -version problem would be helpful.
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> For the failures with HammerDB, can you please send us
> >> > >>>> > > > >>> the exact error message returned by the Trafodion engine
> >> > >>>> > > > >>> to the application. This might help us narrow down the
> >> > >>>> > > > >>> cause. You can also look at
> >> > >>>> > > > >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any
> >> > >>>> > > > >>> transaction conflict is causing this error.
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> Selva
> >> > >>>> > > > >>> -----Original Message-----
> >> > >>>> > > > >>> From: Radu Marias [mailto:[email protected]]
> >> > >>>> > > > >>> Sent: Tuesday, September 15, 2015 9:09 AM
> >> > >>>> > > > >>> To: dev <[email protected]>
> >> > >>>> > > > >>> Subject: Re: odbc and/or hammerdb logs
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> Also noticed there are several core files from today in
> >> > >>>> > > > >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*.
> >> > >>>> > > > >>> If needed, please provide a Gmail address so I can share
> >> > >>>> > > > >>> them via Google Drive.
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <
> >> > >>>> [email protected]
> >> > >>>> > >
> >> > >>>> > > > >>> wrote:
> >> > >>>> > > > >>>
> >> > >>>> > > > >>> > Hi,
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>> > I'm running HammerDB over trafodion and when running
> >> > virtual
> >> > >>>> > users
> >> > >>>> > > > >>> > sometimes I get errors like this in hammerdb logs:
> >> > >>>> > > > >>> > *Vuser 1:Failed to execute payment*
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>> > *Vuser 1:Failed to execute new order*
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>> > I'm using unixODBC and I tried to add these lines in
> >> > >>>> > > > >>> > */etc/odbc.ini*, but the trace file is not created:
> >> > >>>> > > > >>> > *[ODBC]*
> >> > >>>> > > > >>> > *Trace = 1*
> >> > >>>> > > > >>> > *TraceFile = /var/log/odbc_tracefile.log*
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>> > Also tried with *Trace = yes* and *Trace = on*; I've
> >> > >>>> > > > >>> > found multiple references for both.
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>> > How can I see more logs to debug the issue? Can I
> >> enable
> >> > >>>> logs for
> >> > >>>> > > > >>> > all queries in trafodion?
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>> > --
> >> > >>>> > > > >>> > And in the end, it's not the years in your life that
> >> > count.
> >> > >>>> It's
> >> > >>>> > > > >>> > the life in your years.
> >> > >>>> > > > >>> >
> >> > >>>> > > > >>>
> >> > >>>> > > > >>>
> >> > >>>> > > > >>>
> >> > >>>> > > > >>>
> >> > >>>> > > > >>
> >> > >>>> > > > >>
> >> > >>>> > > > >>
> >> > >>>> > > > >>
> >> > >>>> > > > >
> >> > >>>> > > > >
> >> > >>>> > > > >
> >> > >>>> > > > >
> >> > >>>> > > >
> >> > >>>> > > >
> >> > >>>> > > >
> >> > >>>> > > >
> >> > >>>> > >
> >> > >>>> >
> >> > >>>> >
> >> > >>>> >
> >> > >>>> >
> >> > >>>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>