java -version also has the problem, and so does anything that accesses
HBase, including hbase shell.
I'm now trying a wrapper around the java binary in jdk/bin that starts it
with -Xmx512m. That seems to work, but I still need to see the impact on
other Java processes.
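
A minimal sketch of such a wrapper, written in Python for illustration. It
assumes the real JDK binary has been moved aside to a hypothetical path such
as jdk/bin/java.real; the names build_argv, launch, and DEFAULT_MAX_HEAP are
illustrative and not part of the JDK or Trafodion. It adds -Xmx512m only when
the caller has not already passed an -Xmx option, so processes that set their
own heap size are left alone.

```python
import os

# Heap cap used in this thread; adjust to what the container allows.
DEFAULT_MAX_HEAP = "-Xmx512m"


def build_argv(user_args, max_heap=DEFAULT_MAX_HEAP):
    """Return the argv for the real JVM, adding a heap cap if none was given."""
    args = list(user_args)
    if not any(arg.startswith("-Xmx") for arg in args):
        args.insert(0, max_heap)
    # argv[0] is the conventional program name.
    return ["java"] + args


def launch(real_java, user_args):
    """Replace the current process with the real JVM (does not return on success)."""
    os.execv(real_java, build_argv(user_args))
```

Called as launch("/path/to/jdk/bin/java.real", sys.argv[1:]) from the
wrapper's entry point (the path is a placeholder), java -version would run as
java -Xmx512m -version.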

On Tue, Oct 13, 2015, 17:54 Suresh Subbiah <[email protected]>
wrote:

> Hi Radu,
>
> Is it possible to tell which process is unable to start java? Or is it that
> none of the java processes are starting, including datanodes and
> regionservers?
>
> Thanks
> Suresh
>
> On Tue, Oct 13, 2015 at 4:27 AM, Radu Marias <[email protected]> wrote:
>
> > Managed to start trafodion with the latest daily build. But now I'm having
> > some OpenVZ container issues when a java process is started:
> > Error occurred during initialization of VM
> > Could not reserve enough space for object heap
> >
> > Tried aliasing java as alias java="java -Xms128m -Xmx512m" and also
> > setting JAVA_TOOL_OPTIONS, but the result is the same. Searching now for
> > other fixes.
> >
> > On Tue, Oct 13, 2015 at 12:02 AM Steve Varnau <[email protected]>
> > wrote:
> >
> > > In upcoming changes to Jenkins automation, I will add the daily build
> > > downloads link to the daily-build test result email that gets sent to
> > > this list.
> > >
> > > --Steve
> > >
> > > -----Original Message-----
> > > From: Roberta Marton [mailto:[email protected]]
> > > Sent: Thursday, October 8, 2015 9:51 AM
> > > To: [email protected]
> > > Subject: RE: trafodion won't start core files are generated
> > >
> > > Is this something that should be added to the Apache Trafodion
> > > website/wiki?
> > >
> > >      Roberta
> > >
> > > -----Original Message-----
> > > From: Steve Varnau [mailto:[email protected]]
> > > Sent: Thursday, October 8, 2015 9:47 AM
> > > To: [email protected]
> > > Subject: RE: trafodion won't start core files are generated
> > >
> > > Daily builds for development/test are posted at
> > > http://traf-downloads.esgyn.com/
> > >
> > > --Steve
> > >
> > > -----Original Message-----
> > > From: Suresh Subbiah [mailto:[email protected]]
> > > Sent: Thursday, October 8, 2015 7:10 AM
> > > To: [email protected]
> > > Subject: Re: trafodion won't start core files are generated
> > >
> > > Hi,
> > >
> > > What is the suggested procedure to pick up a daily build?
> > >
> > > Thanks
> > > Suresh
> > >
> > > On Thu, Oct 8, 2015 at 1:02 AM, Prashanth Vasudev <
> > > [email protected]> wrote:
> > >
> > > > Memorymonitor.cpp fix is part of this
> > > > https://issues.apache.org/jira/browse/TRAFODION-1492
> > > > Please pick up latest daily build.
> > > >
> > > > Also, the max locked memory of 64 KB shown below appears very small.
> > > >
> > > > Regards,
> > > > Prashanth
> > > >
> > > > -----Original Message-----
> > > > From: Radu Marias [mailto:[email protected]]
> > > > Sent: Wednesday, October 7, 2015 8:45 AM
> > > > To: dev <[email protected]>
> > > > Subject: Re: trafodion won't start core files are generated
> > > >
> > > > Hi,
> > > >
> > > > I have these:
> > > >
> > > > # pwd
> > > > /dev/shm
> > > > # ls -la
> > > > total 4
> > > > drwxrwxrwx 2 root      root        60 Oct  6 21:07 .
> > > > drwxr-xr-x 9 root      root      2180 Oct  2 22:28 ..
> > > > -rw-r--r-- 1 trafodion trafodion   32 Oct  6 21:07 sem.monitor.sem.trafodion
> > > >
> > > > kernel.shmmax = 68719476736
> > > > kernel.shmall = 4294967296
> > > >
> > > > # ulimit -a
> > > > core file size          (blocks, -c) 0
> > > > data seg size           (kbytes, -d) unlimited
> > > > scheduling priority             (-e) 0
> > > > file size               (blocks, -f) unlimited
> > > > pending signals                 (-i) 1805076
> > > > max locked memory       (kbytes, -l) 64
> > > > max memory size         (kbytes, -m) unlimited
> > > > open files                      (-n) 65535
> > > > pipe size            (512 bytes, -p) 8
> > > > POSIX message queues     (bytes, -q) 819200
> > > > real-time priority              (-r) 0
> > > > stack size              (kbytes, -s) 10240
> > > > cpu time               (seconds, -t) unlimited
> > > > max user processes              (-u) 65535
> > > > virtual memory          (kbytes, -v) unlimited
> > > > file locks                      (-x) unlimited
> > > >
> > > > I would try to reinstall trafodion to see if something got corrupted;
> > > > maybe that would fix the issue. But I know there was a crash on
> > > > sqstart, and one of your guys fixed it and copied the lib file to our
> > > > cluster.
> > > >
> > > > This is a response from Narendra in a previous thread, where the issue
> > > > that kept trafodion from starting was fixed:
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > *I updated the code: sql/cli/memmonitor.cpp, so that if
> > > > > /proc/meminfo does not have the ‘Committed_AS’ entry, it will ignore
> > > > > it. Built it and put the binary: libcli.so on the veracity box (in the
> > > > > $MY_SQROOT/export/lib64 directory – on all the nodes). Restarted the
> > > > > env and ‘sqlci’ worked fine.
> > > > > Was able to ‘initialize trafodion’ and create a table.*
> > > >
> > > >
> > > > There was another similar one, which I see is closed:
> > > > https://issues.apache.org/jira/browse/TRAFODION-1492
> > > >
> > > > So the question is: are these fixes in the latest daily build, so that
> > > > I can try to reinstall? If not, please send the changed files so I can
> > > > overwrite them after reinstalling.
> > > >
> > > > On Wed, Oct 7, 2015 at 6:02 PM, Selva Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > You would want to retain the shared segment size across reboots. So,
> > > > > please check if the following settings are available in
> > > > > /etc/sysctl.conf
> > > > >
> > > > > # Controls the maximum shared segment size, in bytes
> > > > > kernel.shmmax = 134217728
> > > > >
> > > > > # Controls the maximum number of shared memory segments, in pages
> > > > > kernel.shmall = 4294967296
> > > > >
> > > > >
> > > > > shmmax needs to be at least 64 MB. By default, the Trafodion RMS shared
> > > > > segment size is 64 MB. The Trafodion RMS shared segment can be expanded
> > > > > to 128 MB, so it is better to set shmmax to 128 MB, just in case we
> > > > > need to expand it later.
> > > > >
> > > > > Selva
> > > > >
> > > > > -----Original Message-----
> > > > > From: Prashanth Vasudev [mailto:[email protected]]
> > > > > Sent: Tuesday, October 6, 2015 2:19 PM
> > > > > To: [email protected]
> > > > > Subject: RE: trafodion won't start core files are generated
> > > > >
> > > > > Hi,
> > > > > From the stack trace below, it appears the trafodion monitor is unable
> > > > > to create shared memory objects.
> > > > > Please make sure the ulimit settings on all nodes have high limits for
> > > > > max locked memory.
> > > > > Also make sure /dev/shm on all nodes has the correct write
> > > > > permissions for the trafodion user id.
> > > > >
> > > > > Regards,
> > > > > Prashanth
> > > > >
> > > > > -----Original Message-----
> > > > > From: Radu Marias [mailto:[email protected]]
> > > > > Sent: Tuesday, October 6, 2015 9:21 AM
> > > > > To: dev <[email protected]>
> > > > > Subject: trafodion won't start core files are generated
> > > > >
> > > > > Hi,
> > > > >
> > > > > At some point a node in the 5-node cluster stopped and we
> > > > > needed to restart it. After that I restarted all the Ambari and
> > > > > HDP services, but trafodion fails to start.
> > > > >
> > > > > Below are some stack traces, plus details for the core files for
> > > > > which I'm not getting any stack. The files are from node1 and node2
> > > > > and date from Oct 2 (when I think node 2 went down) and Oct 6 (when
> > > > > we rebooted the node and tried to start trafodion). Feel free to
> > > > > connect and debug the issue on our cluster; Amanda has the
> > > > > credentials.
> > > > >
> > > > > *FROM NODE1*
> > > > >
> > > > > Oct  2 22:27 core.39347
> > > > > core.39347: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > > > > SVR4-style, from 'tm SQMON1.1 00000 00000 039347 $TM0
> > > > > 188.138.61.175:60186 00002 00000
> > > > > 00009 SPAR'
> > > > > gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.39347
> > > > > no stack
> > > > >
> > > > > Oct  2 22:41 core.15144
> > > > > Program terminated with signal 6, Aborted.
> > > > > #0  0x00007f77bcbbb625 in ?? ()
> > > > > #1  0x00007f77bcbbce05 in ?? ()
> > > > > #2  0x0000000000000010 in ?? () at ../common/Collections.cpp:109
> > > > > #3  0x00007f77bee62130 in ?? ()
> > > > > #4  0x00007ffe8e796ec0 in ?? ()
> > > > > #5  0x00007f77bdeced00 in ?? ()
> > > > > #6  0x0000000000000004 in ?? () at ../common/Collections.cpp:109
> > > > > #7  0x0000000001b3a310 in ?? ()
> > > > > #8  0x0000000000000000 in ?? ()
> > > > >
> > > > > Oct  2 22:41 core.39240
> > > > > #0  0x00007f534d03c625 in raise () from /lib64/libc.so.6
> > > > > #1  0x00007f534d03de05 in abort () from /lib64/libc.so.6
> > > > > #2  0x00007f534d03574e in __assert_fail_base () from
> > > > > /lib64/libc.so.6
> > > > > #3  0x00007f534d035810 in __assert_fail () from /lib64/libc.so.6
> > > > > #4  0x000000000046e213 in CExtTmLeaderReq::performRequest
> > > > > (this=0x7f53340008c0) at reqtmleader.cxx:126
> > > > > #5  0x000000000045a64a in CReqWorker::reqWorkerThread (this=<value
> > > > > optimized
> > > > > out>) at reqworker.cxx:79
> > > > > #6  0x000000000045a86d in reqWorker (arg=0xc6f9a0) at
> > > > > reqworker.cxx:147
> > > > > #7  0x00007f534db45a51 in start_thread () from
> > > > > /lib64/libpthread.so.0
> > > > > #8  0x00007f534d0f29ad in clone () from /lib64/libc.so.6
> > > > >
> > > > > Oct  2 22:41 core.15309
> > > > > core.15309: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > > > > SVR4-style, from 'tm SQMON1.1 00000 00000 015309 $TM0
> > > > > 188.138.61.175:60186 00002 00000
> > > > > 00134 SPAR'
> > > > > gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.15309
> > > > > no stack
> > > > >
> > > > >
> > > > > *FROM NODE2*
> > > > >
> > > > > Oct  2 22:29 core.39491
> > > > > core.39491: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> > > > > SVR4-style, from 'tm SQMON1.1 00001 00001 039491 $TM1
> > > > > 188.138.61.177:38680 00002 00001
> > > > > 00003 SPAR'
> > > > > gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.39491
> > > > > no stack
> > > > >
> > > > > Oct  6 15:23 core.1394
> > > > > Program terminated with signal 6, Aborted.
> > > > > #0  0x00007fb97acbf625 in raise () from /lib64/libc.so.6
> > > > > #1  0x00007fb97acc0e05 in abort () from /lib64/libc.so.6
> > > > > #2  0x000000000041d07d in CProcessContainer::CProcessContainer
> > > > > (this=0x2071880, nodeContainer=<value optimized out>) at
> > > > > process.cxx:3366
> > > > > #3  0x0000000000453f5c in CNode::CNode (this=0x2071880,
> > > > > name=0x204c448 "euve79672", pnid=0, rank=0) at pnode.cxx:153
> > > > > #4  0x00000000004558e0 in CNodeContainer::AddNodes (this=<value
> > > > > optimized
> > > > > out>) at pnode.cxx:1564
> > > > > #5  0x00000000004169a5 in CCluster::InitializeConfigCluster
> > > > > (this=0x20757b0) at cluster.cxx:2740
> > > > > #6  0x0000000000417645 in CCluster::CCluster (this=0x20757b0) at
> > > > > cluster.cxx:567
> > > > > #7  0x0000000000431e1a in CTmSync_Container::CTmSync_Container
> > > > > (this=0x20757b0) at tmsync.cxx:137
> > > > > #8  0x0000000000407bb6 in CMonitor::CMonitor (this=0x20757b0,
> > > > > procTermSig=9) at monitor.cxx:323
> > > > > #9  0x00000000004086ad in main (argc=2, argv=0x7fff8322e298) at
> > > > > monitor.cxx:1152
> > > > >
> > > > > Oct  6 15:43 core.17626
> > > > > Program terminated with signal 6, Aborted.
> > > > > #0  0x00007fcf11aea625 in raise () from /lib64/libc.so.6
> > > > > #1  0x00007fcf11aebe05 in abort () from /lib64/libc.so.6
> > > > > #2  0x000000000041d07d in CProcessContainer::CProcessContainer
> > > > > (this=0x1182890, nodeContainer=<value optimized out>) at
> > > > > process.cxx:3366
> > > > > #3  0x0000000000453f5c in CNode::CNode (this=0x1182890,
> > > > > name=0x115d458 "euve79672", pnid=0, rank=0) at pnode.cxx:153
> > > > > #4  0x00000000004558e0 in CNodeContainer::AddNodes (this=<value
> > > > > optimized
> > > > > out>) at pnode.cxx:1564
> > > > > #5  0x00000000004169a5 in CCluster::InitializeConfigCluster
> > > > > (this=0x11867c0) at cluster.cxx:2740
> > > > > #6  0x0000000000417645 in CCluster::CCluster (this=0x11867c0) at
> > > > > cluster.cxx:567
> > > > > #7  0x0000000000431e1a in CTmSync_Container::CTmSync_Container
> > > > > (this=0x11867c0) at tmsync.cxx:137
> > > > > #8  0x0000000000407bb6 in CMonitor::CMonitor (this=0x11867c0,
> > > > > procTermSig=9) at monitor.cxx:323
> > > > > #9  0x00000000004086ad in main (argc=2, argv=0x7ffcaca91f68) at
> > > > > monitor.cxx:1152
> > > > >
> > > > > --
> > > > > And in the end, it's not the years in your life that count. It's the
> > > > > life in your years.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > And in the end, it's not the years in your life that count. It's the
> > > > life in your years.
> > > >
> > >
> >
>
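
The checks suggested earlier in this thread (kernel.shmmax of at least
128 MB, a max locked memory limit above 64 KB, and write access to /dev/shm
for the trafodion user) can be sketched in Python as below. This is only an
illustration, not Trafodion tooling: the function names and the MIN_SHMMAX
constant are hypothetical, and the 128 MB threshold is simply the value
recommended above.

```python
import os
import resource

# 128 MB, the shmmax value recommended in this thread.
MIN_SHMMAX = 128 * 1024 * 1024


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def shmmax_ok(settings, minimum=MIN_SHMMAX):
    """True if kernel.shmmax is present and at least `minimum` bytes."""
    value = settings.get("kernel.shmmax")
    return value is not None and int(value) >= minimum


def memlock_soft_limit_kb():
    """Soft max-locked-memory limit in KB (as `ulimit -l` shows), or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft // 1024


def shm_writable(path="/dev/shm"):
    """True if the current user can write to the shared memory directory."""
    return os.access(path, os.W_OK)
```

Run as the trafodion user on each node, for example with
shmmax_ok(parse_sysctl(open("/etc/sysctl.conf").read())), so that the limits
and permissions checked are the ones the monitor process would see.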
