Exactly. I've encountered, or heard of, scenarios where HAWQ fails to stop or
recover after a process exits abnormally (usually during development or PL
functionality testing). HAWQ needs to be more robust (isolation, recovery,
retry, etc.) against these failures.

2016-12-17 13:53 GMT+08:00 Lei Chang <[email protected]>:

> This issue has been raised many times. I think Taylor gave a good proposal.
>
> In the long term, I think we should add more tests that kill processes
> randomly.
>
> If killing a process leads to corruption, I think that is a bug. From a
> database perspective, we should not assume that processes cannot be killed
> under some specific conditions or at some particular time.
>
> Thanks
> Lei
>
>
> On Sat, Dec 17, 2016 at 1:43 AM, Taylor Vesely <[email protected]> wrote:
>
> > Hi Ruilong,
> >
> > I've been brainstorming the issue, and this is my proposed solution.
> > Please tell me what you think.
> >
> > Segments are stateless. In Greenplum, we worry about catalog corruption
> > when a segment dies. In HAWQ, all of the data nodes are stateless: even
> > if the OOM killer ends up killing a segment, we shouldn't need to worry
> > about catalog corruption. *Only the master has a catalog that matters.*
> >
> > My proposition:
> >
> > Because the catalog matters on the master, we should probably continue to
> > run master nodes with vm.overcommit_memory = 2. On the segments, however,
> > I think we shouldn't worry so much about an OOM event. The problem still
> > remains that all queries across the cluster will be canceled if a data
> > node goes offline (at least until HAWQ is able to restart failed query
> > executors).
> >
> > If we *really* want to prevent the segments from being killed, we could
> > tell the kernel to prefer killing the other processes on the node via the
> > /proc/<pid>/oom_score_adj facility. Because Hadoop processes are
> > generally resilient enough to restart failed containers, most Java
> > processes can be treated as more expendable than HAWQ processes.
> >
> > /proc/<pid>/oom_score_adj ref:
> > https://www.kernel.org/doc/Documentation/filesystems/proc.txt
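[Editor's note: a minimal sketch of the oom_score_adj mechanism referenced above. The values and the "postgres" process name are illustrative assumptions, not from this thread.]

```shell
# oom_score_adj ranges from -1000 (never kill) to 1000 (kill first).
# A process may raise its own score without privileges:
echo 500 > /proc/self/oom_score_adj
cat /proc/self/oom_score_adj        # prints 500

# Protecting HAWQ segments (a negative value) requires root;
# "postgres" is assumed here as the segment process name:
#   for pid in $(pgrep postgres); do
#       echo -500 > /proc/$pid/oom_score_adj
#   done
```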
> >
> > Thanks,
> >
> > Taylor Vesely
> >
> > On Fri, Dec 16, 2016 at 7:01 AM, Ruilong Huo <[email protected]> wrote:
> >
> > > Hi HAWQ Community,
> > >
> > > The overcommit_memory setting in Linux controls the behavior of memory
> > > allocation. In a cluster deployed with both HAWQ and Hadoop, it is
> > > controversial how to set overcommit_memory on the nodes. To be
> > > specific, HAWQ recommends overcommit strategy 2, while Hadoop
> > > recommends 1 or 0.
> > >
> > > This thread is to start the discussion of the options, so that we can
> > > make a reasonable choice that works well with both products.
> > >
> > > *1. From HAWQ perspective*
> > >
> > > It is recommended to use vm.overcommit_memory = 2 (rather than 0 or 1)
> > > to prevent random kills of HAWQ processes and the resulting backend
> > > resets.
> > >
> > > If the nodes of the cluster are set to overcommit_memory = 0 or 1,
> > > there is a risk that running queries get terminated due to a backend
> > > reset. Even worse, with overcommit_memory = 1, there is a chance that
> > > data files and transaction logs get corrupted due to insufficient
> > > cleanup during process exit when an OOM event happens. More details of
> > > the overcommit_memory setting in HAWQ can be found at:
> > > Linux-Overcommit-strategies-and-Pivotal-GPDB-HDB
> > > <https://discuss.pivotal.io/hc/en-us/articles/202703383-Linux-Overcommit-strategies-and-Pivotal-Greenplum-GPDB-Pivotal-HDB-HDB->.
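[Editor's note: a sketch of inspecting and applying the setting discussed above. The overcommit_ratio value is an illustrative assumption, not a recommendation from this thread.]

```shell
# Inspect the current policy:
# 0 = heuristic overcommit, 1 = always overcommit, 2 = never overcommit.
cat /proc/sys/vm/overcommit_memory

# What HAWQ recommends (requires root; the ratio value is illustrative).
# With mode 2 the commit limit is swap + overcommit_ratio percent of RAM:
#   sysctl -w vm.overcommit_memory=2
#   sysctl -w vm.overcommit_ratio=95
```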
> > >
> > > *2. From Hadoop perspective*
> > >
> > > The crash of a datanode usually happens when there is not enough heap
> > > memory for the JVM. To be specific, the JVM allocates more heap (via a
> > > malloc or mmap system call) after the address space has been exhausted.
> > > When overcommit_memory = 2 and we run out of available address space,
> > > the system returns ENOMEM for the system call, and the JVM crashes.
> > >
> > > This is because Java is very address-space greedy: it will allocate
> > > large regions of address space that it isn't actually using. The
> > > overcommit_memory = 2 setting doesn't actually restrict physical memory
> > > use; it restricts address space use. Many applications (especially
> > > Java) allocate sparse pages of memory and rely on the kernel/OS to
> > > provide the physical memory as soon as a page fault occurs.
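[Editor's note: the accounting behind that ENOMEM failure mode can be observed directly; a minimal sketch using the standard /proc/meminfo fields on Linux.]

```shell
# Under overcommit_memory = 2 the kernel enforces CommitLimit
# (swap + overcommit_ratio% of RAM) against Committed_AS, the total
# address space already promised to all processes. A JVM whose sparse
# reservations push Committed_AS past CommitLimit gets ENOMEM on the
# next malloc/mmap, even if physical memory is still free.
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```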
> > >
> > > Best regards,
> > > Ruilong Huo
> > >
> >
>
