Hello Przemek --
Sorry for not getting back to you yesterday. I have better information
now.
First, my assertion that the size of the array was causing the call stack to
overflow was incorrect. As a couple of people in the core team pointed
out to me, array data isn't stored on the stack. It is allocated from
the heap. So the reason for the behavior you observed is different and
a little more complicated than I described. Furthermore, tasks=fifo and
tasks=qthreads also differ in how they handle this program.
Some background information: although the data for an array isn't
on the stack, the descriptor for the array is on the stack. This, along
with other local variables, makes the stack frame for each invocation of
modules/standard/Sort.chpl:_MergeSort() somewhat larger than 1 KiB,
which is nothing special.
Also, as you can see from the code in Sort.chpl, MergeSort() creates 2
tasks at each level of recursion until there are <=16 elements left to
sort, at which point it switches to an insertion sort. This means 2
tasks for 32 elements, 6 for 64 elements, 14 for 128, and in general
approximately 2**(log2(size)-3) tasks, which is about 2**17 tasks for an
array with about 2**20 elements.
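To make that arithmetic concrete, here is a small sketch that counts tasks under the model just described: two tasks per split, and no further splitting once a piece has 16 or fewer elements. The name countTasks and the way the array is halved are my own assumptions for illustration; this is not the code from Sort.chpl.

```chapel
// Model only: 2 tasks per split, no split once a piece has <= minlen elements.
// countTasks is a hypothetical helper, not part of the Sort module.
proc countTasks(n: int, minlen: int = 16): int {
  if n <= minlen then return 0;
  const half = n / 2;
  return 2 + countTasks(half, minlen) + countTasks(n - half, minlen);
}

proc main() {
  for n in (32, 64, 128, 1000000) do
    writeln(n, " elements -> ", countTasks(n), " tasks");
}
```

Under this model the counts for 32, 64 and 128 elements come out as 2, 6 and 14, and for 1000000 elements the count is just under 2**17.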
Let's take the tasks=qthreads behavior first. Here each task gets its
own stack, and the tasks are switchable on a limited number of so-called
"worker threads", which are pthreads. We do have to create all 2**17 or
so tasks, along with their stacks. Normally Qthreads puts a guard page
that cannot be referenced at the beginning and end of each stack, to
detect stack overflow and underflow. The problem you are seeing with
the "mprotect in ALLOC_STACK ..." messages has to do with these guard
pages. There seems to be some kind of limit on how many of them can be
created. I get the same message in our environment. But although this
message is a nuisance, it's not actually fatal. It's just a warning
saying that the guard page could not be created. The program still runs
to completion. I can get rid of these messages by disabling guard
pages, either by passing --no-stack-checks when I compile or by setting
the environment variable QT_GUARD_PAGES=0 when I run.
The tasks=fifo behavior seems to work fine for me. In fifo tasking a
Chapel task is hosted by the same pthread throughout its existence, and
the task uses the stack belonging to the pthread. An implementation
detail in fifo tasking is that while the parent task of a parallel
construct is waiting for the children to complete, as an optimization
the pthread hosting that parent task helps out by trying to execute one
or more of the child tasks, in a nested fashion. This will drive up the
stack requirements when deep recursion is present, because a pthread can
end up effectively having lots of tasks nested on its stack. In your
test case, though, the number of tasks created is quite large but the
recursion is not deep, so even when this occurs the stack requirements
are not large. In any case, if I build your test case with tasks=fifo,
it succeeds for me just fine. What was the error you ran into when you
tried fifo tasking?
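As a rough back-of-the-envelope check on "the recursion is not deep", the sketch below computes how many levels of halving it takes to get from 1000000 elements down to 16 or fewer, and multiplies that by a per-frame size. The 2048-byte frame size is an assumed round number (the real frame is only known to be somewhat larger than 1 KiB), recursionDepth is a hypothetical helper, and the estimate assumes a pthread nests tasks along a single recursion chain.

```chapel
// Depth of the halving recursion for n elements (cutoff: <= minlen elements).
// recursionDepth is a hypothetical helper, not part of Chapel or Sort.chpl.
proc recursionDepth(n: int, minlen: int = 16): int {
  var depth = 0;
  var size = n;
  while size > minlen {
    size = (size + 1) / 2;   // follow the larger half
    depth += 1;
  }
  return depth;
}

config const frameBytes = 2048;  // assumed per-frame size, not measured

proc main() {
  const depth = recursionDepth(1000000);
  writeln("recursion depth: ", depth);
  writeln("nested stack estimate: ", depth * frameBytes, " bytes");
}
```

With these assumptions the depth is 16 and the estimate is a few tens of KiB, which is tiny next to a stack of several MiB.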
So, contrary to my first response, the problem here doesn't have to do
with the stack size as such. I can run your test case with a variety of
stack sizes at or below 8 MiB with either tasks=qthreads or tasks=fifo.
I hope this helps explain what's going on -- please let me know if any
of this needs clarifying or you have other questions.
greg
On Mon, 21 Mar 2016, Przemek Leśniak wrote:
The story:
As a Chapel learning exercise, I wanted to implement a parallel MergeSort
that would be faster than the library version (since the library version
doesn't use a parallelised merge). I noticed that running MergeSort on
'big enough' arrays (this probably depends on the architecture used; in my
case 10^6 was enough) resulted in a huge wall of text consisting of messages:
...
mprotect in ALLOC_STACK (2): Cannot allocate memory
mprotect in ALLOC_STACK (1): Cannot allocate memory
..
Here is example code to force this behaviour:
    use Sort;

    proc main()
    {
      var A : [1..1000000] int;
      MergeSort(A);
    }
I guess that this happens because there are too many pre-allocated threads,
which take up too much space. Solutions which didn't work:
- Reducing size of stack using ulimit, or changing the value of
- Reducing size of stack by changing the value of
CHPL_RT_NUM_THREADS_PER_LOCALE didn't help either.
- Switching between fifo, massivethreads, and qthreads didn't help either
(they are probably mapped to pthreads anyway).
Workaround which helped:
- Controlling the number of tasks explicitly in the code, as in
https://github.com/coodie/ChapelDataStructures/blob/master/MergeSort/MergeSort.chpl,
by using an atomic variable which counts the number of tasks, and only
starting a new task when the limit hasn't been reached yet
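The idea behind this workaround can be sketched roughly as follows. This is a minimal illustration and not the code from the linked repository: the names maxTasks, liveTasks, tryReserve and boundedMergeSort, and the cap of 64, are all made up for the example. An atomic counter tracks how many extra tasks exist, and the sort falls back to plain serial recursion whenever the cap would be exceeded.

```chapel
config const maxTasks = 64;   // hypothetical cap on extra tasks
var liveTasks: atomic int;    // number of tasks currently reserved

// Try to reserve two task slots; undo the reservation if over the cap.
proc tryReserve(): bool {
  if liveTasks.fetchAdd(2) + 2 <= maxTasks then return true;
  liveTasks.sub(2);
  return false;
}

proc boundedMergeSort(ref A: [] int, lo: int, hi: int) {
  if hi - lo < 16 {
    // Small range: insertion sort in place.
    for i in lo+1..hi {
      const x = A[i];
      var j = i - 1;
      while j >= lo && A[j] > x {
        A[j+1] = A[j];
        j -= 1;
      }
      A[j+1] = x;
    }
    return;
  }
  const mid = lo + (hi - lo) / 2;
  if tryReserve() {
    // Under the cap: sort the halves in two new tasks.
    cobegin with (ref A) {
      boundedMergeSort(A, lo, mid);
      boundedMergeSort(A, mid+1, hi);
    }
    liveTasks.sub(2);         // both children are done; release the slots
  } else {
    // Cap reached: sort both halves in the current task.
    boundedMergeSort(A, lo, mid);
    boundedMergeSort(A, mid+1, hi);
  }
  // Serial merge of the two sorted halves.
  var tmp: [lo..hi] int;
  var i = lo, j = mid + 1;
  for k in lo..hi {
    if j > hi || (i <= mid && A[i] <= A[j]) {
      tmp[k] = A[i]; i += 1;
    } else {
      tmp[k] = A[j]; j += 1;
    }
  }
  A[lo..hi] = tmp;
}

proc main() {
  const n = 1000000;
  var A: [1..n] int;
  for i in 1..n do A[i] = n - i;          // descending input
  boundedMergeSort(A, 1, n);
  var ok = true;
  for i in 2..n do if A[i-1] > A[i] then ok = false;
  writeln("sorted: ", ok);
}
```

The merge here is still serial; the sketch only shows the task-limiting part, so that the number of live tasks stays near maxTasks regardless of the array size.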
Note that the code actually sorts the array; it's just very slow due to error
handling and switching between kernel space and user space. This probably
requires some investigation into how scheduling and tasking are done in
Chapel.
Note that this is just an example. Starting too many tasks (on the order of
millions) in any way also produces this error. By using the standard library
MergeSort in this example, I wanted to show that there are some bugs in it
as well.
This bug is hard to add to the test system because it is probably
platform dependent; that's why I'm sending an e-mail.
Information which will probably be useful:
chpl --version
chpl Version 1.13.0.-999
Copyright (c) 2004-2016, Cray Inc. (See LICENSE file for more details)
$CHPL_HOME/util/printchplenv
machine info: Linux goovie-U36SG 3.13.0-37-generic #64-Ubuntu SMP Mon Sep
22 21:28:38 UTC 2014 x86_64
CHPL_HOME: /home/goovie/Dokumenty/Programowanie/chapel *
script location: /home/goovie/Dokumenty/Programowanie/chapel/util
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_MEM: jemalloc
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_GMP: gmp
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.