On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler <glenn.s.fow...@gmail.com> wrote:
> On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesn...@gmail.com>
> wrote:
>>
>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <glenn.s.fow...@gmail.com>
>> wrote:
>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1...@gmail.com>
>> > wrote:
>> >>
>> >> On 1 December 2013 17:26, Glenn Fowler <glenn.s.fow...@gmail.com>
>> >> wrote:
>> >> > I believe this is related to vmalloc changes between 2013-05-31 and
>> >> > 2013-06-09
>> >> > re-run the tests with
>> >> > export VMALLOC_OPTIONS=getmem=safe
>> >> > if that's the problem then it gives a clue on a general solution
>> >> > details after confirmation
>> >> >
>> >>
>> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
>> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
>> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >>
>> >> real          34.60
>> >> user          33.27
>> >> sys            1.19
>> >>
>> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
>> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> >> "${.sh.version}" ; nanosort <xxx >yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >> real          15.34
>> >> user          14.67
>> >> sys            0.52
>> >>
>> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
>> >> correct.
>> >>
>> >> What does VMALLOC_OPTIONS=getmem=safe do?
>> >
>> >
>> > vmalloc has an internal discipline/method for getting memory from the
>> > system
>> > several methods are available with varying degrees of thread safety etc.
>> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
>> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
>> > description (vmalloc.3 update shortly)
>> >
>> > **          getmemory=f enable f[,g] getmemory() functions if supported, all by default
>> > **                          anon: mmap(MAP_ANON)
>> > **                          break|sbrk: sbrk()
>> > **                          native: native malloc()
>> > **                          safe: safe sbrk() emulation via mmap(MAP_ANON)
>> > **                          zero: mmap(/dev/zero)
>> >
>> > i believe the performance regression with "anon" is that on linux
>> > mmap(0....MAP_ANON|MAP_PRIVATE...), which lets the system decide the
>> > address, returns adjacent (when possible) region addresses from highest
>> > to lowest order
>> > and the reverse order at minimum tends to fragment more memory
>> > "zero" has the same hi=>lo characteristic
>> > i suspect it adversely affects the vmalloc coalescing algorithm but have
>> > not dug deeper
>> > for now the probe order in vmalloc/vmdcsystem.c was simply changed to
>> > favor "safe"
>>
>> MAP_FIXED should be avoided because it's only there for special
>> purposes like the runtime linker ld.so.1 or debuggers.
>>
>> Using this for a general-purpose memory allocator causes serious problems:
>> 1. On some systems this is a privileged operation and only available
>> for users with root privileges
>>
>> 2. On a SPARC T4 with 256GB and Solaris 11.1, the use of 'safe' degraded
>> the performance from 9 seconds to almost 15 minutes because it utterly
>> destroys the system's concept of large pages. If two MAP_FIXED mappings
>> directly follow each other, the system downgrades the page size to the
>> smallest possible size, even trying to break up larger pages, which in
>> turn must be done by a special daemon (vmtasks)
>>
>> 3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
>> versions of Solaris
>>
>> 4. Using the 'safe' allocator on SmartOS (solaris 11 clone) triggers a
>> SEGV:
>> mmap(0xFFFFCD800B482000, 1048576, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFCD800B482000
>> sigaction(SIGSEGV, 0xFFFFFD7FFFDFDE50, 0xFFFFFD7FFFDFDED0) = 0
>>     Incurred fault #6, FLTBOUNDS  %pc = 0x0052FE06
>>       siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
>>     Received signal #11, SIGSEGV [caught]
>>       siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFCD800B582000
>> lwp_sigmask(SIG_SETMASK, 0x00000400, 0x00000000, 0x00000000,
>> 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
>
> edit src/lib/libast/vmalloc/vmmaddress.c and change VMCHKMEM to 0:
> #define VMCHKMEM        0
> this affects vmalloc's detection of overbooked memory but will disable
> the MAP_FIXED codepath

Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
from this functionality since it cannot overcommit memory (except if
someone uses |MAP_NORESERVE| or uses kernel debugging options in
/etc/system) ...

... attached (as
"astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
patch which...
1. ... restores this exception for Solaris

2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
64bit processes since both values are more or less the points where
the fragmentation stops. Note that this does *not* mean it will use so
much memory... it only means that it reserves this amount of memory
and the real allocation happens on the first read, write or execute
access of the matching MMU page. This also means there is no
performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
|mmap(MAP_ANON)| since it only reserves memory but does not
initialise/allocate it yet... this happens on the first time it's
accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
largepages, allowing a ksh process to benefit from such pages,
additionally most AST (including ksh93) applications consume a few MB
of memory... so there is a good chance that the "typical"
application/shell memory consumption completely fits into that 4MB
chunk. 64bit processes get four times as much memory since it's
expected that they may operate on much larger datasets (and see the
comment about fragmentation above)

Just to demonstrate "reservation" vs. "real usage" via Solaris pmap:
-- snip --
$ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
FFFFFD7FFDA00000      16384        148         20          - rw---    [ anon ]
-- snip --
The test shows that of 16384k only 148k have really been touched...
the difference (16384-148) is reserved by the shell process but not
used.

3. Linux has /proc/sys/vm/overcommit_memory, which is 0 (heuristic
overcommit), 1 (always overcommit) or 2 (strict accounting, no
overcommit) and so describes whether the kernel permits overcommitment
of memory. AFAIK a simple function could be written which returns |-1|
(does not permit overcommitment), |0| (don't know) or |1| (does permit
overcommitment) ... and if the function returns |-1| vmalloc should do
the same as on Solaris

4. The patch removes one unnecessary |memset(p, 0, size)| which was
touching pages and therefore allocating them
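
To make the "reservation" vs. "real usage" point from item 2 a bit
more concrete, here is a minimal C sketch (not part of the attached
patch). It assumes a system that provides |mmap(MAP_ANON)| and
|mincore()|; the helper name |resident_pages_after_touch()| is made up
for illustration. It maps a region, writes to the first few pages and
asks |mincore()| how many pages are resident:

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON and mincore() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a vmalloc function: reserve `size` bytes with
 * mmap(MAP_ANON), write one byte into each of the first `touch` pages and
 * return the number of pages mincore() reports as resident (-1 on error).
 */
static long
resident_pages_after_touch(size_t size, size_t touch)
{
	long		pgsz = sysconf(_SC_PAGESIZE);
	long		resident = 0;
	size_t		npages, i;
	char		*base;
	unsigned char	*vec;

	if (pgsz <= 0 || size == 0)
		return -1;
	npages = (size + (size_t)pgsz - 1) / (size_t)pgsz;
	assert(touch <= npages);

	/* this only reserves address space; no page is allocated yet */
	base = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return -1;
	vec = calloc(npages, 1);
	if (vec == NULL) {
		munmap(base, size);
		return -1;
	}

	/* the first write to a page is what makes the kernel allocate it */
	for (i = 0; i < touch; i++)
		base[i * (size_t)pgsz] = 1;

	/* the vector type differs between platforms, hence the void* cast */
	if (mincore(base, size, (void *)vec) != 0)
		resident = -1;
	else
		for (i = 0; i < npages; i++)
			resident += vec[i] & 1;

	free(vec);
	munmap(base, size);
	return resident;
}
```

Called as |resident_pages_after_touch(16*1024*1024, 4)| this should, on
a typical kernel, report a count much closer to 4 than to the full page
count of the mapping. The exact value depends on the page size and on
whether the kernel backs the region with large pages.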

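The function suggested in item 3 could look roughly like the sketch
below. This is only an illustration and not part of the attached patch:
the names |overcommit_from_mode()| and |vm_overcommit_probe()| are made
up, and the mapping assumes the documented Linux meaning of
/proc/sys/vm/overcommit_memory (0 = heuristic overcommit, 1 = always
overcommit, 2 = strict accounting). Treating the heuristic mode 0 as
"does permit overcommitment" is a judgment call.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Map a Linux overcommit_memory mode to the tri-state from item 3:
 * -1 = does not permit overcommitment, 0 = don't know, 1 = does permit it.
 */
static int
overcommit_from_mode(int mode)
{
	switch (mode) {
	case 0:	/* heuristic overcommit: large requests can still succeed */
	case 1:	/* always overcommit */
		return 1;
	case 2:	/* strict accounting, no overcommit */
		return -1;
	default:
		return 0;
	}
}

/*
 * Read the mode from `path` (normally /proc/sys/vm/overcommit_memory).
 * Returns 0 ("don't know") if the file is missing or unparsable, e.g.
 * on a non-Linux system.
 */
static int
vm_overcommit_probe(const char *path)
{
	FILE	*fp;
	int	mode;

	assert(path != NULL);
	fp = fopen(path, "r");
	if (fp == NULL)
		return 0;
	if (fscanf(fp, "%d", &mode) != 1) {
		fclose(fp);
		return 0;
	}
	fclose(fp);
	return overcommit_from_mode(mode);
}
```

A caller would use |vm_overcommit_probe("/proc/sys/vm/overcommit_memory")|
and skip the overcommit detection only when the result is |-1|, which
matches the Solaris behaviour described above.
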
----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
diff -r -u build_i386_64bit_opt/src/lib/libast/vmalloc/vmhdr.h build_i386_64bit_debug/src/lib/libast/vmalloc/vmhdr.h
--- src/lib/libast/vmalloc/vmhdr.h      2013-08-27 18:44:46.000000000 +0200
+++ src/lib/libast/vmalloc/vmhdr.h      2013-12-09 22:14:12.731227511 +0100
@@ -182,9 +182,9 @@
 
 /* hint to regulate memory requests to discipline functions */
 #if _ast_sizeof_size_t > 4 /* the address space is greater than 32-bit */
-#define VM_INCREMENT   (1024*1024) /* lots of memory available here    */
+#define VM_INCREMENT   (16*1024*1024) /* lots of memory available here */
 #else
-#define VM_INCREMENT   (64*1024)  /* perhaps more limited memory       */
+#define VM_INCREMENT   (4*1024*1024)  /* perhaps more limited memory   */
 #endif
 
 #define VM_PAGESIZE    8192 /* default assumed page size */
diff -r -u build_i386_64bit_opt/src/lib/libast/vmalloc/vmmaddress.c build_i386_64bit_debug/src/lib/libast/vmalloc/vmmaddress.c
--- src/lib/libast/vmalloc/vmmaddress.c 2013-06-09 06:13:49.000000000 +0200
+++ src/lib/libast/vmalloc/vmmaddress.c 2013-12-09 22:19:47.122281075 +0100
@@ -42,8 +42,16 @@
 ** Written by Kiem-Phong Vo, phon...@gmail.com, 07/07/2012
 */
 
-/* see if a given range of address is available for mapping */
+/*
+ * see if a given range of address is available for mapping
+ * This is used for overcommit detection.
+ *
+ * Solaris (__SunOS) is explicitly excluded since it does
+ * not allow overcommitment of memory by default
+ */
+#ifndef __SunOS
 #define VMCHKMEM       1 /* set this to zero if signal&sigsetjmp don't work */
+#endif
 
 #if VMCHKMEM
 
diff -r -u build_i386_64bit_opt/src/lib/libast/vmalloc/vmopen.c build_i386_64bit_debug/src/lib/libast/vmalloc/vmopen.c
--- src/lib/libast/vmalloc/vmopen.c     2013-09-04 07:15:04.000000000 +0200
+++ src/lib/libast/vmalloc/vmopen.c     2013-12-06 09:40:41.344273508 +0100
@@ -130,7 +130,9 @@
                                write(9, "vmalloc: panic: heap initialization error #4\n", 45);
                        return NIL(Vmalloc_t*);
                }
+#if 0
                memset(base, 0, size);
+#endif
 
                /* make sure memory is properly aligned */
                if((algn = (ssize_t)(VMLONG(base)%ALIGN)) == 0 )
_______________________________________________
ast-developers mailing list
ast-developers@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-developers
