Re: [gsoc2012] Port NetBSD's UDF implementation

2012-04-05 Thread Pedro Giffuni

Hi YongCon;

The project would be very interesting for us. I am pretty sure you will
not have problems finding a mentor.

That said, let me point out an old thread:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-May/042565.html

I think the biggest problem is that you will have to get acquainted
with FreeBSD's Virtual Memory which is different from NetBSD's.
In that same thread you will find some comments by Matt Dillon
(no idea how up to date those are).

It will not be an easy task but people find such challenges very
rewarding.

Pedro.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [gsoc2012] Port NetBSD's UDF implementation

2012-04-05 Thread Andriy Gapon
on 05/04/2012 09:07 Pedro Giffuni said the following:
 Hi YongCon;
 
 The project would be very interesting for us. I am pretty sure you will
 not have problems finding a mentor.
 
 That said, let me point out an old thread:
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2008-May/042565.html
 
 I think the biggest problem is that you will have to get acquainted
 with FreeBSD's Virtual Memory which is different from NetBSD's.
 In that same thread you will find some comments by Matt Dillon
 (no idea how up to date those are).
 
 It will not be an easy task but people find such challenges very
 rewarding.

Yongcong,

please note that we have already received proposals from two other students for
this project.  I hadn't expected this project to become so popular after sitting
unnoticed for a few years.  To increase your chances of being accepted, it
might be a good idea to consider another project.

-- 
Andriy Gapon


opensslv.h SHLIB_VERSION_NUMBER

2012-04-05 Thread Andriy Gapon

I wonder who can review the following change and what good or bad can come from 
it?

Index: crypto/openssl/crypto/opensslv.h
===
--- crypto/openssl/crypto/opensslv.h	(revision 233888)
+++ crypto/openssl/crypto/opensslv.h	(working copy)
@@ -83,7 +83,7 @@
  * should only keep the versions that are binary compatible with the current.
  */
 #define SHLIB_VERSION_HISTORY ""
-#define SHLIB_VERSION_NUMBER "0.9.8"
+#define SHLIB_VERSION_NUMBER "6"


 #endif /* HEADER_OPENSSLV_H */

Rationale for the change can be seen here:
http://article.gmane.org/gmane.comp.kde.freebsd/20645
TL;DR: some software may depend on libssl.so.${SHLIB_VERSION_NUMBER} being
correct.

-- 
Andriy Gapon


Re: Is there any modern alternative to pstack?

2012-04-05 Thread Eitan Adler
On 4 April 2012 15:29, Julian Elischer jul...@freebsd.org wrote:
 but we do add patches to make things work on FreeBSD.

We add patches to make ports...
... work on FreeBSD
... conform to FreeBSD hier (to an extent)
... work with alternate compilers, PREFIX, etc.

We shouldn't add patches which continue development. In all cases
the goal should be to upstream the patch ASAP. If there is no active
upstream and the patch does more than the above that is a sign that
someone needs to be willing to become the upstream maintainer first.


-- 
Eitan Adler


bin/166660: new stdbuf utility

2012-04-05 Thread Jeremie Le Hen
Hi hackers,

I've posted PR bin/166660 this morning:
[patch] New util/shlib to change per-fd default stdio buffering mode

For some unknown reason, [libc] has been prepended to the subject, but
this patch __DOES NOT touch libc__ (except a one-line addition in a
manpage).

This is absolutely non-intrusive and can easily be MFC'd to RELENG_9,
RELENG_8, and even RELENG_7 if it hasn't reached its end-of-life.

In brief, this is a new tool that allows controlling the default per-fd
stdio buffering mode.  The feature exists on Linux and its command-line
interface is BSD-compatible, so I used the same name and the same
interface for obvious compatibility reasons.

As you can guess, I'm looking for someone willing to test (though I've
already tested it and the code is pretty straightforward) and commit it.

You will find additional information in the PR.

Thanks.
-- 
Jeremie Le Hen

Men are born free and equal.  Later on, they're on their own.
Jean Yanne


Re: Is there any modern alternative to pstack?

2012-04-05 Thread John Baldwin
On Thursday, April 05, 2012 7:43:15 am Eitan Adler wrote:
 On 4 April 2012 15:29, Julian Elischer jul...@freebsd.org wrote:
  but we do add patches to make things work on FreeBSD.
 
 We add patches to make ports...
 ... work on FreeBSD
 ... conform to FreeBSD hier (to an extent)
 ... work with alternate compilers, PREFIX, etc.
 
 We shouldn't add patches which continue development. In all cases
 the goal should be to upstream the patch ASAP. If there is no active
 upstream and the patch does more than the above that is a sign that
 someone needs to be willing to become the upstream maintainer first.

In this case we probably should become the upstream maintainer.  My patch 
actually bumps the version to 1.3 as it is sort of intended to do that.

-- 
John Baldwin


Re: Is there any modern alternative to pstack?

2012-04-05 Thread Eitan Adler
On 5 April 2012 10:06, John Baldwin j...@freebsd.org wrote:
 In this case we probably should become the upstream maintainer.  My patch
 actually bumps the version to 1.3 as it is sort of intended to do that.

Yay!

Can you please roll a new tarball and host it in ~/public_distfiles or
something of a similar nature?  That way we could just point the port
at the distfile and we wouldn't have to maintain a separate patchfile in
the ports tree.



-- 
Eitan Adler


Re: CAM disk I/O starvation

2012-04-05 Thread Gary Jennejohn
On Thu, 5 Apr 2012 05:22:46 +0200
Alexander Leidinger alexan...@leidinger.net wrote:

 On Tue, 3 Apr 2012 14:27:43 -0700 Jerry Toung jryto...@gmail.com
 wrote:
 
  On 4/3/12, Gary Jennejohn gljennj...@googlemail.com wrote:
  
   It would be interesting to see your patch.  I always run HEAD but
   maybe I could use it as a base for my own mods/tests.
  
  
  Here is the patch
 
 This looks fair if all your disks are working at the same time (e.g.
 RAID only setup), but if you have a setup where you have multiple
 disks and only one is doing something, you limit the amount of tags
 which can be used. No idea what kind of performance impact this would
 have.
 
 What about the case where you have more disks than tags?
 
 I also noticed that you do a strncmp for da. What about
 ada (available in 9 and 10), I would assume it suffers from the same
 problem.
 

It seems to.  All my disks are ada.

-- 
Gary Jennejohn


Re: CAM disk I/O starvation

2012-04-05 Thread Jerry Toung
On Wed, Apr 4, 2012 at 8:22 PM, Alexander Leidinger alexan...@leidinger.net
 wrote:


 This looks fair if all your disks are working at the same time (e.g.
 RAID only setup), but if you have a setup where you have multiple
 disks and only one is doing something, you limit the amount of tags
 which can be used. No idea what kind of performance impact this would
 have.

 I haven't seen any performance impact.  da1, the one that used to stall,
consistently gets over 600MB/s.


 What about the case where you have more disks than tags?

This part of the patch takes care of that scenario:

@@ -998,6 +1003,24 @@ xpt_add_periph(struct cam_periph *periph
 	mtx_lock(&xsoftc.xpt_topo_lock);
 	xsoftc.xpt_generation++;
+
+	if (device != NULL && device->sim->dev_count > 1 &&
+	    (device->sim->max_dev_openings > device->sim->dev_count)) {
otherwise, we don't split the tags and the original behavior remains.



 I also noticed that you do a strncmp for da. What about
 ada (available in 9 and 10), I would assume it suffers from the same
 problem.


I am running FreeBSD 8.1, no ada.  Me presenting a patch is just a way to
draw attention to a problem, and it improves things on my setup.  There is
certainly a way to make it more general/inclusive.

Jerry


Re: problems with mmap() and disk caching

2012-04-05 Thread Alan Cox

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page should only be
touched once to get it into memory (disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap:  1 pass took:   7.431046 (none: 262112; res: 32; super:
0; other:  0)
mmap:  2 pass took:   7.356670 (none: 261648; res:496; super:
0; other:  0)
mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:
0; other:  0)
mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:
0; other:  0)
mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:
0; other:  0)
mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:
0; other:  0)
mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:
0; other:  0)
mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:
0; other:  0)
mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:
0; other:  0)
mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:
0; other:  0)
mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:
0; other:  0)
mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:
0; other:  0)
mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:
0; other:  0)
mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:
0; other:  0)
mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:
0; other:  0)
mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:
0; other:  0)
mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:
0; other:  0)
mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:
0; other:  0)
mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:
0; other:  0)
mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:
0; other:  0)
mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:
0; other:  0)
mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:
0; other:  0)
mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:
0; other:  0)
mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:
0; other:  0)
mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:
0; other:  0)
mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:
0; other:  0)
mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:
0; other:  0)
mmap: 28 pass took:   0.157508 (none:  0; res: 262144; super:
0; other:  0)
mmap: 29 pass took:   0.156169 (none:  0; res: 262144; super:
0; other:  0)
mmap: 30 pass took:   0.156550 (none:  0; res: 262144; super:
0; other:  0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, the result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
0; other:  0)
mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
0; other:  0)
mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
0; other:  0)
mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
0; other:  0)
mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
0; other:  0)

This is what I expect.  But why doesn't this work without reading the file
manually?

The issue seems to be some change in the behaviour of the reserv or
phys allocator.  I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in 
about twelve years.  Otherwise, I agree with your analysis.


On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
        vm_page_deactivate(mt);
else
        vm_page_cache(mt);

to:

vm_page_dontneed(mt);

because I suspect that the current code does more harm than good.  In 
theory, it saves activations of the page daemon.  However, more often 
than not, I suspect that we are spending more on page reactivations than 
we are saving on page daemon activations.  The sequential access 
detection heuristic is just too easily triggered.  For example, I've 
seen it triggered by demand paging of the gcc text segment.  Also, I 
think that pmap_remove_all() and especially vm_page_cache() are too 
severe for a detection heuristic 

Re: problems with mmap() and disk caching

2012-04-05 Thread Alan Cox

On 04/04/2012 04:36, Andrey Zonov wrote:

On 04.04.2012 11:17, Konstantin Belousov wrote:


Calling madvise(MADV_RANDOM) fixes the issue, because the code to
deactivate/cache the pages is turned off.  On the other hand, it also
turns off read-ahead for faulting, and the first loop becomes eternally
long.


Now it takes 5 times longer.  Anyway, thanks for explanation.



Indeed, MADV_WILLNEED does not fix the problem, since willneed
reactivates the pages of the object at the time of the call.  To use
MADV_WILLNEED, you would need to call it between faults/memcpy.



I played with it, but no luck so far.



I've also never seen super pages; how do I make them work?

They just work, at least for me. Look at the output of procstat -v
after enough loops finished to not cause disk activity.



The problem was in my test program.  I fixed it, and now I see super pages,
but I'm still not satisfied.  There are several tests below:


1. With madvise(MADV_RANDOM) I see almost all super pages:
$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:  26.438535 (none:  0; res: 262144; super: 511; 
other:  0)
mmap:  2 pass took:   0.187311 (none:  0; res: 262144; super: 511; 
other:  0)
mmap:  3 pass took:   0.184953 (none:  0; res: 262144; super: 511; 
other:  0)
mmap:  4 pass took:   0.186007 (none:  0; res: 262144; super: 511; 
other:  0)
mmap:  5 pass took:   0.185790 (none:  0; res: 262144; super: 511; 
other:  0)


Should it be 512?



Check the starting virtual address.  It is probably not aligned on a 
superpage boundary.  Hence, a few pages at the start and end of your 
mapped region are not in a superpage.



2. Without madvise(MADV_RANDOM):
$ ./mmap /mnt/random-1024 50
mmap:  1 pass took:   7.629745 (none: 262112; res: 32; super: 0; 
other:  0)
mmap:  2 pass took:   7.301720 (none: 261202; res:942; super: 0; 
other:  0)
mmap:  3 pass took:   7.261416 (none: 260226; res:   1918; super: 1; 
other:  0)

[skip]
mmap: 49 pass took:   0.155368 (none:  0; res: 262144; super: 323; 
other:  0)
mmap: 50 pass took:   0.155438 (none:  0; res: 262144; super: 323; 
other:  0)


Only 323 pages.

3. If I just re-run the test, I don't see super pages with any block
size.


$ ./mmap /mnt/random-1024 5 $((1<<30))
mmap:  1 pass took:   1.013939 (none:  0; res: 262144; super: 0; 
other:  0)
mmap:  2 pass took:   0.267082 (none:  0; res: 262144; super: 0; 
other:  0)
mmap:  3 pass took:   0.270711 (none:  0; res: 262144; super: 0; 
other:  0)
mmap:  4 pass took:   0.268940 (none:  0; res: 262144; super: 0; 
other:  0)
mmap:  5 pass took:   0.269634 (none:  0; res: 262144; super: 0; 
other:  0)


4. If I activate madvise(MADV_WILLNEED) in the copy loop and re-run the
test, then I see super pages only if I use a block size greater than 2MB.


$ ./mmap /mnt/random-1024 1 $((1<<21))
mmap:  1 pass took:   0.299722 (none:  0; res: 262144; super: 0; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<22))
mmap:  1 pass took:   0.271828 (none:  0; res: 262144; super: 170; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<23))
mmap:  1 pass took:   0.333188 (none:  0; res: 262144; super: 258; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<24))
mmap:  1 pass took:   0.339250 (none:  0; res: 262144; super: 303; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<25))
mmap:  1 pass took:   0.418812 (none:  0; res: 262144; super: 324; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<26))
mmap:  1 pass took:   0.360892 (none:  0; res: 262144; super: 335; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<27))
mmap:  1 pass took:   0.401122 (none:  0; res: 262144; super: 342; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<28))
mmap:  1 pass took:   0.478764 (none:  0; res: 262144; super: 345; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<29))
mmap:  1 pass took:   0.607266 (none:  0; res: 262144; super: 346; 
other:  0)

$ ./mmap /mnt/random-1024 1 $((1<<30))
mmap:  1 pass took:   0.901269 (none:  0; res: 262144; super: 347; 
other:  0)


5. If I activate madvise(MADV_WILLNEED) immediately after mmap(), then
I see some number of super pages (the same number as in test #2).


$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.178666 (none:  0; res: 262144; super: 323; 
other:  0)
mmap:  2 pass took:   0.158889 (none:  0; res: 262144; super: 323; 
other:  0)
mmap:  3 pass took:   0.157229 (none:  0; res: 262144; super: 323; 
other:  0)
mmap:  4 pass took:   0.156895 (none:  0; res: 262144; super: 323; 
other:  0)
mmap:  5 pass took:   0.162938 (none:  0; res: 262144; super: 323; 
other:  0)


6. If I read the file manually before the test, then I don't see super pages
with any block size, and madvise(MADV_WILLNEED) doesn't help.


$ ./mmap /mnt/random-1024 5 $((1<<30))
mmap:  1 pass took:   0.996767 (none:  0; res: 262144; super: 0; 
other:  0)
mmap:  2 pass took:   0.311129 (none:  

Re: Starvation of realtime priority threads

2012-04-05 Thread John Baldwin
On Thursday, April 05, 2012 1:07:55 am David Xu wrote:
 On 2012/4/5 11:56, Konstantin Belousov wrote:
  On Wed, Apr 04, 2012 at 06:54:06PM -0700, Sushanth Rai wrote:
 I have a multithreaded user-space program that basically runs at realtime
priority.  Synchronization between threads is done using spinlocks.  When
running this program on an SMP system under heavy memory pressure, I see that
the thread holding the spinlock is starved of CPU.  The CPUs are effectively
consumed by other threads that are spinning for the lock to become available.
 
 After instrumenting the kernel a little, what I found was that under
memory pressure, when the user thread holding the spinlock traps into the
kernel due to a page fault, that thread sleeps until free pages are
available.  The thread sleeps at PUSER priority (within vm_waitpfault()).  When it
is ready to run, it is queued at PUSER priority even though its base
priority is realtime.  The other sibling threads that are spinning at realtime
priority to acquire the spinlock starve the owner of the spinlock.
 
 I was wondering if the sleep in vm_waitpfault() should be at
MAX(td_user_pri, PUSER) instead of just PUSER.  I'm running 7.2 and it looks
like this logic is the same in the trunk.
  It just so happens that your program stumbles upon a single sleep point in
  the kernel.  If for whatever reason a thread in the kernel is put off CPU
  due to a failure to acquire some resource without priority propagation,
  you would get the same effect.  Only blockable primitives do priority
  propagation, namely mutexes and rwlocks, AFAIR.  In other words, any
  sx/lockmgr/sleep points are vulnerable to the same issue.
 This is why I suggested that POSIX realtime priorities should not be
 boosted; they should only be higher than PRI_MIN_TIMESHARE but lower than
 any priority that msleep() callers provide.  The problem is that a
 userland realtime thread's busy-looping code can starve a thread in the
 kernel which holds a critical resource.  In the kernel we can avoid
 writing dead-loop code, but userland code cannot be trusted.

Note that you have to be root to use rtprio, and that trustable userland
code does exist (just because you haven't used any doesn't mean it
doesn't exist).

 If you search for Realtime thread priorities from December 2010 in the
 @arch list archives, you may find the argument.

I think the bug here is that sched_sleep() should not lower the priority of
an rtprio process.  It should arguably not raise the priority of an idprio
process either, but sched_sleep() should probably only apply to timesharing
threads.

All that said, userland rtprio code is going to have to be careful.  It should
be using things like wired memory as Kostik suggested, and probably avoiding
most system calls.  You can definitely blow your foot off quite easily in lots 
of ways with rtprio.

-- 
John Baldwin


Re: problems with mmap() and disk caching

2012-04-05 Thread Konstantin Belousov
On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:
 On 04/04/2012 02:17, Konstantin Belousov wrote:
 On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
 Hi,
 
  I open the file, then call mmap() on the whole file and get a pointer,
  then I work with this pointer.  I expect that a page should only be
  touched once to get it into memory (disk cache?), but this doesn't work!
 
 I wrote the test (attached) and ran it for the 1G file generated from
 /dev/random, the result is the following:
 
 Prepare file:
 # swapoff -a
 # newfs /dev/ada0b
 # mount /dev/ada0b /mnt
 # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
 
 Purge cache:
 # umount /mnt
 # mount /dev/ada0b /mnt
 
 Run test:
 $ ./mmap /mnt/random-1024 30
 [... 30 passes of test output elided; identical to the run quoted in the original message above ...]
 
 If I run this:
 $ cat /mnt/random-1024 > /dev/null
 before the test, the result is the following:
 
 $ ./mmap /mnt/random-1024 5
 [... 5 passes of test output elided; identical to the run quoted in the original message above ...]
 
 This is what I expect.  But why doesn't this work without reading the file
 manually?
 The issue seems to be some change in the behaviour of the reserv or
 phys allocator.  I Cc:ed Alan.
 
 I'm pretty sure that the behavior here hasn't significantly changed in 
 about twelve years.  Otherwise, I agree with your analysis.
 
 On more than one occasion, I've been tempted to change:
 
 pmap_remove_all(mt);
 if (mt->dirty != 0)
         vm_page_deactivate(mt);
 else
         vm_page_cache(mt);
 
 to:
 
 vm_page_dontneed(mt);
 
 because I suspect that the current code does more harm than good.  In 
 theory, it saves activations of the page daemon.  However, more often 
 than not, I suspect that we are spending more on page reactivations than 
 we are saving on page daemon activations.  The sequential access 
 detection heuristic is just too easily triggered.  For example, I've 

Re: [RFT][patch] Scheduling for HTT and not only

2012-04-05 Thread Arnaud Lacombe
Hi,

[Sorry for the delay, I got a bit sidetrack'ed...]

2012/2/17 Alexander Motin m...@freebsd.org:
 On 17.02.2012 18:53, Arnaud Lacombe wrote:

 On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motinm...@freebsd.org  wrote:

 On 02/15/12 21:54, Jeff Roberson wrote:

 On Wed, 15 Feb 2012, Alexander Motin wrote:

 I've decided to stop those cache black magic practices and focus on
 things that really exist in this world -- SMT and CPU load. I've
 dropped most of cache related things from the patch and made the rest
 of things more strict and predictable:
 http://people.freebsd.org/~mav/sched.htt34.patch


 This looks great. I think there is value in considering the other
 approach further but I would like to do this part first. It would be
 nice to also add priority as a greater influence in the load balancing
 as well.


 I haven't got a good idea yet about balancing priorities, but I've
 rewritten the balancer itself.  Since sched_lowest() / sched_highest()
 are more intelligent now, they allowed me to remove topology traversal
 from the balancer itself.  That should fix the double-swapping problem,
 allow keeping some affinity while moving threads, and make balancing
 fairer.  I did a number of tests running 4, 8, 9 and 16 CPU-bound
 threads on 8 CPUs.  With 4, 8 and 16 threads everything is stationary,
 as it should be.  With 9 threads I see regular and random load movement
 between all 8 CPUs.  Measurements over a 5-minute run show a deviation
 of only about 5 seconds.  It is the same deviation as I see caused by
 merely scheduling 16 threads on 8 cores without any balancing needed at
 all.  So I believe this code works as it should.

 Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

 Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

 I plan this to be the final patch of this series (more to come :)) and if
 there are no problems or objections, I am going to commit it (except
 some debugging KTRs) in about ten days.  So now is a good time for
 reviews and testing. :)

 is there a place where all the patches are available ?


 All my scheduler patches are cumulative, so all you need is the last one
 mentioned here, sched.htt40.patch.

You may want to have a look to the result I collected in the
`runs/freebsd-experiments' branch of:

https://github.com/lacombar/hackbench/

and compare them with vanilla FreeBSD 9.0 and -CURRENT results
available in `runs/freebsd'. On the dual package platform, your patch
is not a definite win.

 But in some cases, especially for multi-socket systems, to let it show its
 best, you may want to apply additional patch from avg@ to better detect CPU
 topology:
 https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd

The tests I conducted specifically for this patch did not show much improvement...

 - Arnaud


Re: problems with mmap() and disk caching

2012-04-05 Thread Alan Cox

On 04/05/2012 12:31, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page should only be
touched once to get it into memory (disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
[... 30 passes of test output elided; identical to the run quoted earlier in this thread ...]

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
0; other:  0)
mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
0; other:  0)
mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
0; other:  0)
mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
0; other:  0)
mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
0; other:  0)

This is what I expect.  But why doesn't this work without reading the
file manually?

The issue seems to be some change in the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years.  Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

 pmap_remove_all(mt);
 if (mt->dirty != 0)
 vm_page_deactivate(mt);
 else
 vm_page_cache(mt);

to:

 vm_page_dontneed(mt);

because I suspect that the current code does more harm than good.  In
theory, it saves activations of the page daemon.  However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations.  The sequential access
detection heuristic is just too easily triggered.  For example, I've
seen it triggered by demand paging of the gcc text segment.  Also, I

Re: [RFT][patch] Scheduling for HTT and not only

2012-04-05 Thread Alexander Motin

On 05.04.2012 21:12, Arnaud Lacombe wrote:

Hi,

[Sorry for the delay, I got a bit sidetrack'ed...]

2012/2/17 Alexander Motin <m...@freebsd.org>:

On 17.02.2012 18:53, Arnaud Lacombe wrote:


On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin <m...@freebsd.org> wrote:


On 02/15/12 21:54, Jeff Roberson wrote:


On Wed, 15 Feb 2012, Alexander Motin wrote:


I've decided to stop those cache black magic practices and focus on
things that really exist in this world -- SMT and CPU load. I've
dropped most of cache related things from the patch and made the rest
of things more strict and predictable:
http://people.freebsd.org/~mav/sched.htt34.patch



This looks great. I think there is value in considering the other
approach further but I would like to do this part first. It would be
nice to also add priority as a greater influence in the load balancing
as well.



I haven't got a good idea yet about balancing priorities, but I've
rewritten the balancer itself. Since sched_lowest() / sched_highest() are
more intelligent now, they allowed me to remove the topology traversal
from the balancer itself. That should fix the double-swapping problem,
allow keeping some affinity while moving threads, and make balancing more
fair. I did a number of tests running 4, 8, 9 and 16 CPU-bound threads on
8 CPUs. With 4, 8 and 16 threads everything is stationary, as it should
be. With 9 threads I see regular and random load moving between all 8
CPUs. Measurements over a 5-minute run show a deviation of only about 5
seconds. It is the same deviation as I see caused by just the scheduling
of 16 threads on 8 cores without any balancing needed at all. So I
believe this code works as it should.

Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

I plan this to be the final patch of this series (more to come :)), and
if there are no problems or objections, I am going to commit it (except
some debugging KTRs) in about ten days. So now is a good time for reviews
and testing. :)


Is there a place where all the patches are available?



All my scheduler patches are cumulative, so all you need is only the last
one mentioned here, sched.htt40.patch.


You may want to have a look to the result I collected in the
`runs/freebsd-experiments' branch of:

https://github.com/lacombar/hackbench/

and compare them with vanilla FreeBSD 9.0 and -CURRENT results
available in `runs/freebsd'. On the dual package platform, your patch
is not a definite win.


But in some cases, especially on multi-socket systems, to let it show its
best you may want to apply an additional patch from avg@ to better detect
the CPU topology:
https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd


The test I conducted specifically for this patch did not show much improvement...


If I understand right, this test runs thousands of threads sending and 
receiving data over pipes. It is quite likely that all CPUs will always 
be busy, and so load balancing is not really important in this test. 
What looks good is that the more complicated new code is not slower than 
the old one.


While this test seems very scheduler-intensive, it may depend on many 
other factors, such as syscall performance, context switching, etc. I'll 
try to play more with it.


--
Alexander Motin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: problems with mmap() and disk caching

2012-04-05 Thread Andrey Zonov

On 05.04.2012 19:54, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why this doesn't work without reading file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);



Thanks Alan!  Now it works as I expect!

But I have more questions for you and kib@.  They are in my test below.

So, prepare file as earlier, and take information about memory usage 
from top(1).  After preparation, but before test:

Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap:  1 pass took:   7.462865 (none:  0; res: 262144; super: 
0; other:  0)


No super pages after first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap:  1 pass took:   0.004191 (none:  0; res: 262144; super: 
511; other:  0)


All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after the 
process was terminated; that's not good, what do you think?


Read the file:
$ cat /mnt/random > /dev/null

Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory.  I do not understand why.

Could you please give me explanation about active/inactive/wired memory?



because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've seen
it triggered by demand paging of the gcc text segment. Also, I think
that pmap_remove_all() and especially vm_page_cache() are too severe for
a detection heuristic that is so easily triggered.


[snip]

--
Andrey Zonov


Re: problems with mmap() and disk caching

2012-04-05 Thread Konstantin Belousov
On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:
 On 05.04.2012 19:54, Alan Cox wrote:
 On 04/04/2012 02:17, Konstantin Belousov wrote:
 On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
 [snip]
 This is what I expect. But why this doesn't work without reading file
 manually?
 Issue seems to be in some change of the behaviour of the reserv or
 phys allocator. I Cc:ed Alan.
 
 I'm pretty sure that the behavior here hasn't significantly changed in
 about twelve years. Otherwise, I agree with your analysis.
 
 On more than one occasion, I've been tempted to change:
 
 pmap_remove_all(mt);
 if (mt->dirty != 0)
 vm_page_deactivate(mt);
 else
 vm_page_cache(mt);
 
 to:
 
 vm_page_dontneed(mt);
 
 
 Thanks Alan!  Now it works as I expect!
 
 But I have more questions to you and kib@.  They are in my test below.
 
 So, prepare file as earlier, and take information about memory usage 
 from top(1).  After preparation, but before test:
 Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free
 
 First run:
 $ ./mmap /mnt/random
 mmap:  1 pass took:   7.462865 (none:  0; res: 262144; super: 
 0; other:  0)
 
 No super pages after first run, why?..
 
 Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free
 
 Now the file is in inactive memory, that's good.
 
 Second run:
 $ ./mmap /mnt/random
 mmap:  1 pass took:   0.004191 (none:  0; res: 262144; super: 
 511; other:  0)
 
 All super pages are here, nice.
 
 Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free
 
 Wow, all inactive pages moved to active and sit there even after process 
 was terminated, that's not good, what do you think?
Why do you think this is 'not good'? You have plenty of free memory,
there is no memory pressure, and all pages were referenced recently.
There is no reason for them to be deactivated.

 
 Read the file:
 $ cat /mnt/random > /dev/null
 
 Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
 
 Now the file is in wired memory.  I do not understand why so.
You do use UFS, right? There are enough buffer headers and buffer KVA
to have buffers allocated for the whole file content. Since buffers wire
the corresponding pages, you get pages migrated to wired.

When buffer pressure appears (i.e., any other I/O is started),
the buffers will be repurposed and the pages moved to inactive.

 
 Could you please give me explanation about active/inactive/wired memory?
 
 
 because I suspect that the current code does more harm than good. In
 theory, it saves activations of the page daemon. However, more often
 than not, I suspect that we are spending more on page reactivations than
 we are saving on page daemon activations. The sequential access
 detection heuristic is just too easily triggered. For example, I've seen
 it triggered by demand paging of the gcc text segment. Also, I think
 that pmap_remove_all() and especially vm_page_cache() are too severe for
 a detection heuristic that is so easily triggered.
 
 [snip]
 
 -- 
 Andrey Zonov




Re: problems with mmap() and disk caching

2012-04-05 Thread Andrey Zonov

On 05.04.2012 23:41, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:

On 05.04.2012 19:54, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why this doesn't work without reading file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);



Thanks Alan!  Now it works as I expect!

But I have more questions to you and kib@.  They are in my test below.

So, prepare file as earlier, and take information about memory usage
from top(1).  After preparation, but before test:
Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap:  1 pass took:   7.462865 (none:  0; res: 262144; super:
0; other:  0)

No super pages after first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap:  1 pass took:   0.004191 (none:  0; res: 262144; super:
511; other:  0)

All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after process
was terminated, that's not good, what do you think?

Why do you think this is 'not good'? You have plenty of free memory,
there is no memory pressure, and all pages were referenced recently.
There is no reason for them to be deactivated.



I always thought that active memory is the sum of the resident memory of 
all processes, that inactive shows the disk cache, and that wired shows 
the kernel itself.




Read the file:
$ cat /mnt/random > /dev/null

Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory.  I do not understand why so.

You do use UFS, right?


Yes.


There are enough buffer headers and buffer KVA
to have buffers allocated for the whole file content. Since buffers wire
the corresponding pages, you get pages migrated to wired.

When buffer pressure appears (i.e., any other I/O is started),
the buffers will be repurposed and the pages moved to inactive.



OK, how can I get the amount of disk cache?



Could you please give me explanation about active/inactive/wired memory?



because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've seen
it triggered by demand paging of the gcc text segment. Also, I think
that pmap_remove_all() and especially vm_page_cache() are too severe for
a detection heuristic that is so easily triggered.


[snip]

--
Andrey Zonov


--
Andrey Zonov


Making the address range given by a specified address and size valid in the child's address space

2012-04-05 Thread kota saikrishna
Hello,

I am trying to inject code into a child process using the ptrace utility.
The purpose of the injected code is to take the specified address and size
and make that address range in the child's address space valid (i.e.,
readable and writable).  To my knowledge the mmap system call should do
this, but I was not able to allocate the memory.  Can anyone help me with
how to achieve this?


Re: Starvation of realtime priority threads

2012-04-05 Thread Sushanth Rai
I understand the downside of a badly written realtime app.  In my case the 
application runs in userspace without making many syscalls, and by all means 
it is a well-behaved application.  Yes, I can wire memory and change the 
application to use a mutex instead of a spinlock, and those changes should 
help, but they are still working around the problem.  I still believe the 
kernel should not lower realtime priority when blocking on resources.  This 
can lead to priority inversion, especially since these threads run at fixed 
priorities and the kernel doesn't muck with them.

As you suggested, _sleep() should not adjust the priorities of realtime 
threads.

Thanks,
Sushanth

--- On Thu, 4/5/12, John Baldwin j...@freebsd.org wrote:

 From: John Baldwin j...@freebsd.org
 Subject: Re: Starvation of realtime priority threads
 To: freebsd-hackers@freebsd.org, davi...@freebsd.org
 Date: Thursday, April 5, 2012, 9:01 AM
 On Thursday, April 05, 2012 1:07:55 am David Xu wrote:
  On 2012/4/5 11:56, Konstantin Belousov wrote:
   On Wed, Apr 04, 2012 at 06:54:06PM -0700, Sushanth Rai wrote:
    I have a multithreaded user-space program that basically runs at
    realtime priority.  Synchronization between threads is done using a
    spinlock.  When running this program on an SMP system under heavy
    memory pressure I see that the thread holding the spinlock is starved
    out of the CPU.  The CPUs are effectively consumed by other threads
    that are spinning for the lock to become available.

    After instrumenting the kernel a little bit, what I found was that
    under memory pressure, when the user thread holding the spinlock
    traps into the kernel due to a page fault, that thread sleeps until
    free pages are available.  The thread sleeps at PUSER priority
    (within vm_waitpfault()).  When it is ready to run, it is queued at
    PUSER priority even though its base priority is realtime.  The other
    sibling threads that are spinning at realtime priority to acquire the
    spinlock starve the owner of the spinlock.

    I was wondering if the sleep in vm_waitpfault() should be at
    MAX(td_user_pri, PUSER) instead of just PUSER.  I'm running on 7.2
    and it looks like this logic is the same in the trunk.

   It just so happens that your program stumbles upon a single sleep
   point in the kernel.  If for whatever reason the thread in the kernel
   is put off the CPU due to failure to acquire any resource without
   priority propagation, you would get the same effect.  Only blockable
   primitives do priority propagation, that is mutexes and rwlocks,
   AFAIR.  In other words, any sx/lockmgr/sleep points are vulnerable to
   the same issue.

  This is why I suggested that POSIX realtime priority should not be
  boosted; it should be only higher than PRI_MIN_TIMESHARE but lower than
  any priority the msleep() callers provide.  The problem is that a
  userland realtime thread's busy-looping code can cause starvation of a
  thread in the kernel which is holding a critical resource.  In the
  kernel we can avoid writing dead-loop code, but userland code is not
  trustable.
 
 Note that you have to be root to use rtprio, and that there is trustable
 userland code (just because you haven't used any doesn't mean it doesn't
 exist).
 
 If you search for "Realtime thread priorities" from December 2010 within
 the @arch list, you may find the argument.
 
 I think the bug here is that sched_sleep() should not lower the priority
 of an rtprio process.  It should arguably not raise the priority of an
 idprio process either, but sched_sleep() should probably only apply to
 timesharing threads.
 
 All that said, userland rtprio code is going to have to be careful.  It
 should be using things like wired memory as Kostik suggested, and
 probably avoiding most system calls.  You can definitely blow your foot
 off quite easily in lots of ways with rtprio.
 
 -- 
 John Baldwin
