On Sun, Sep 01, 2013 at 01:13:06AM +0100, Al Viro wrote:
+static noinline_for_stack
+char *dentry_name(char *buf, char *end, const struct dentry *d, struct printf_spec spec,
+		   int depth)
+{
+	int i, n = 0;
+	const char *s;
+	char *p = buf;
+	const struct
On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote:
>
> There's one exception - basically, we decide to put duplicates of
> reference(s) we hold into (a bunch of) structures being created. If
> we decide that we'd failed and need to roll back, the structures
> need to be taken out of whatever lists,
On Sun, Sep 8, 2013 at 5:35 PM, Al Viro wrote:
>
> That should also work, replacing the current tip of #for-next. Do you
> prefer to merge those two diffs of yours into a single commit?
If you're ok with my patch (it's now also tested, I'm running with it
and it looks fine), I'll commit that
On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote:
>
> Well... unlazy_walk() is always followed by terminate_walk() very shortly,
> but there's a minor problem - terminate_walk() uses "are we in RCU
> mode?" for two things:
> a) do we need to do path_put() here?
> b) do we need to
Al, this is mainly a rambling question for you, but others left on the
Cc just FYI..
On Thu, Aug 29, 2013 at 4:42 PM, Linus Torvalds
wrote:
>
> So I fixed the VFS layer instead. With dput() and friends using
> lockrefs, the only thing remaining in the hot RCU dentry lookup path
> was the nasty
On Sun, Sep 08, 2013 at 02:45:32PM -0700, Linus Torvalds wrote:
But that test never triggered any of my thinking, and it's racy
anyway, and the condition we care about is another race, which means
that it not only didn't trigger my thinking, it doesn't trigger in any
normal testing either.
On 09/05/2013 09:31 AM, Ingo Molnar wrote:
* Waiman Long wrote:
The latest tty patches did work. The tty related spinlock contention
is now completely gone. The short workload can now reach over 8M JPM
which is the highest I have ever seen.
The perf profile was:
5.85% reaim reaim
On Wed, Sep 4, 2013 at 12:25 PM, Waiman Long wrote:
>
> Yes, the perf profile was taken from an 80-core machine. There isn't any
> scalability issue hiding for the short workload on an 80-core machine.
>
> However, I am certain that more may pop up when running in an even larger
> machine like
On Wed, Sep 4, 2013 at 7:52 AM, Waiman Long wrote:
>
> The latest tty patches did work. The tty related spinlock contention is now
> completely gone. The short workload can now reach over 8M JPM which is the
> highest I have ever seen.
Good. And this was with the 80-core machine, so there aren't
On Tue, Sep 3, 2013 at 5:14 PM, Waiman Long wrote:
> On 09/03/2013 02:01 AM, Ingo Molnar wrote:
>>
>> * Waiman Long wrote:
>>
>>> Yes, that patch worked. It eliminated the lglock as a bottleneck in the
>>> AIM7 workload. The lg_global_lock did not show up in the perf profile,
>>> whereas the
On Sun, Sep 1, 2013 at 5:32 PM, Linus Torvalds
wrote:
> On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek wrote:
>>
>> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
>> ~26.60Mloops (no-debug).
>
> Ok, that's getting to be in the right ballpark.
>
> But your profile is still odd.
>
>>
On Tue, Sep 3, 2013 at 2:34 PM, Linus Torvalds
wrote:
>
> I'll try to hack that up too, but it's looking like it really is just
> the "lock xadd", not the memory dependency chain..
Yeah, no difference: Better code generation with my quick hack for a
percpu spinlock:
│81078e70
On Tue, Sep 3, 2013 at 2:13 PM, Linus Torvalds
wrote:
>
> This is the one that actually compiles. Whether it *works* is still a
> total mystery.
It generates ok code, and it booted, so it seems to work at least for my config.
However, it seems to make no performance-difference what-so-ever,
On Tue, Sep 3, 2013 at 2:05 PM, Linus Torvalds
wrote:
>
> TOTALLY UNTESTED PATCH ATTACHED.
Actually, that was the previous (broken) version of that patch - I
hadn't regenerated it after fixing some stupid compile errors, and it
had the DECLARE parts wrong.
This is the one that actually
On Tue, Sep 3, 2013 at 12:19 PM, Ingo Molnar wrote:
>
> * Linus Torvalds wrote:
>
>>
>> - the lglock data structure isn't a percpu data structure, it's this
>> stupid global data structure that has a percpu pointer in it. So that
>> first "mov (%rdi),%rdx" is purely to load what is effectively
On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds
wrote:
>
> I suspect the tty_ldisc_lock() could be made to go away if we care.
Heh. I just pulled the tty patches from Greg, and the locking has
changed completely.
It may actually fix your AIM7 test-case, because while the global
spinlock remains
On Tue, Sep 3, 2013 at 8:41 AM, Linus Torvalds
wrote:
>
> I've done that, and it matches the PEBS runs, except obviously with
> the instruction skew (so then depending on run it's 95% the
> instruction after the xadd). So the PEBS profiles are entirely
> consistent with other data.
So one thing
On Tue, Sep 3, 2013 at 12:24 AM, Sedat Dilek wrote:
>
> Can someone summarize this thread (70+ postings)?
> Which patches are needed? And fixing what?
> ( Can people provide separate patches with a proper changelog? )
> Improvements?
The core lockref part is now merged and available in my git
On Tue, Sep 3, 2013 at 3:15 AM, Ingo Molnar wrote:
>
> One more thing to try would be a regular '-e cycles' non-PEBS run and see
> whether there's still largish overhead visible around that instruction.
I've done that, and it matches the PEBS runs, except obviously with
the instruction skew (so
On Tue, Sep 3, 2013 at 8:14 AM, Waiman Long wrote:
>
> Other than the global tty_ldisc_lock, there is no other major
> bottleneck. I am not that worried about the tty_ldisc_lock bottleneck
> as real world applications probably won't have that many calls to
> set the tty driver.
I suspect the
On 09/03/2013 02:01 AM, Ingo Molnar wrote:
* Waiman Long wrote:
Yes, that patch worked. It eliminated the lglock as a bottleneck in
the AIM7 workload. The lg_global_lock did not show up in the perf
profile, whereas the lg_local_lock was only 0.07%.
Just curious: what's the worst bottleneck
Hi!
> It is entirely possible that it is just a "cycles:pp" oddity - because
> the "lock xadd" is serializing, it can't retire until everything
> around it has been sorted out, and maybe it just shows up in profiles
> more than is really "fair" to the instruction itself, because it ends
> up
On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote:
You're spending more time on the task stats than on the actual lookup.
Maybe you should turn off CONFIG_TASKSTATS..But why that whole
irq_return thing? Odd.
[ init/Kconfig ]
...
config TASKSTATS
bool
On 08/30/2013 10:42 PM, Al Viro wrote:
On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote:
Aha... OK, I see what's going on. We end up with shm_mnt *not* marked
as long-living vfsmount, even though it lives forever. See if the
following helps; if it does (and I very much expect it to),
On Mon, Sep 2, 2013 at 12:05 AM, Ingo Molnar wrote:
>
> The Haswell perf code isn't very widely tested yet as it took quite some
> time to get it ready for upstream and thus got merged late, but on its
> face this looks like a pretty good profile.
Yes. And everything else looks fine too.
On Sun, Sep 1, 2013 at 5:55 PM, Linus Torvalds
wrote:
> On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek wrote:
>>
>> Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089
>> + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return
>> + 4,86% t_lockref_from-
On Sun, Sep 1, 2013 at 5:12 PM, Linus Torvalds
wrote:
>
> It *is* one of the few locked accesses remaining, and it's clearly
> getting called a lot (three calls per system call: two mntput's - one
> for the root path, one for the result path, and one from path_init ->
> rcu_walk_init), but with
On Sun, Sep 1, 2013 at 4:30 PM, Al Viro wrote:
>
> Hrm... It excludes sharing between the locks, all right. AFAICS, that
> won't exclude sharing with plain per-cpu vars, will it?
Yes it will. DEFINE_PER_CPU_SHARED_ALIGNED not only aligns the data,
it also puts it in a separate section with
On Sun, Sep 01, 2013 at 03:48:01PM -0700, Linus Torvalds wrote:
> I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the
> spinlock, so that each local lock gets its own cacheline, and the
> total loops jumped to 62M (from 52-54M before). So when I looked at
> the numbers, I thought "oh,
On Sun, Sep 1, 2013 at 3:44 PM, Al Viro wrote:
>
> GRRR... I see something else:
> void file_sb_list_del(struct file *file)
> {
> if (!list_empty(&file->f_u.fu_list)) {
> lg_local_lock_cpu(&files_lglock, file_list_cpu(file));
> list_del_init(&file->f_u.fu_list);
>