In message [EMAIL PROTECTED] you wrote:
+ } fpvsr __attribute__((aligned(16)));
Do we really need a union here? what would happen if you just
changed
the type of fpr[32] from double to vector if #CONFIG_VSX?
I really don't like the union and think we can just make the storage
On Jun 19, 2008, at 1:01 AM, Michael Neuling wrote:
In message B0E87874-BC65-4037-[EMAIL PROTECTED] you wrote:
+ } fpvsr __attribute__((aligned(16)));
Do we really need a union here? what would happen if you just
changed
the type of fpr[32] from double to vector if #CONFIG_VSX?
I
On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:
We currently have a few routines for patching code in asm/system.h,
because
they didn't fit anywhere else. I'd like to clean them up a little
and add
some more, so first move them into a dedicated C file - they don't
need to
be
On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
Should that be ibm,vsx?
--
dwmw2
On Thu, 2008-06-19 at 01:15 -0500, Kumar Gala wrote:
On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:
We currently have a few routines for patching code in asm/system.h,
because
they didn't fit anywhere else. I'd like to clean them up a little
and add
some more, so first move
On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
Should that be ibm,vsx?
Nope
/*
* Copyright (C) 2008 Gunnar von Boehn, IBM Corp.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later
On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
I still think using the union makes it easier to read than what you
have here. Also, it better reflects the structure of what's being
stored there.
I don't think that holds much weight with me. We don't union the
vector128 type
* Nathan Lynch [EMAIL PROTECTED] wrote:
There is an interesting quality of POWER6 cores, which each have 2
hardware threads: assuming one thread on the core is idle, the primary
thread is a little faster than the secondary thread. To illustrate:
for cpumask in 0x1 0x2 ; do
taskset
On Thursday 19 June 2008, Mark Nelson wrote:
The plan is to use Michael Ellerman's code patching work so that at runtime
if we're running on a Cell machine the new routines are called but otherwise
the existing memory copy routines are used.
Have you tried running this code on other platforms
Arnd Bergmann writes:
Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6,
Why would you guess that?
Paul.
Hi Arnd,
I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes really visible when you really copy memory-to-memory and are
not only working in the 2nd-level cache.
Kind regards
Gunnar von Boehn
On Jun 19, 2008, at 1:55 AM, Michael Ellerman wrote:
On Thu, 2008-06-19 at 01:15 -0500, Kumar Gala wrote:
On May 29, 2008, at 1:20 AM, Michael Ellerman wrote:
We currently have a few routines for patching code in asm/system.h,
because
they didn't fit anywhere else. I'd like to clean them up
On Jun 19, 2008, at 4:33 AM, Benjamin Herrenschmidt wrote:
On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
I still think using the union makes it easier to read than what you
have here. Also, it better reflects the structure of what's being
stored there.
I don't think that holds
On Thursday 19 June 2008, Paul Mackerras wrote:
Arnd Bergmann writes:
Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6,
Why would you guess that?
On Wed, 11 Jun 2008 10:50:31 +1000
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
This is some preliminary work to improve TLB management on SW loaded
TLB powerpc platforms. This introduce support for non-atomic PTE
operations in pgtable-ppc32.h and removes write back to the PTE from
the TLB
Hi Chirag
On Thu, 19 Jun 2008 18:16:34 +0530 Chirag Jog [EMAIL PROTECTED] wrote:
Hi,
I was trying out the realtime linux kernel 2.6.25.4-rt3 on a powerpc box.
The kernel booted fine.
On running the matrix_mult testcase from the real-time testsuite
in ltp
On Thursday 19 June 2008, Mark Nelson wrote:
* __copy_tofrom_user routine optimized for CELL-BE-PPC
A few things I noticed:
* You don't have a page wise user copy, which the regular code
has. This is probably not so noticeable in iperf, but should
have a significant impact on lmbench and on a
On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
I assume it has suffered from bitrot and nobody tried to do better
since the Power3 days. AFAICT, it hasn't seen any update since your
original Power4 version from 2002.
I've got an out-of-tree optimized version for pa6t as well that I
Hi Arnd,
You don't have a page wise user copy,
which the regular code has.
The new code does not need two versions, IMHO.
The regular code was much slower for the normal case and has a special
version for the 4K optimized case.
The new code is equally good in both cases, so adding an extra 4K
--- On Thu, 6/19/08, Gunnar von Boehn [EMAIL PROTECTED] wrote:
You are right, the main copy2user requires that the SRC is cacheable.
IMHO, because of the exception on load, the routine should fall back to
the byte copy loop.
Arnd, could you verify that it works on localstore?
Since the
Folks,
I've pushed out a freshly tagged DTC 1.2.0-rc1 to jdl.com.
Please feel free to test it!
Thanks,
jdl
David Gibson (34):
libfdt: Add and use a node iteration helper function.
libfdt: Fix NOP handling bug in fdt_add_subnode_namelen()
dtc: Fold comment handling test into
Ingo Molnar wrote:
* Nathan Lynch [EMAIL PROTECTED] wrote:
So it would be nice to have the scheduler slightly prefer primary
threads on POWER6 machines. These patches, which allow the
architecture to override the scheduler's CPU power calculation, are
one possible approach, but I'm
This patch adds support for the power button on future IBM Cell blades.
It doesn't actually shut down the machine. Instead it exposes an
input device /dev/input/event0 to userspace which sends KEY_POWER
when the power button has been pressed.
haldaemon actually recognizes the button, so a platform
Hi Paul,
Please pull from:
master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx.git next
to get some more changes for 2.6.27. A new board port, a revert, and a
few fixes.
I'll have a few more after this as well, most notably Ben's rework
patch.
josh
Giuseppe Coviello (2):
On Wed, 18 Jun 2008 22:45:57 +0400
Matvejchikov Ilya [EMAIL PROTECTED] wrote:
I'm glad that you have corrected it. Half a year ago I pointed out
that there was such a mistake:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700
You've used the -embedded ML, and the patch wasn't noticed... I can add
Vitaly Bordug wrote:
On Wed, 18 Jun 2008 22:45:57 +0400
Matvejchikov Ilya [EMAIL PROTECTED] wrote:
I'm glad that you have corrected it. Half a year ago I pointed out
that there was such a mistake:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700
You've used the -embedded ML, and the patch wasn't
On Jun 19, 2008, at 1:47 PM, Jon Loeliger wrote:
We should merge the -embedded list into -dev
and retire the -embedded list finally.
I used to be an opponent of this given the amount of "my board
doesn't work" questions on -embedded, but the volume isn't that great,
and much lower than
Yes, please. =)
2008/6/19 Vitaly Bordug [EMAIL PROTECTED]:
On Wed, 18 Jun 2008 22:45:57 +0400
Matvejchikov Ilya [EMAIL PROTECTED] wrote:
I'm glad that you have corrected it. Half a year ago I pointed out
that there was such a mistake:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700
On Thursday 19 June 2008, Mark Nelson wrote:
.align 7
_GLOBAL(copy_4K_page)
	dcbt	0,r4		/* Prefetch ONE SRC cacheline */
	addi	r6,r3,-8	/* prepare for stdu */
	addi	r4,r4,-8	/* prepare for ldu */
	li	r10,32
During corner case testing, we noticed that some versions of ehca
do not properly transition to interrupt done in special load situations.
This can be resolved by periodically triggering EOI through H_EOI,
if EQEs are pending.
Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
---
Gunnar von Boehn writes:
I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes really visible when you really copy memory-to-memory and are
not only working in the 2nd-level cache.
Could you send
On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote:
On Thursday 19 June 2008, Mark Nelson wrote:
The plan is to use Michael Ellerman's code patching work so that at runtime
if we're running on a Cell machine the new routines are called but otherwise
the existing memory copy routines are
On Fri, 20 Jun 2008 12:53:49 am Olof Johansson wrote:
On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
I assume it has suffered from bitrot and nobody tried to do better
since the Power3 days. AFAICT, it hasn't seen any update since your
original Power4 version from 2002.
I've got
Gunnar von Boehn writes:
The regular code was much slower for the normal case and has a special
version for the 4K optimized case.
That's a slightly inaccurate view...
The reason for having the two cases is that when I profiled the
distribution of sizes and alignments of memory copies in the
* The naming of the labels (with just numbers) is rather confusing,
it would be good to have something better, but I must admit that
I don't have a good idea either.
I will admit that at first glance the label naming with numbers
does look confusing but when you notice that all the loads start
On Fri, 20 Jun 2008 07:28:50 am Arnd Bergmann wrote:
On Thursday 19 June 2008, Mark Nelson wrote:
.align 7
_GLOBAL(copy_4K_page)
	dcbt	0,r4		/* Prefetch ONE SRC cacheline */
	addi	r6,r3,-8	/* prepare for stdu */
	addi	r4,r4,-8
On Wed, 18 Jun 2008, Laurent Pinchart wrote:
The restart() function is called when the link state changes and resets
multicast and promiscuous settings. This patch restores those settings at the
end of restart().
Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
---
The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7. Includes context switch, ptrace and signals support.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
Paulus: please consider for your 2.6.27 tree.
Updated with comments from Kumar, Milton, Dave Woodhouse
Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
arch/powerpc/kernel/head_64.S |    4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index:
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
arch/powerpc/kernel/align.c |
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad. Also
when we add VSX here in a later patch, we can hit two of these at the
same time.
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
arch/powerpc/kernel/fpu.S     |    2 +-
arch/powerpc/kernel/head_32.S |    6 --
arch/powerpc/kernel/head_64.S |    8 +---
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:
          VSR doubleword 0     VSR doubleword 1
VSR[0]  |      FPR[0]        |                  |
Add CONFIG_VSX config build option. Must compile with POWER4, FPU and ALTIVEC.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
arch/powerpc/platforms/Kconfig.cputype | 16
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
Signed-off-by: Joel Schopp [EMAIL PROTECTED]
---
arch/powerpc/kernel/prom.c |4
include/asm-powerpc/cputable.h | 15 ++-
2 files
This adds the macros for the VSX load/store instructions as most
binutils are not going to support this for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling [EMAIL PROTECTED]
---
include/asm-powerpc/ppc_asm.h | 127
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Mixing FP, VMX and VSX code will