Re: Help on memchr() EGLIBC assembly code

2009-07-31 Thread Richard Henderson

On 07/30/2009 09:29 AM, Aurelien Jarno wrote:

Thanks for this patch I have tried it, and it does not have the original
problem I have reported. Unfortunately it does not pass the glibc
testsuite. I'll try to debug the problem later (I don't own an alpha
machine, and need to have internet access to debug it remotely).



This one passes the test suite (on x86, with replacements for the alpha 
builtins).



r~
typedef unsigned long word;

#ifdef __alpha__
#define cmpbeq0(X)	__builtin_alpha_cmpbge(0, (X))
#define extql		__builtin_alpha_extql
#define extqh		__builtin_alpha_extqh
#define insbl		__builtin_alpha_insbl
#define zap		__builtin_alpha_zap
#else
static word
cmpbeq0(word y)
{
  word i, r = 0;
  for (i = 0; i  8; ++i)
{
  unsigned char yc = (y  i*8);
  if (yc == 0)
	r |= 1  i;
}
  return r;
}

static word
extql(word x, word y)
{
  return x  ((y  7) * 8);
}

static word
extqh(word x, word y)
{
  return x  (64 - ((y  7) * 8));
}

static word
zap(word x, word y)
{
  word i, mask = 0;
  for (i = 0; i  8; ++i)
if (y  (1  i))
  mask |= (word)0xff  (i * 8);
  return x  ~mask;
}

static word
insbl(word x, word y)
{
  word byte_mask = 1ul  (y  7);
  word byte_loc = (y  7) * 8;
  return zap (x  byte_loc, ~byte_mask);
}
#endif

static inline word
ldq_u(word s)
{
  return *(const word *)(s  -8);
}

#define unlikely(X)	__builtin_expect ((X), 0)
#define prefetch(X)	__builtin_prefetch ((void *)(X), 0)

#define find(X, Y)	cmpbeq0 ((X) ^ (Y))

void *
memchr (const void *xs, int xc, word n)
{
  const word *s_align;
  const word s = (word) xs;
  word t, current, found;

  if (unlikely (n == 0))
return 0;

  current = ldq_u (s);

  /* Replicate low byte of XC into all bytes of C.  */
  t = insbl (xc, 1);			/* 00c0 */
  t = (xc  0xff) | t;			/* 00cc */
  t = (t  16) | t;			/*  */
  const word c = (t  32) | t;		/*  */

  /* When N = 8, we can perform the entire comparison in one go.
 Load the N bytes into the low part of CURRENT, via an unaligned
 quadword load sequence, and treat it as the last aligned word read.  */
  if (unlikely (n = 8))
{
  /* Tweak the standard unaligned quadword load sequence by issuing
	 the second ldq_u at (s + n - 1) instead of (s + 8 - 1).  This
	 avoids crossing a page boundary when S+N doesn't.  */
  word last = extqh (ldq_u (s + n - 1), s);
  word first = extql (current, s);
  current = first | last;
  s_align = (const word *) s;
  goto last_quad;
}

  /* Align the source, and decrement the count by the number
 of bytes searched in the first word.  */
  s_align = (const word *)(s  -8);
  n += (s  7);

  /* Deal with misalignment in the first word for the comparison.  */
  {
word mask = -1ul  (s  7);
found = find (current, c)  mask;
if (unlikely (found))
  goto found_it;
  }

  s_align++;
  n -= 8;

  /* If the block is sufficiently large, align to cacheline and prefetch.  */
  if (unlikely (n = 256))
{
  /* Prefetch 3 cache lines beyond the one we're working on.  */
  prefetch (s_align + 8);
  prefetch (s_align + 16);
  prefetch (s_align + 24);

  while ((word)s_align  63)
	{
	  current = *s_align;
	  found = find (current, c);
	  if (found)
	goto found_it;
	  s_align++;
	  n -= 8;
	}

	/* Within each cacheline, advance the load for the next word
	   before the test for the previous word is complete.  This
	   allows us to hide the 3 cycle L1 cache load latency.  We
	   only perform this advance load within a cacheline to prevent
	   reading across page boundary.  */
#define CACHELINE_LOOP\
	do {	\
	  word i, next = s_align[0];		\
	  for (i = 0; i  7; ++i)		\
	{	\
	  current = next;			\
	  next = s_align[1];		\
	  found = find (current, c);	\
	  if (unlikely (found))		\
		goto found_it;			\
	  s_align++;			\
	}	\
	  current = next;			\
	  found = find (current, c);		\
	  if (unlikely (found))			\
	goto found_it;			\
	  s_align++;\
	  n -= 64;\
	} while (0)
  
  /* While there's still lots more data to potentially be read,
	 continue issuing prefetches for the 4th cacheline out.  */
  while (n = 256)
	{
	  prefetch (s_align + 24);
	  CACHELINE_LOOP;
	}

  /* Up to 3 cache lines remaining.  Continue issuing advanced
	 loads, but stop prefetching.  */
  while (n = 64)
	CACHELINE_LOOP;
}

  /* Quadword aligned loop.  */
  current = *s_align;
  while (n  8)
{
  found = find (current, c);
  if (unlikely (found))
	goto found_it;
  current = *++s_align;
  n -= 8;
}

 last_quad:
  {
word mask = -1ul  n;
found = find (current, c)  ~mask;
if (found == 0)
  return 0;
  }

 found_it:
  {
word offset;

#ifdef __alpha_cix__
offset = __builtin_alpha_cttz (found);
#else
/* Extract LSB.  */
found = -found;

/* Binary search for the LSB.  */
offset  = (found  0x0f ? 0 : 4);
offset += (found  0x33 ? 0 : 2);
offset += (found  

Re: Help on memchr() EGLIBC assembly code

2009-07-30 Thread Aurelien Jarno
On Wed, Jul 29, 2009 at 05:24:59PM -0700, Richard Henderson wrote:
 On 07/26/2009 04:45 PM, Aurelien Jarno wrote:
 Knowing that $31 could be used for prefetch, I have modified the
 assembly code from memchr.S to use it. It passes all the testsuite.


 This isn't intended to be a prefetch instruction, it's
 meant to be fetching the data for the next word.  I.e.
 we've unrolled the loop and there's at least 8 bytes
 left in the search.

 Note the

 # At least two quads remain to be accessed.

 comment.  At that point we're supposed to be more
 than 16 bytes away from the end of the input buffer.

 Actually, the confusion I see is farther upthread:

   The problem is that the memchr() function on alpha uses  
 prefetch, which
   can cause a page boundary to be crossed, while the standards  
 (POSIX and
   C99) says it should stop when a match is found.

 I didn't realize this when I wrote the function.

 The entire function should be rewritten, since there's little
 point in using a prefetch instruction that close to the load.
 Prefetch instructions should be used to move data into the L1
 cache, not hide the 3 cycle load delay.  Thus a prefetch, if
 used, should be several cache lines ahead, not just a single word.

 Perhaps a better solution would be to read words until we get
 cacheline aligned, then read the entire line into 8 registers,
 prefetch the next line, then process each register one by one.

 Try this.


Thanks for this patch I have tried it, and it does not have the original
problem I have reported. Unfortunately it does not pass the glibc
testsuite. I'll try to debug the problem later (I don't own an alpha
machine, and need to have internet access to debug it remotely).

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Help on memchr() EGLIBC assembly code

2009-07-29 Thread Richard Henderson

On 07/26/2009 04:45 PM, Aurelien Jarno wrote:

Knowing that $31 could be used for prefetch, I have modified the
assembly code from memchr.S to use it. It passes all the testsuite.



This isn't intended to be a prefetch instruction, it's
meant to be fetching the data for the next word.  I.e.
we've unrolled the loop and there's at least 8 bytes
left in the search.

Note the

# At least two quads remain to be accessed.

comment.  At that point we're supposed to be more
than 16 bytes away from the end of the input buffer.

Actually, the confusion I see is farther upthread:

  The problem is that the memchr() function on alpha uses 
prefetch, which
  can cause a page boundary to be crossed, while the standards 
(POSIX and

  C99) says it should stop when a match is found.


I didn't realize this when I wrote the function.

The entire function should be rewritten, since there's little
point in using a prefetch instruction that close to the load.
Prefetch instructions should be used to move data into the L1
cache, not hide the 3 cycle load delay.  Thus a prefetch, if
used, should be several cache lines ahead, not just a single word.

Perhaps a better solution would be to read words until we get
cacheline aligned, then read the entire line into 8 registers,
prefetch the next line, then process each register one by one.

Try this.


r~
typedef unsigned long word;

#define ldq(X)		(*(const word *)(X))
#define ldq_u(X)	(*(const word *)((X)  -8))

#define cmpbge		__builtin_alpha_cmpbge
#define extql		__builtin_alpha_extql
#define extqh		__builtin_alpha_extqh
#define insbl		__builtin_alpha_insbl
#define insqh		__builtin_alpha_insqh
#define zap		__builtin_alpha_zap

#define unlikely(X)	__builtin_expect ((X), 0)
#define prefetch(X)	__builtin_prefetch ((void *)(X), 0)

#define find(X, Y)	cmpbge (0, (X) ^ (Y))

void *memchr (const void *xs, int xc, word n)
{
  word s = (word)xs;
  word c;
  word current, found;
  word s_align;

  if (unlikely (n == 0))
return 0;

  current = ldq_u (s);

  /* Replicate low byte of C into all bytes.  */
  {
word t = insbl (xc, 1);		/* 00c0 */
c = (xc  0xff) | t;		/* 00cc */
c = (c  16) | c;			/*  */
c = (c  32) | c;			/*  */
  }

  if (unlikely (n  9))
goto only_quad;

  /* Align the source, and decrement the count by the number
 of bytes searched in the first word.  */
  s_align = s  -8;
  n += (s  7);
  n -= 8;

  /* Deal with misalignment in the first word for the comparison.  */
  {
word mask = insqh (-1, s);
found = cmpbge (0, (current ^ c) | mask);
if (found)
  goto found_it;
  }

  s_align += 8;

  /* If the block is sufficiently large, align to cacheline (minimum 64-bytes),
 prefetch the next line, and read entire cachelines at a time.  */
  if (unlikely (n = 256))
{
  prefetch (s_align + 64);
  while (s_align  63)
	{
	  current = ldq (s_align);
	  found = find (current, c);
	  if (found)
	goto found_it;
	  s_align += 8;
	  n -= 8;
	}

  while (n  64)
	{
	  word c0, c1, c2, c3, c4, c5, c6, c7;

	  prefetch (s_align + 64);
	  c0 = ldq (s_align + 0*8);
	  c1 = ldq (s_align + 1*8);
	  c2 = ldq (s_align + 2*8);
	  c3 = ldq (s_align + 3*8);
	  c4 = ldq (s_align + 4*8);
	  c5 = ldq (s_align + 5*8);
	  c6 = ldq (s_align + 6*8);
	  c7 = ldq (s_align + 7*8);

	  found = find (c0, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c1, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c2, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c3, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c4, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c5, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c6, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;

	  found = find (c7, c);
	  if (unlikely (found))
	goto found_it;
	  s_align += 8;
	  n -= 64;
	}
}

  /* Quadword aligned loop.  */
  while (1)
{
  current = ldq (s_align);
  if (n  8)
	goto last_quad;
  found = find (current, c);
  if (found)
	goto found_it;

  s_align += 8;
  n -= 8;
}

 only_quad:
  {
word end = zap (n, 0x80) - 1;
word last = extqh (ldq_u (s + end), s);
word first = extql (current, s);
current = first | last;

/* We've read one word and aligned it in the register.  Thus the 
   bit offset in current is relative to S.  */
s_align = s;
  }

 last_quad:
  {
word mask = (-1ul  -n); 
found = find (current, c)  mask;
if (found == 0)
  return 0;
  }

 found_it:
  {
word offset;

#ifdef __alpha_cix__
offset = __builtin_alpha_cttz (found);
#else
/* Extract LSB.  */
found = -found;

/* Binary search for the LSB.  */
offset  = (found  0x0f ? 0 : 4);
offset += (found  0x33 ? 0 : 2);
offset += (found  0x55 ? 0 

Re: Help on memchr() EGLIBC assembly code

2009-07-27 Thread Aurelien Jarno
On Mon, Jul 27, 2009 at 01:45:06AM +0200, Aurelien Jarno wrote:
 On Sun, Jul 19, 2009 at 04:29:33PM +0200, Aurelien Jarno wrote:
  On Wed, Jul 15, 2009 at 12:48:02PM -0700, Richard Henderson wrote:
   On 07/13/2009 03:16 PM, Matt Turner forwarded:
   The problem is that the memchr() function on alpha uses prefetch, 
   which
   can cause a page boundary to be crossed, while the standards (POSIX 
   and
   C99) says it should stop when a match is found.
  
   That's not supposed to matter -- faults from prefetch are supposed to be  
   ignored; see do_page_fault:
  
  The problem is that the prefech is not done with $31, but using $1 and
  $3. It is called prefetch in the code, but it is more like read a value
  in advance.
  
 
 Knowing that $31 could be used for prefetch, I have modified the
 assembly code from memchr.S to use it. It passes all the testsuite.
 
 Comments are welcome. Then I'll do the alphaev6 version.

Here is the alphaev6 version:

--- a/sysdeps/alpha/alphaev6/memchr.S
+++ b/sysdeps/alpha/alphaev6/memchr.S
@@ -127,7 +127,7 @@ $first_quad:
 cmpbge  $31, $1, $2# E :
 bne $2, $found_it  # U :
# At least one byte left to process.
-   ldq $1, 8($0)   # L :
+   ldq $31, 8($0)  # L :
subq$5, 1, $18  # E : U L U L
 
addq$0, 8, $0   # E :
@@ -143,38 +143,38 @@ $first_quad:
and $4, 8, $4   # E : odd number of quads?
bne $4, $odd_quad_count # U :
# At least three quads remain to be accessed
-   mov $1, $4  # E : L U L U : move prefetched value to 
correct reg
+   nop # E : L U L U : move prefetched value to 
correct reg
 
.align  4
 $unrolled_loop:
-   ldq $1, 8($0)   # L : prefetch $1
-   xor $17, $4, $2 # E :
-   cmpbge  $31, $2, $2 # E :
-   bne $2, $found_it   # U : U L U L
+   ldq $1, 0($0)   # L : load quad
+   xor $17, $1, $2 # E :
+   ldq $31, 8($0)  # L : prefetch next quad
+   cmpbge  $31, $2, $2 # E : U L U L
 
+   bne $2, $found_it   # U :
addq$0, 8, $0   # E :
nop # E :
nop # E :
-   nop # E :
 
 $odd_quad_count:
+   ldq $1, 0($0)   # L : load quad
xor $17, $1, $2 # E :
-   ldq $4, 8($0)   # L : prefetch $4
+   ldq $31, 8($0)  # L : prefetch $4
cmpbge  $31, $2, $2 # E :
-   addq$0, 8, $6   # E :
 
+   addq$0, 8, $6   # E :
bne $2, $found_it   # U :
cmpult  $6, $18, $6 # E :
addq$0, 8, $0   # E :
-   nop # E :
 
bne $6, $unrolled_loop # U :
-   mov $4, $1  # E : move prefetched value into $1
nop # E :
nop # E :
-
-$final:subq$5, $0, $18 # E : $18 - number of bytes left to do
nop # E :
+
+$final:ldq $1, 0($0)   # L : load last quad
+   subq$5, $0, $18 # E : $18 - number of bytes left to do
nop # E :
bne $18, $last_quad # U :
 


-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Help on memchr() EGLIBC assembly code

2009-07-26 Thread Aurelien Jarno
On Sun, Jul 19, 2009 at 04:29:33PM +0200, Aurelien Jarno wrote:
 On Wed, Jul 15, 2009 at 12:48:02PM -0700, Richard Henderson wrote:
  On 07/13/2009 03:16 PM, Matt Turner forwarded:
  The problem is that the memchr() function on alpha uses prefetch, which
  can cause a page boundary to be crossed, while the standards (POSIX and
  C99) says it should stop when a match is found.
 
  That's not supposed to matter -- faults from prefetch are supposed to be  
  ignored; see do_page_fault:
 
 The problem is that the prefech is not done with $31, but using $1 and
 $3. It is called prefetch in the code, but it is more like read a value
 in advance.
 

Knowing that $31 could be used for prefetch, I have modified the
assembly code from memchr.S to use it. It passes all the testsuite.

Comments are welcome. Then I'll do the alphaev6 version.

diff --git a/sysdeps/alpha/memchr.S b/sysdeps/alpha/memchr.S
index 5d713d5..87c7fb1 100644
--- a/sysdeps/alpha/memchr.S
+++ b/sysdeps/alpha/memchr.S
@@ -119,7 +119,7 @@ $first_quad:
 
# At least one byte left to process.
 
-   ldq t0, 8(v0)   # e0:
+   ldq zero, 8(v0) # e0: prefetch next quad
subqt4, 1, a2   # .. e1 :
addqv0, 8, v0   #-e0:
 
@@ -138,19 +138,19 @@ $first_quad:
 
# At least three quads remain to be accessed
 
-   mov t0, t3  # e0: move prefetched value to correct reg
-
.align  4
 $unrolled_loop:
-   ldq t0, 8(v0)   #-e0: prefetch t0
-   xor a1, t3, t1  # .. e1 :
-   cmpbge  zero, t1, t1# e0:
-   bne t1, $found_it   # .. e1 :
+   ldq t0, 0(v0)   # e0: load quad
+   xor a1, t0, t1  # .. e1 :
+   ldq zero, 8(v0) # e0: prefetch next quad
+   cmpbge  zero, t1, t1# .. e1:
+   bne t1, $found_it   # e0:
 
-   addqv0, 8, v0   #-e0:
+   addqv0, 8, v0   #e1 :
 $odd_quad_count:
+   ldq t0, 0(v0)   # e0: load quad
xor a1, t0, t1  # .. e1 :
-   ldq t3, 8(v0)   # e0: prefetch t3
+   ldq zero, 8(v0) # e0: prefetch next quad
cmpbge  zero, t1, t1# .. e1 :
addqv0, 8, t5   #-e0:
bne t1, $found_it   # .. e1 :
@@ -159,8 +159,8 @@ $odd_quad_count:
addqv0, 8, v0   # .. e1 :
bne t5, $unrolled_loop #-e1 :
 
-   mov t3, t0  # e0: move prefetched value into t0
-$final:subqt4, v0, a2  # .. e1 : a2 - number of bytes left to 
do
+$final:ldq t0, 0(v0)   # e0: load last quad
+   subqt4, v0, a2  # .. e1 : a2 - number of bytes left to do
bne a2, $last_quad  # e1:
 
 $not_found:

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Help on memchr() EGLIBC assembly code

2009-07-15 Thread Richard Henderson

On 07/13/2009 03:16 PM, Matt Turner forwarded:

The problem is that the memchr() function on alpha uses prefetch, which
can cause a page boundary to be crossed, while the standards (POSIX and
C99) says it should stop when a match is found.


That's not supposed to matter -- faults from prefetch are supposed to be 
ignored; see do_page_fault:


/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
   (or is suppressed by the PALcode).  Support that for older CPUs
   by ignoring such an instruction.  */
if (cause == 0) {
unsigned int insn;
__get_user(insn, (unsigned int __user *)regs-pc);
if ((insn  21  0x1f) == 0x1f 
/* ldq ldl ldt lds ldg ldf ldwu ldbu */
(1ul  (insn  26)  0x30f1400ul)) {
regs-pc += 4;
return;
}
}

Can you figure out why that kernel code isn't working?  I no longer have 
working alpha hw...



r~


--
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Help on memchr() EGLIBC assembly code

2009-07-13 Thread Aurelien Jarno
Hi all,

With a lot of patches (E)GLIBC 2.10 builds on alpha, but it fails on the
testsuite for the memchr() function, which is an optimized assembly code
on alpha. Unfortunately I don't speak alpha assembly very well, so help
is needed.

The problem is that the memchr() function on alpha uses prefetch, which
can cause a page boundary to be crossed, while the standards (POSIX and
C99) says it should stop when a match is found.

I have built a small testcase (see file attached), which contains the
code to trigger the bug and the assembly code of the memchr() function,
copied from EGLIBC.

It would be nice if someone can fix the assembly code so that the
prefetching does not create memory faults. Thanks in advance.

Cheers,
Aurelien

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


test-memchr.tar.gz
Description: Binary data


Re: Help on memchr() EGLIBC assembly code

2009-07-13 Thread Carlos O'Donell
On Mon, Jul 13, 2009 at 1:31 PM, Aurelien Jarnoaurel...@aurel32.net wrote:
 With a lot of patches (E)GLIBC 2.10 builds on alpha, but it fails on the
 testsuite for the memchr() function, which is an optimized assembly code
 on alpha. Unfortunately I don't speak alpha assembly very well, so help
 is needed.

 The problem is that the memchr() function on alpha uses prefetch, which
 can cause a page boundary to be crossed, while the standards (POSIX and
 C99) says it should stop when a match is found.

 I have built a small testcase (see file attached), which contains the
 code to trigger the bug and the assembly code of the memchr() function,
 copied from EGLIBC.

 It would be nice if someone can fix the assembly code so that the
 prefetching does not create memory faults. Thanks in advance.

If you remove:
./sysdeps/alpha/alphaev6/memchr.S
./sysdeps/alpha/memchr.S
and allow the build to fallback on string/memchr.c do the tests pass?

You can always add memchr.S routines back in a later release if you
need an immediate workaround.

Cheers,
Carlos.


-- 
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Help on memchr() EGLIBC assembly code

2009-07-13 Thread Aurelien Jarno
On Mon, Jul 13, 2009 at 02:24:00PM -0400, Carlos O'Donell wrote:
 On Mon, Jul 13, 2009 at 1:31 PM, Aurelien Jarnoaurel...@aurel32.net wrote:
  With a lot of patches (E)GLIBC 2.10 builds on alpha, but it fails on the
  testsuite for the memchr() function, which is an optimized assembly code
  on alpha. Unfortunately I don't speak alpha assembly very well, so help
  is needed.
 
  The problem is that the memchr() function on alpha uses prefetch, which
  can cause a page boundary to be crossed, while the standards (POSIX and
  C99) says it should stop when a match is found.
 
  I have built a small testcase (see file attached), which contains the
  code to trigger the bug and the assembly code of the memchr() function,
  copied from EGLIBC.
 
  It would be nice if someone can fix the assembly code so that the
  prefetching does not create memory faults. Thanks in advance.
 
 If you remove:
 ./sysdeps/alpha/alphaev6/memchr.S
 ./sysdeps/alpha/memchr.S
 and allow the build to fallback on string/memchr.c do the tests pass?
 
 You can always add memchr.S routines back in a later release if you
 need an immediate workaround.

Yeah, it works, and that's actually my plan if I get no answer. But I
would really like to see this fixed now, as otherwise, it will most
probably stay like that indefinitely.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Help on memchr() EGLIBC assembly code

2009-07-13 Thread Matt Turner
On Mon, Jul 13, 2009 at 6:17 PM, Aurelien Jarnoaurel...@aurel32.net wrote:
 On Mon, Jul 13, 2009 at 02:24:00PM -0400, Carlos O'Donell wrote:
 On Mon, Jul 13, 2009 at 1:31 PM, Aurelien Jarnoaurel...@aurel32.net wrote:
  With a lot of patches (E)GLIBC 2.10 builds on alpha, but it fails on the
  testsuite for the memchr() function, which is an optimized assembly code
  on alpha. Unfortunately I don't speak alpha assembly very well, so help
  is needed.
 
  The problem is that the memchr() function on alpha uses prefetch, which
  can cause a page boundary to be crossed, while the standards (POSIX and
  C99) says it should stop when a match is found.
 
  I have built a small testcase (see file attached), which contains the
  code to trigger the bug and the assembly code of the memchr() function,
  copied from EGLIBC.
 
  It would be nice if someone can fix the assembly code so that the
  prefetching does not create memory faults. Thanks in advance.

 If you remove:
 ./sysdeps/alpha/alphaev6/memchr.S
 ./sysdeps/alpha/memchr.S
 and allow the build to fallback on string/memchr.c do the tests pass?

 You can always add memchr.S routines back in a later release if you
 need an immediate workaround.

 Yeah, it works, and that's actually my plan if I get no answer. But I
 would really like to see this fixed now, as otherwise, it will most
 probably stay like that indefinitely.

 --
 Aurelien Jarno                          GPG: 1024D/F1BCDB73
 aurel...@aurel32.net                 http://www.aurel32.net


I'm copying Richard Henderson and Ivan Kokshayshy on this, as they are
without a doubt the most knowledgeable people about this sort of
thing.

As an aside, please make sure these two bugs are fixed in eglibc (I
don't know if they exist in eglibc, but I do realize that you filed
them against glibc yourself -- so this is just a reminder).

http://sources.redhat.com/bugzilla/show_bug.cgi?id=5350
http://sources.redhat.com/bugzilla/show_bug.cgi?id=5400

Matt


--
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Help on memchr() EGLIBC assembly code

2009-07-13 Thread Aurelien Jarno
On Mon, Jul 13, 2009 at 07:16:16PM -0300, Matt Turner wrote:
 On Mon, Jul 13, 2009 at 6:17 PM, Aurelien Jarnoaurel...@aurel32.net wrote:
  On Mon, Jul 13, 2009 at 02:24:00PM -0400, Carlos O'Donell wrote:
  On Mon, Jul 13, 2009 at 1:31 PM, Aurelien Jarnoaurel...@aurel32.net 
  wrote:
   With a lot of patches (E)GLIBC 2.10 builds on alpha, but it fails on the
   testsuite for the memchr() function, which is an optimized assembly code
   on alpha. Unfortunately I don't speak alpha assembly very well, so help
   is needed.
  
   The problem is that the memchr() function on alpha uses prefetch, which
   can cause a page boundary to be crossed, while the standards (POSIX and
   C99) says it should stop when a match is found.
  
   I have built a small testcase (see file attached), which contains the
   code to trigger the bug and the assembly code of the memchr() function,
   copied from EGLIBC.
  
   It would be nice if someone can fix the assembly code so that the
   prefetching does not create memory faults. Thanks in advance.
 
  If you remove:
  ./sysdeps/alpha/alphaev6/memchr.S
  ./sysdeps/alpha/memchr.S
  and allow the build to fallback on string/memchr.c do the tests pass?
 
  You can always add memchr.S routines back in a later release if you
  need an immediate workaround.
 
  Yeah, it works, and that's actually my plan if I get no answer. But I
  would really like to see this fixed now, as otherwise, it will most
  probably stay like that indefinitely.
 
  --
  Aurelien Jarno                          GPG: 1024D/F1BCDB73
  aurel...@aurel32.net                 http://www.aurel32.net
 
 
 I'm copying Richard Henderson and Ivan Kokshayshy on this, as they are
 without a doubt the most knowledgeable people about this sort of
 thing.

Thanks.

 As an aside, please make sure these two bugs are fixed in eglibc (I
 don't know if they exist in eglibc, but I do realize that you filed
 them against glibc yourself -- so this is just a reminder).
 
 http://sources.redhat.com/bugzilla/show_bug.cgi?id=5350
 http://sources.redhat.com/bugzilla/show_bug.cgi?id=5400
 

I have already put a first set of patches needed for alpha in a git
repository [1], and sent a pull request on the libc-ports mailing list.
I hope it will work better than filling bug reports. If it also fails,
I'll send them to EGLIBC.

[1]
http://git.aurel32.net/?p=glibc-ports.git;a=shortlog;h=refs/heads/alpha-for-upstream

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-alpha-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org