Re: [maemo-developers] RE: defective memory?

2006-09-21 Thread Kimmo Hämäläinen
On Wed, 2006-09-20 at 23:17, ext Frantisek Dufka wrote:
...
 So it really seems related to wi-fi.

Thank you for the information. I'm trying to keep the internal
investigation ongoing (and not just assume that the HW is broken). These
kinds of hints should help to nail it down.

BR, Kimmo

 
 Frantisek
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: [maemo-developers] RE: defective memory?

2006-09-21 Thread Olivier ROLAND




Kimmo Hämäläinen wrote:

On Wed, 2006-09-20 at 23:17, ext Frantisek Dufka wrote:
...
So it really seems related to wi-fi.

Thank you for the information. I'm trying to keep the internal
investigation ongoing (and not just assume that the HW is broken). These
kinds of hints should help to nail it down.

BR, Kimmo

Frantisek

Hmm ... this time I spent much more time on this and ran a lot of
tests.
I tried to use a scientific approach, so that I could say "yes, the
problem happens 5 times out of 100 under condition x ...".
The results "for my device" are that the problem does not seem to be
related to
- empty battery
- wifi
- temperature
- high power requirements
- bad memory region
or any combination of these parameters.

You can have the illusion that there is a correlation with one thing or
another, but no, the stats say that it's just a coincidence.
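
To put a number on a statement like "the problem happens 5 times out of
100", one option is a confidence interval for the failure rate. The
sketch below computes a Wilson score interval. The counts (5 failures in
100 runs) are only the illustrative figures from the sentence above, not
measured data, and the function name is made up for this example.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Return the (low, high) Wilson score interval for a failure rate.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = failures / runs
    z2 = z * z
    denom = 1.0 + z2 / runs
    centre = (p + z2 / (2.0 * runs)) / denom
    half = z * math.sqrt(p * (1.0 - p) / runs + z2 / (4.0 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Illustrative counts only: the "5/100" wording, not real measurements.
low, high = wilson_interval(5, 100)
print(f"failure rate is plausibly between {low:.3f} and {high:.3f}")
```

With so few failures the interval is wide, which is one reason why an
apparent correlation with a single condition can easily be a coincidence.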

I have noticed that when memtester finds a bad address line, the
probability of finding other bad addresses in the same run is very high.

So after all that I REALLY don't know what the problem is  :-( 
I need some new ideas ... or an oscilloscope ... to go deeper.

Regards,
Olivier ROLAND




Re: [maemo-developers] RE: defective memory?

2006-09-20 Thread Frantisek Dufka

Olivier ROLAND wrote:

Siarhei Siamashka wrote:

On 9/19/06, Kimmo Hämäläinen [EMAIL PROTECTED] wrote:


Yes, it would need to be reproducible in several different devices. The
guy here that tried to reproduce it currently thinks that Siarhei's unit
is broken.



If your device is broken then mine is also.


And mine too.


Nokia770-26:~# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 40MB (41943040 bytes)
got  40MB (41943040 bytes), virtual address=0x41131000, trying mlock ...locked.

Loop 1/1:
  Stuck Address   : testing   0FAILURE: possible bad address line at offset 0x0029b1a5 (page offset 1a5).

Skipping to next test...
  Random Value: FAILURE: 0x5df9 != 0x5df900cb at offset 0x0029b1a5 (page offset 1a5).

FAILURE: 0xa783 != 0xa783e258 at offset 0x0029b1a5 (page offset 1a5).
  Compare XOR : FAILURE: 0xf086b165 != 0xf08793bd at offset 0x0029b1a5 (page offset 1a5).
  Compare SUB : FAILURE: 0x333c != 0xfd3b5e62 at offset 0x0029b1a5 (page offset 1a5).

  Compare MUL :   Compare DIV : ok
FAILURE: 0x7feb != 0x7febf0e8 at offset 0x0029b1a5 (page offset 1a5).
  Compare OR  : FAILURE: 0x7b69 != 0x7b69b068 at offset 0x0029b1a5 (page offset 1a5).
  Compare AND : FAILURE: 0xfdcc != 0xfdccec72 at offset 0x0029b1a5 (page offset 1a5).
  Sequential Increment:   Solid Bits  : testing   1FAILURE: 0x != 0x at offset 0x0029b1a5 (page offset 1a5).
  Block Sequential: testing   1FAILURE: 0x0101 != 0x01010101 at offset 0x0029b1a5 (page offset 1a5).
  Checkerboard: testing   0FAILURE: 0x != 0x at offset 0x0029b1a5 (page offset 1a5).
  Bit Spread  : testing   0FAILURE: 0x != 0xfffa at offset 0x0029b1a5 (page offset 1a5).
  Bit Flip: testing   0FAILURE: 0x != 0x0001 at offset 0x0029b1a5 (page offset 1a5).
  Walking Ones: testing   0FAILURE: 0x != 0xfffe at offset 0x0029b1a5 (page offset 1a5).

  Walking Zeroes  : testing   0Killed
Nokia770-26:~# Connection to n770 closed by remote host.
Connection to n770 closed.

This was done via ssh over wi-fi when the battery icon was already red.
A few tens of seconds later the device powered down due to the empty
battery.


I also ran it ~30 minutes earlier, over Bluetooth PAN while the battery
meter was still grey, and the test went fine.


Looks like it is the combination with wi-fi.

Frantisek


Re: [maemo-developers] RE: defective memory?

2006-09-20 Thread Frantisek Dufka
After the previous test I put the device with the empty battery on the
charger, and it also happens when connected to the charger (over wi-fi).


Once with the WLAN power setting at 100 mW and later also when reduced
to 10 mW.

Nokia770-26:~# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 40MB (41943040 bytes)
got  40MB (41943040 bytes), virtual address=0x41131000, trying mlock ...locked.

Loop 1/1:
  Stuck Address   : testing   0FAILURE: possible bad address line at offset 0x005031a5 (page offset 1a5).

Skipping to next test...
  Random Value: FAILURE: 0xbf3777dc != 0xbf37 at offset 0x31a5 (page offset 1a5).

FAILURE: 0x454d954f != 0x454d at offset 0x31a5 (page offset 1a5).
  Compare XOR : FAILURE: 0x8e5146b4 != 0x8e50b165 at offset 0x31a5 (page offset 1a5).
  Compare SUB : FAILURE: 0x06bfe708 != 0x9f20 at offset 0x31a5 (page offset 1a5).

  Compare MUL :   Compare DIV : ok
  Compare OR  : ok
FAILURE: 0x7b69b068 != 0x7b69 at offset 0x31a5 (page offset 1a5).
  Compare AND :   Sequential Increment: ok
  Solid Bits  : testing   1FAILURE: 0x != 0x at offset 0x31a5 (page offset 1a5).
  Block Sequential: testing   1FAILURE: 0x01010101 != 0x0101 at offset 0x31a5 (page offset 1a5).
  Checkerboard: testing   0FAILURE: 0x != 0x at offset 0x31a5 (page offset 1a5).
  Bit Spread  : testing   0FAILURE: 0xfffa != 0x at offset 0x31a5 (page offset 1a5).
  Bit Flip: testing   0FAILURE: 0x0001 != 0x at offset 0x31a5 (page offset 1a5).
  Walking Ones: testing   0FAILURE: 0xfffe != 0x at offset 0x31a5 (page offset 1a5).
  Walking Zeroes  : testing   0FAILURE: 0x0001 != 0x at offset 0x31a5 (page offset 1a5).


Done.
Nokia770-26:~#


Then I closed the WLAN connection and connected via Bluetooth: no errors
on the charger. Then I disconnected the charger: again no errors on
battery (grey icon with 1 stripe again).


Then I ran it again, still in the same shell via ssh over Bluetooth, but
with the WLAN also connected (and left idle). And it happened again!


Then I just disconnected the WLAN and ran it again in the same shell,
and it was OK again.

So it really seems related to wi-fi.
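
As a rough check on how strong this evidence is, the runs can be put in
a 2x2 table (WLAN connected or not, test failed or passed) and fed to
Fisher's exact test. The counts used below are one reading of the runs
described in this thread (4 failures with WLAN connected, 4 passes
without), so treat them as an assumption. The function is a small
stdlib-only sketch, not taken from any tool mentioned here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is no more likely than the observed one.
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: WLAN connected / not connected. Columns: failed / passed.
# Counts are an assumed reading of the runs reported in the thread.
p_value = fisher_exact_two_sided(4, 0, 0, 4)
print(f"p = {p_value:.4f}")  # 2/70, prints "p = 0.0286"
```

Eight runs on one device, done one after another, are far from a
controlled experiment, so a small p-value here is only a hint and not a
proof that wi-fi is the cause.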

Frantisek


[maemo-developers] RE: defective memory?

2006-09-19 Thread Siarhei Siamashka

On 9/19/06, Kimmo Hämäläinen [EMAIL PROTECTED] wrote:


Yes, it would need to be reproducible in several different devices. The
guy here that tried to reproduce it currently thinks that Siarhei's unit
is broken.


Yes, I also think that the probability of my device being broken is
quite high. A certain (small) fraction of other Nokia 770 owners are
probably having the same problem. Does it make the device completely
useless? Of course not, my device works almost fine, it only crashes
and reboots sometimes. I have also had filesystem corruption several
times (I have now even switched the mmc filesystem to ext3, though I
don't know if it will help much). So the device can surely be used as a
book reader or internet browser and serve other tasks. The other (small)
fraction of users who got the 'white screen of death' were surely less
lucky.

What can be done about this if the defective memory problem gets
confirmed? I see three possible ways:
1. 'Ignorance is bliss' - just do nothing; those who don't know
about the problem will not worry about it :) The device will just
crash or reboot occasionally, and some unluckier users with more
annoying crashes will complain in the forums, causing some bad PR.
2. Distribute some diagnostics software that will help to identify
memory problems and repair/replace defective units. That will have
some expenses, but will improve overall reliability and reduce the
amount of negative publicity.
3. Add some (un)official support for working around bad memory regions
using a technology similar to BadRAM; in this case most
such units will be completely usable.
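
To illustrate option 3: the BadRAM patch for x86 kernels describes bad
regions as address/mask pairs given on the kernel command line. The
sketch below builds such a pair for the single 4096-byte page containing
a faulty physical address. The address is purely hypothetical (so far
only offsets inside the test buffer are known), the helper name is made
up, and whether anything like this can be used on the 770 kernel is an
open question.

```python
PAGE_SIZE = 4096  # matches the "pagesize is 4096" line printed by memtester


def badram_pattern(bad_phys_addr, page_size=PAGE_SIZE):
    """Return an (address, mask) pair covering the page that holds bad_phys_addr.

    A set bit in the mask means "this address bit must match", so masking
    off the in-page bits selects exactly one page.
    """
    mask = ~(page_size - 1) & 0xFFFFFFFF  # assumes 32-bit physical addresses
    return bad_phys_addr & mask, mask


# Hypothetical physical address, for illustration only.
addr, mask = badram_pattern(0x1029B1A5)
print(f"badram=0x{addr:08x},0x{mask:08x}")
```

The cost of this approach is small: reserving one page removes 4 KB from
the memory available to applications.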

In general, bad memory is quite a common problem for x86 PCs, but there
is an excellent tool for memory diagnostics - memtest86. It has helped me
quite a number of times, and I always advise everyone having
stability issues to run it first. I don't know how the reliability of
memory chips used in embedded devices compares to the reliability of
memory from normal desktop computers, but bad memory seems to be one
of the most frequently encountered hardware problems.


Re: [maemo-developers] RE: defective memory?

2006-09-19 Thread Olivier ROLAND

Siarhei Siamashka wrote:

On 9/19/06, Kimmo Hämäläinen [EMAIL PROTECTED] wrote:


Yes, it would need to be reproducible in several different devices. The
guy here that tried to reproduce it currently thinks that Siarhei's unit
is broken.



If your device is broken then mine is too.
I don't think at all that we are talking about a (small) fraction,
because the majority of users won't even notice the problem.
My device seems stable until I stress it. And stressing it is not a
sufficient condition to make the problem happen.
When I have time, I will run extensive tests on my device to check
exactly when the problem occurs.


My doubts about the small fraction are probably driven by the fact that I
was hit by the 'white screen of death' 4 weeks after buying the device.
So I guess that during the repair my 770 was checked (again) with the
conventional Nokia diagnostic.
I conclude that the conventional Nokia diagnostic doesn't detect the
problem.


To make things clear, I don't want to create negative publicity at all. I
enjoy this device a lot and I've ported Streamtuner to it, with a lot of
great feedback from users.


My 2 cents.

PS: I don't know what the conventional Nokia diagnostic is, but as far
as I know there is always a conventional XXX diagnostic in repair
centers.


Olivier ROLAND


Re: [maemo-developers] RE: defective memory?

2006-09-19 Thread Siarhei Siamashka
On Wednesday 20 September 2006 01:12, Olivier ROLAND wrote:

 If your device is broken then mine is too.
 I don't think at all that we are talking about a (small) fraction,
 because the majority of users won't even notice the problem.
 My device seems stable until I stress it. And stressing it is not a
 sufficient condition to make the problem happen.

That's exactly the point. The device is quite usable and most users will not
detect any difference in the most common operations. It is a very good sign,
as it looks like, in order to get rock solid stability, we only need to
allocate and lock the problematic memory page early at boot time and not let
any applications use it.

 When I have time, I will make extensive test on my device to check
 exactly when the problem occur.

Please do. Now, with the latest version of the tester and a 40MB tested
block, the coverage is almost 2/3 of physical memory. If it's a certain
location in memory, the chances that it can be easily detected are quite
high. Please verify that the offset of the faulty address within the 4KB
page is always reported to be the same between different runs (it is
equal to 1a5 for me).
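
The check asked for above can be automated. The sketch below pulls every
"at offset 0x..." value out of a saved memtester log and reduces it to
the offset within a 4096-byte page. The regular expression is an
assumption based on the output format shown in this thread, and the two
sample lines reuse offsets reported here.

```python
import re

PAGE_SIZE = 4096  # from the "pagesize is 4096" line in the memtester output

# \s+ also matches a line break, so offsets split by mail wrapping still match.
OFFSET_RE = re.compile(r"at\s+offset\s+0x([0-9a-fA-F]+)")


def page_offsets(log_text, page_size=PAGE_SIZE):
    """Return the set of in-page offsets of all failing addresses in a log."""
    return {int(m.group(1), 16) % page_size for m in OFFSET_RE.finditer(log_text)}


sample = """
FAILURE: possible bad address line at offset 0x0029b1a5 (page offset 1a5).
FAILURE: possible bad address line at offset 0x005031a5 (page offset 1a5).
"""

offsets = page_offsets(sample)
print(sorted(hex(o) for o in offsets))
```

If the set contains a single value across several runs, the fault always
sits at the same position inside a page, even though the page lands at a
different place in the test buffer each time.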

I'm trying to find a way to get a full physical address of that page. In my
last tests I managed to mmap '/dev/mem' (just using 'read' function
segfaults), but did not have enough time to experiment with it much yet.
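
For comparison, and not as the /dev/mem approach described above: Linux
kernels from 2.6.25 on expose /proc/self/pagemap, which maps a process's
virtual pages to physical page frames. That is newer than what this
device runs, so the sketch below is only an illustration of the idea.
The function names are made up, and on recent kernels the frame number
is only visible to root.

```python
import struct

PAGE_SIZE = 4096
PFN_MASK = (1 << 55) - 1   # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63      # bit 63 is set when the page is present in RAM


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into (present, pfn)."""
    return bool(entry & PRESENT_BIT), entry & PFN_MASK


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Translate a virtual address of this process to a physical address.

    Returns None when the page is not present or the frame number is hidden.
    """
    with open(pagemap_path, "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        raw = f.read(8)
    if len(raw) != 8:
        return None
    present, pfn = decode_pagemap_entry(struct.unpack("=Q", raw)[0])
    if not present or pfn == 0:  # pfn reads as 0 without enough privileges
        return None
    return pfn * PAGE_SIZE + vaddr % PAGE_SIZE


# Synthetic entry for illustration: present bit set, made-up frame number.
print(decode_pagemap_entry(PRESENT_BIT | 0x1029B))
```

A memory tester that locks its buffer with mlock could use such a lookup
to report the physical page of a failing address instead of only the
offset inside the buffer.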

 My doubts about the small fraction are probably driven by the fact that I
 was hit by the 'white screen of death' 4 weeks after buying the device.
 So I guess that during the repair my 770 was checked (again) with the
 conventional Nokia diagnostic.
 I conclude that the conventional Nokia diagnostic doesn't detect the
 problem.

 To make things clear, I don't want to create negative publicity at all. I
 enjoy this device a lot and I've ported Streamtuner to it, with a lot of
 great feedback from users.

 My 2 cents.

I don't want to make negative publicity either.  My only goal now is to find
some reliable technical solution for both diagnostics and workaround of such
problems. After all, I have a good motivation for that :)

I'm grateful to Nokia as they are also trying to investigate the problem. I'm 
quite confident that we can come up with some solution, and it will have 
some positive effect for Nokia 770 community as a result. This is a new
device, software and tools for it are still being developed. We are all
learning and getting more experience.

 PS: I don't know what the conventional Nokia diagnostic is, but as far
 as I know there is always a conventional XXX diagnostic in repair
 centers.

By the way, when looking for additional information I found a Sharp
Zaurus community forum and asked what they use for hardware diagnostics,
in the hope that I could use the same tools. Somebody replied that
hardware diagnostics tools are built into the Zaurus firmware and are
accessible from the boot menu.