I'm seeing a sporadic problem with NAND/YAFFS on a Logic LV SOM,
using a 1928-block YAFFS filesystem.

I've got the 2.6.32 kernel (L23 Poky from
http://www.omappedia.org/wiki/OMAP_Poky) up and running, and
sporadically in testing I observe an error where 0xff30 shows up in
the data read back from a file. It looks somewhat similar to:
http://www.mail-archive.com/[email protected]/msg23103.html

Testing involves using "dd if=/dev/zero of=/mnt/yaffs/<file> bs=1
seek=30M count=0" to create a 30MB file of zeros, then copying the
file around on the flash, running md5sum, syncing, etc. to thrash the
cache.
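
The checksum that a file of zeros should produce can be recomputed
independently of the flash, for example on a host machine. The sketch
below assumes head, md5sum and cut are available; zero_md5 is a
hypothetical helper name, not part of the test script.

```shell
#!/bin/sh
# Hypothetical helper: print the MD5 of N zero bytes.
# The zeros are streamed from /dev/zero, so no temporary file is needed.
zero_md5() {
    head -c "$1" /dev/zero | md5sum | cut -d" " -f1
}

zero_md5 $((30 * 1024 * 1024))   # reference value for a 30M file of zeros
zero_md5 1024                    # reference value for a 1K file of zeros
```

If either value differs from the one the test compares against, the
mismatch is in the reference checksum rather than in the data read back.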

The error I'm seeing is that when I read the file back, its md5sum
does not match what a 30MB file of zeros should generate.
To verify, I copy the file from the NAND to a temporary file in RAM,
then md5sum that file, and if the md5sum mismatches, I hexdump
the file to see where the data mismatches. This all runs fine in my
test shell script, except that after a while (somewhere around 30+GB
read from NAND), I see:

somefile.7: mismatch 666896a98683a364c10aeba0649f119c != 281ed1d5ae50e8419f9b978aab16de83
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
1107800 ff30 ff30 ff30 ff30 ff30 ff30 ff30 ff30
*
1107a00 0000 0000 0000 0000 0000 0000 0000 0000
*
1e00000

instead of the zeros I'd expect.  Originally I thought the problem was
in the NAND, where somehow the driver tried to read a sector of data
before it was ready, but if this were the case, I'd expect an ECC error
from the comparison (using hardware-generated ECC, prefetch and DMA).
This is not the case (I added a printk that triggers if
omap_compare_ecc() returns non-zero).  So if no ECC error is reported,
then the data should be valid on NAND.  To test whether the data had
been written incorrectly, I unmounted the filesystem and remounted it,
but then the md5sum does match.
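
The remount check above can be reduced to a small decision rule. This
is only a sketch: classify_mismatch is a hypothetical helper, and the
mount point, filesystem type and MTD device in the commented usage are
assumptions that will differ per board.

```shell
#!/bin/sh
# Hypothetical helper: classify a mismatch from checksums taken before
# and after a remount.
# $1 = md5 before remount, $2 = md5 after remount, $3 = expected md5
classify_mismatch() {
    if [ "$1" = "$3" ]; then
        echo "ok"            # nothing was wrong before the remount
    elif [ "$2" = "$3" ]; then
        echo "transient"     # bad data vanished after remount: read path or cache
    else
        echo "persistent"    # bad data survives a remount: likely stored on flash
    fi
}

# Usage sketch (needs root; device and mount point are assumptions):
#   before=$(md5sum /mnt/yaffs/somefile.7 | cut -d" " -f1)
#   umount /mnt/yaffs && mount -t yaffs2 /dev/mtdblock4 /mnt/yaffs
#   after=$(md5sum /mnt/yaffs/somefile.7 | cut -d" " -f1)
#   classify_mismatch "$before" "$after" 281ed1d5ae50e8419f9b978aab16de83
```

The case described here, where the checksum matches again after the
remount, falls under "transient".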

This is not the first time I've seen the problem.  I've seen it in a
2.6.28-rc8 kernel, and in the 2.6.32 kernel I've tried turning off
DMA and prefetch, which hastens when the error turns up (and affects
the number of 0xff30 shorts seen).  I modified my testing to use a
unique pattern instead of zeros and found that when the 0xff30 shows
up, it repeats for a number of shorts at the start of a page, after
which I see the data that I expected from the page. I've also modified
the NAND driver to use a dev_ready function (along with statistics to
track how long it waits polling the R/B# line on WAIT0, which indicate
21.2uS +/- 8.29uS once the call to omap_device_ready is made), and
still no joy.
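
As an illustration of the pattern test and of the "start of a page"
observation, here is a minimal sketch. The helper names are
hypothetical, the record layout is just one possible unique pattern
(not necessarily the one I used), and the 2048-byte page size is an
assumption about the NAND part.

```shell
#!/bin/sh
# Hypothetical helpers; the 2048-byte page size used below is an assumption.

# Write N 16-byte records to stdout. Record i holds its own index as
# 15 zero-padded digits plus a newline, so every record is unique and
# any offset in a readback maps to the record expected there.
make_pattern() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "%015d\n", i }'
}

# Print "<page number> <offset within page>" for a byte offset.
# $1 = byte offset (decimal or 0x-prefixed hex), $2 = page size in bytes
page_of_offset() {
    echo "$(( $1 / $2 )) $(( $1 % $2 ))"
}

# The 0xff30 run in the hexdump above spans 0x1107800..0x1107a00.
page_of_offset 0x1107800 2048          # where the run starts
echo "$(( 0x1107a00 - 0x1107800 ))"    # length of the run in bytes
```

With a 2048-byte page this prints "8719 0" and "512": the run in the
hexdump starts exactly on a page boundary and lasts 512 bytes. A 30MB
pattern file would be "make_pattern 1966080 > somefile.1".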

I've also run this code on a 2.6.33-rc3 kernel with the same driver
set and there it works flawlessly.  Unfortunately I need the Poky
kernel...

At this point I'm at a loss to explain what is happening:

1) Has anyone seen this type of error before?

2) Are there any OMAP35x errata that could possibly explain what I'm seeing?

3) Has anyone done exhaustive testing of a NAND-based filesystem on an
OMAP35x board?

4) Any suggestions where to look next? (YAFFS testing with nandsim on
an x86 doesn't exhibit the problem).

The following is the original test script (cd into the mountpoint of
the NAND filesystem before running):

#!/bin/bash

# MD5sum of 30M and 1K of zeros
md5_30M=281ed1d5ae50e8419f9b978aab16de83
md5_1K=0f343b0931126a20f133d67c2b018a3b

# temp file to use as intermediary copy
tmpfile=/dev/tmp/junk
#tmpfile=/tmp/junk
mkdir -p `dirname $tmpfile`

mismatches=0
pass=0
passes=120
if [ "$1" != "" ]; then
    passes="$1"
fi


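# $1 is file (or a quoted glob of files)
# $2 is good checksum
chk_md5sum() {
    # $1 is left unquoted on purpose so a glob such as "somefile.*" expands
    for cmf in $1
    do
        # copy off the NAND first so md5sum and hexdump see the same data
        cp "$cmf" "$tmpfile"
        ret=`md5sum "$tmpfile" | cut -d" " -f1`
        if [ "$ret" != "$2" ]; then
            echo "$cmf: mismatch $ret != $2"
            hexdump < "$tmpfile" | head -100
            mismatches=`expr $mismatches + 1`
        else
            echo "$cmf: match $ret"
        fi
    done
}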

# $1 is src
# $2 is destination
# $3 is expected md5sum of source
chk_cp() {
    # copy via the temp file, then checksum the intermediary copy
    cp "$1" "$tmpfile"
    cp "$tmpfile" "$2"
    ret=`md5sum "$tmpfile" | cut -d" " -f1`
    if [ "$ret" != "$3" ]; then
        echo "$1: mismatch $ret != $3"
        hexdump < "$tmpfile" | head -100
        mismatches=`expr $mismatches + 1`
    fi
}

while [ $pass -lt $passes ]; do
    pass=`expr $pass + 1`
    echo "Pass: $pass Errors: $mismatches"
    date

# create a 30 M file
    echo "Create 30M file of zeros and get md5sum"
    dd if=/dev/zero of=somefile.1 bs=1 seek=30M count=0

    chk_md5sum somefile.1 $md5_30M

# create copies of file
    for f in 2 3 4 ;
    do
        cp somefile.1 somefile.$f
    done

    echo "Calculate md5sums for copied files"
    chk_md5sum "somefile.*" $md5_30M
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    echo "execute sync and recalculate md5sums"
    sync
    chk_md5sum "somefile.*" $md5_30M
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    echo "Delete one of the files"
    rm somefile.2

    echo "recopy the deleted file"
    cp somefile.1 somefile.7
    chk_md5sum "somefile.*" $md5_30M
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    echo "Creating test folder and some junk files in that folder"
    mkdir -p test

    cd test

    # count=0 copies no data, so this only creates a 1K file that reads back as zeros
    dd if=/dev/zero of=junk.1 bs=1 count=0 seek=1k

    chk_md5sum junk.1 $md5_1K
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    for f in 2 3 4 5 6 7 8 9;
    do
        chk_cp junk.1 junk.$f $md5_1K
        if [ "$mismatches" != "0" ]; then
            # leave both the for loop and the enclosing while loop
            break 2;
        fi
    done

    echo "md5sums of all files in test folder"
    chk_md5sum "junk.*" $md5_1K
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    echo "execute sync and recalculate md5sums"
    sync

    chk_md5sum "junk.*" $md5_1K
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    echo "Remove some files and recreate them"
    for f in  3 5 8;
    do
        rm junk.$f
    done

    for f in  8 3 5;
    do
        chk_cp junk.1 junk.$f $md5_1K
        if [ "$mismatches" != "0" ]; then
            # leave both the for loop and the enclosing while loop
            break 2;
        fi
    done

    cd ..

    echo "Calculate md5sums for 30M files again"
    chk_md5sum "somefile.*" $md5_30M
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    echo "execute sync and recalculate md5sums"
    sync
    chk_md5sum "somefile.*" $md5_30M
    if [ "$mismatches" != "0" ]; then
        break;
    fi

    if [ -f /proc/yaffs ]; then
        cat /proc/yaffs
    fi
    if [ -f /proc/nand-wait-stats ]; then
        cat /proc/nand-wait-stats
    fi
done