[ 
https://issues.apache.org/jira/browse/TS-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Bruno updated TS-4242:
---------------------------
    Description: 
I'm simulating a disk failure of 1 sector with the following setup:

{noformat}
dd if=/dev/zero of=err.img bs=512 count=2097152
losetup /dev/loop0 err.img
dmsetup create err0 <<EOF
0 1024000 linear /dev/loop0 0
1024000 1 error
1024001 1073151 linear /dev/loop0 1024001
EOF
dmsetup mknodes err0
{noformat}

The above commands create a 1 GiB disk that returns an I/O error for a single 
512-byte sector at the 500 MiB mark.
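For reference, the arithmetic behind the dd count and the dmsetup table 
(512-byte sectors) works out as follows; this is just a sanity check of the 
numbers above, not part of the setup:

```python
SECTOR = 512  # dmsetup tables and dd here use 512-byte sectors

# The three table segments: linear, error, linear.
total_sectors = 1024000 + 1 + 1073151

size_gib = total_sectors * SECTOR / 2**30          # device size in GiB
error_offset_mib = 1024000 * SECTOR / 2**20        # where the bad sector sits

print(total_sectors, size_gib, error_offset_mib)   # 2097152 sectors = 1 GiB,
                                                   # error at 500 MiB
```

The segment counts sum to exactly the dd count of 2097152 sectors, so the 
table covers the whole backing file with no gap.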

storage.config:
{noformat}
/dev/mapper/err0
{noformat}

Now I have a tool that randomly generates URLs, stores them, and requests them 
back with a certain probability, so that I both write to and read from the 
disk with a given offered/expected hit ratio.
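The tool itself isn't shown here, but the idea can be sketched roughly like 
this (the function name, URL pattern, and probability are illustrative, not 
the actual tool):

```python
import random

def make_workload(n, replay_prob, seed=0):
    """Yield (url, is_replay) pairs: a mix of brand-new random URLs
    (cache writes on miss) and re-requests of earlier URLs (cache
    reads), at roughly the given replay probability."""
    rng = random.Random(seed)
    seen = []
    for _ in range(n):
        if seen and rng.random() < replay_prob:
            yield rng.choice(seen), True    # read back: expected hit
        else:
            url = "http://test.example/obj/%032x" % rng.getrandbits(128)
            seen.append(url)
            yield url, False                # first request: cache write
```

Driving a proxy with such a stream exercises both the write path (misses) and 
the read path (hits) against the cache disk at a controllable ratio.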

Once writes reach the 500 MiB mark, trafficserver keeps failing to write and 
read every new object. I suspect this is because trafficserver keeps writing 
to that bad sector, and does NOT skip it.

These are the errors/warnings I'm seeing in the log repeatedly:
{noformat}
[Feb 29 15:29:33.308] Server {0x2ac3f1cd4700} WARNING: <AIO.cc:410 (cache_op)> 
cache disk operation failed WRITE -1 5
[Feb 29 15:29:33.309] Server {0x2ac3e56063c0} WARNING: <Cache.cc:2089 
(handle_disk_failure)> Error accessing Disk /dev/mapper/err0 [1726/100000000]
[Feb 29 15:29:33.320] Server {0x2ac3e56063c0} WARNING: <CacheRead.cc:1011 
(openReadStartHead)> Head : Doc magic does not match for 
75B41B1A2C85AE637DD6CE368BF783D0
[Feb 29 15:29:33.323] Server {0x2ac3eb480700} WARNING: <CacheRead.cc:1011 
(openReadStartHead)> Head : Doc magic does not match for 
1075CEA6E2E47496BE190DBB448B0B64
...
[Feb 29 15:29:33.284] Server {0x2ac3f28e0700} WARNING: <AIO.cc:410 (cache_op)> 
cache disk operation failed WRITE -1 5
[Feb 29 15:29:33.287] Server {0x2ac3eb682700} WARNING: <Cache.cc:2089 
(handle_disk_failure)> Error accessing Disk /dev/mapper/err0 [1725/100000000]
[Feb 29 15:29:33.289] Server {0x2ac3eb682700} WARNING: <CacheRead.cc:1011 
(openReadStartHead)> Head : Doc magic does not match for 
7E3325870F5488955118359E6C4B10F4
[Feb 29 15:29:33.289] Server {0x2ac3eb27e700} WARNING: <CacheRead.cc:1011 
(openReadStartHead)> Head : Doc magic does not match for 
7AE309F21ABF9B3774C67921018FCA0E
...
{noformat}

Summary: trafficserver seems to treat I/O errors as temporary rather than 
permanent. Is this true? If so, the only remedies are to:
1. Replace the hard disk, or
2. Use a device-mapper table to skip the bad sector.

Either way, a whole disk cache of terabytes is thrown away over a single bad 
sector.

If this is what's really happening, is it feasible to skip the bad sector? If 
so, I could work on a patch.
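This is not how the current cache code behaves; purely as a hypothetical 
sketch of the skip idea, the write path could remember offsets that returned 
EIO and step over them, instead of counting every failure toward declaring 
the whole disk bad. All names here are made up:

```python
class BadSectorMap:
    """Hypothetical: per-disk set of known-bad sectors that the
    allocator consults before issuing a write."""

    def __init__(self, sector_size=512):
        self.sector_size = sector_size
        self.bad = set()  # sector numbers that have returned I/O errors

    def mark_bad(self, offset):
        """Record the sector containing a failed write offset."""
        self.bad.add(offset // self.sector_size)

    def next_writable(self, offset):
        """Return the first offset at or after `offset` whose sector
        is not known-bad, advancing one sector per bad hit."""
        sector = offset // self.sector_size
        while sector in self.bad:
            sector += 1
        return sector * self.sector_size
```

In this sketch, marking the 500 MiB offset (524288000) bad makes the next 
write land one sector later, at 524288512, and the rest of the disk stays 
usable.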



> Permanent disk failures are not handled gracefully
> --------------------------------------------------
>
>                 Key: TS-4242
>                 URL: https://issues.apache.org/jira/browse/TS-4242
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>            Reporter: Luca Bruno
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
