On 12/01/2014 02:31 PM, Shannon Dealy wrote:
> On Mon, 1 Dec 2014, Nikolaus Rath wrote:
> 
> [snip]
>> I think at this point I can probably write you a patch to get the file
>> system functional again, but I'd very much like to find out what's
>> happening here.
>>
>> Would you be able to run fsck with --backend-options no-ssl, and capture
>> the traffic using Wireshark?
> 
> Hi Nikolaus,
> 
> I performed several runs of fsck.s3ql while experimenting with wireshark
> (it has been years since I've used it or tcpdump) to get the settings
> right.  Each time, fsck.s3ql failed in the same manner.  Then when I did
> what was to be the final run/capture, it ran to completion without errors.
> 
> Given the behavior above, the first thing that leaps to mind is possibly
> a race condition.  I would usually expect more inconsistency (such as
> failing at different objects each time) if it was simply uninitialized
> data or corruption, though those may be possibilities too.

That's interesting, but it actually fits my hypothesis. fsck.s3ql is
single-threaded, so it's not a race condition. However, when retrieving
the object list from S3, S3QL has to issue several requests because S3
forces clients to paginate the list, returning at most 1000 entries per
request. I suspect there might be a bug in the pagination handling that
causes an object to be listed twice.
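
To illustrate the kind of marker-based pagination involved, here is a
minimal Python sketch. It is not S3QL's actual code: fetch_page,
make_fake_backend and the page size are hypothetical stand-ins, and the
sketch assumes the server returns keys in sorted order, strictly after
the marker it was given. If either assumption is violated, a key can
show up in two pages.

```python
def list_keys(fetch_page, page_size):
    """Yield every key, requesting one page at a time.

    fetch_page(marker, max_keys) is a hypothetical callable returning
    (keys, is_truncated), where keys are sorted and all greater than marker.
    """
    marker = ''
    while True:
        keys, is_truncated = fetch_page(marker, page_size)
        for key in keys:
            yield key
        # Stop when the server reports no more pages (or sends an empty one).
        if not is_truncated or not keys:
            break
        # Resume after the last key seen; reusing an older marker here
        # would list some objects twice.
        marker = keys[-1]


def find_duplicates(keys):
    """Return the keys that occur more than once, in order of reoccurrence."""
    seen = set()
    dups = []
    for key in keys:
        if key in seen:
            dups.append(key)
        seen.add(key)
    return dups


def make_fake_backend(all_keys):
    """Build an in-memory stand-in for the server (assumption, for testing)."""
    all_keys = sorted(all_keys)

    def fetch_page(marker, max_keys):
        remaining = [k for k in all_keys if k > marker]
        return remaining[:max_keys], len(remaining) > max_keys

    return fetch_page


if __name__ == '__main__':
    fetch = make_fake_backend('s3ql_data_%d' % i for i in range(25))
    listed = list(list_keys(fetch, page_size=10))
    print(len(listed), 'keys listed, duplicates:', find_duplicates(listed))
```

Running find_duplicates over the full listing is a cheap way to confirm
or rule out a double-listed object before digging into the raw responses.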

Most likely, this is only triggered when S3 does something
uncommon-but-legal in its responses. But the only way to check this is
to get a dump of the raw server response. So if you could try this again
a few more times (use the --force option to force an fsck), that would
be fantastic.
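
Once a dump exists, the listing responses can be checked offline. The
following is a rough sketch, assuming each response body has been saved
as a string and uses the standard S3 ListBucketResult XML layout; the
function names and SAMPLE_PAGES data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespace used by S3 listing responses.
NS = '{http://s3.amazonaws.com/doc/2006-03-01/}'


def keys_in_page(xml_text):
    """Return the object keys contained in one listing response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(NS + 'Key')]


def repeated_keys(pages):
    """Return the sorted keys that appear more than once across all pages."""
    counts = Counter()
    for xml_text in pages:
        counts.update(keys_in_page(xml_text))
    return sorted(key for key, n in counts.items() if n > 1)


def _sample_page(keys):
    # Hypothetical, heavily trimmed response body for demonstration only.
    items = ''.join('<Contents><Key>%s</Key></Contents>' % k for k in keys)
    return ('<ListBucketResult '
            'xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
            '%s</ListBucketResult>' % items)


SAMPLE_PAGES = [
    _sample_page(['s3ql_data_1', 's3ql_data_2']),
    _sample_page(['s3ql_data_2', 's3ql_data_3']),  # s3ql_data_2 repeats
]

if __name__ == '__main__':
    print(repeated_keys(SAMPLE_PAGES))
```

Any key reported here was sent by the server more than once, which would
point at the responses rather than at the client-side parsing.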

> I still have the bucket that was copied using the aws command line
> tools, and am in the process of copying that to a new bucket for testing
> so we don't lose the corrupt version, but won't get to testing it
> tonight.  I have not tried to use the original file system since
> fsck.s3ql succeeded and am not entirely sure if I trust it without
> knowing what was wrong with fsck.s3ql.

At this point, I am pretty confident that this is a bug related to
object listing. Object listing is only used by fsck.s3ql, so I think
that using the file system normally (i.e. with mount.s3ql) should not
result in any problems.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

