Re: [Toybox] dd tests for transaction size?

2017-07-11 Thread Rob Landley


On 07/11/2017 06:37 AM, Samuel Holland wrote:
> On 07/09/17 18:25, Rob Landley wrote:
>> On 07/09/2017 05:39 PM, Samuel Holland wrote:
>>> On 07/09/17 17:07, Rob Landley wrote:
>>>> Has anybody actually used the conv=sync option in the past 20 years?
>>>> It doesn't do what you think (that's conv=fsync), instead it pads
>>>> short reads with zeroes so the input block size is always
>>>> the same.
>>>>
>>>> How is that useful?
>>>
>>> It's very useful when trying to image dying hard disks, so bad
>>> sectors (that cause short reads or read failure) do not affect the
>>> alignment of the rest of the data in the image file.
>>
>> Does the read failure automatically advance the file pointer by the
>> full amount, or does it have to lseek after the read? (I'm assuming
>> the first, but that's an implementation detail I have so many
>> questions about.)
> 
> I don't know the implementation details.

I kinda need to know the implementation details. :)

That said, I'm assuming I don't have to do anything special because dd
wasn't written initially to care about this behavior, and that's just
what happened when it did the simple thing, and your trick was adapted
to what it was already doing, not the other way around. :)

>> I'm assuming your read block size is the device physical block size,
>> and thus the short read isn't skipping data after the missing data
>> which you otherwise could have read but gets turned into a short read
>> because the read would otherwise have holes in it?
> 
> Everything within ibs past the point of failure is replaced with zeroes.
> So if the ibs=16k and your third 512b sector is bad, you'll have 1k of
> good data followed by 15k of zeroes, even if the rest of the sectors are
> okay. If you want to maximize the amount of data retrieved, then yes,
> you want ibs=sectorsize. If you want to quickly find good/bad areas of
> the disk, then ibs can be larger.

Ok, so you're micromanaging your call to it, and again dd's
implementation doesn't have to be clever.
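
A minimal sketch of the padding Samuel describes, assuming the caller
already knows how many good bytes the read returned. pad_short_read is
a hypothetical helper for illustration, not code from toybox or any
other dd:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pad a short read out to a full input block, as conv=sync is described
 * above. `got` is the number of good bytes the read produced (0 for a
 * failed read that is being skipped); `ibs` is the input block size.
 * Returns the number of bytes to hand to the output side.
 */
static size_t pad_short_read(unsigned char *buf, size_t got, size_t ibs)
{
  assert(got <= ibs);
  memset(buf + got, 0, ibs - got);  /* zero everything past the good data */
  return ibs;
}
```

With ibs=16k and 1k of good data this leaves 1k of data followed by 15k
of zeroes, so the next block still lands at the same offset in the image
as on the disk.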

>> How do you distinguish older 512 byte blocks from modern 4k blocks
>> from whatever the heck flash block sizes are? I assume there's an
>> ioctl() or something to find this out? Or do you just always use 512
>> bytes and hammering away at different parts of the same bad block
>> eventually makes its way out the far side?)
> 
> I take the disk out of the machine and look at the label :) If it has
> the "AF" logo, it's 4k sectors, otherwise it's 512b. There's also an
> ioctl: `blockdev --getbsz /dev/sda`. In my experience, flash storage
> doesn't get bad sectors; it works until it doesn't.

Ooh, we even have a blockdev in toybox already. (People send me these
things, I scrutinize them to make sure they work, but I don't always
remember "shred" or "acpi" exists until somebody brings it up. :)
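
For reference, a hedged sketch of asking the kernel directly, assuming
Linux and its BLKSSZGET (logical sector size) and BLKPBSZGET (physical
sector size) ioctls from <linux/fs.h>. sector_sizes is a hypothetical
name, not the blockdev implementation:

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET, BLKPBSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Query the logical and physical sector sizes of a block device.
 * Returns 0 on success, -1 if the device can't be opened or queried
 * (for example because it isn't a block device at all).
 */
static int sector_sizes(const char *dev, int *logical, unsigned int *physical)
{
  int fd, rc = -1;

  assert(dev && logical && physical);
  fd = open(dev, O_RDONLY);
  if (fd < 0) return -1;
  if (!ioctl(fd, BLKSSZGET, logical) && !ioctl(fd, BLKPBSZGET, physical))
    rc = 0;
  close(fd);
  return rc;
}
```

On an Advanced Format drive that emulates 512 byte sectors this would
be expected to report 512 logical and 4096 physical, and the physical
number is the one to use for ibs when imaging. Note that blockdev's
--getbsz reports the kernel's block size for the device rather than the
sector size; in util-linux, --getss and --getpbsz correspond to the two
ioctls above.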

(The acpi command's -V option seems to think my netbook's lcd has a
cooling fan. This is an option?)

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] dd tests for transaction size?

2017-07-11 Thread enh
On Tue, Jul 11, 2017 at 12:10 PM, Rob Landley  wrote:
> On 07/10/2017 02:00 PM, enh wrote:
>> On Mon, Jul 10, 2017 at 10:57 AM, enh  wrote:
>>>> Would the above bs= behavior tie sound like it would work for the users
>>>> you found?
>>>
>>> for what the comments claim, yes, but i'll track one of the clueful
>>> users down and check with them...
>>
>> actually, only one user seems to be setting bs=. everyone else has ibs=.
>
> Any idea why?

no. probably because it was the first thing that worked?

> Right now bs= is already behaving specially for the last block (posix!)
> so adding another aspect to the existing special case makes sense. But
> if I make that change, setting ibs= instead of bs= is what you'd do to
> _preserve_ the posix behavior. :)
>
> Rob



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.


Re: [Toybox] dd tests for transaction size?

2017-07-11 Thread Rob Landley
On 07/10/2017 02:00 PM, enh wrote:
> On Mon, Jul 10, 2017 at 10:57 AM, enh  wrote:
>>> Would the above bs= behavior tie sound like it would work for the users
>>> you found?
>>
>> for what the comments claim, yes, but i'll track one of the clueful
>> users down and check with them...
> 
> actually, only one user seems to be setting bs=. everyone else has ibs=.

Any idea why?

Right now bs= is already behaving specially for the last block (posix!)
so adding another aspect to the existing special case makes sense. But
if I make that change, setting ibs= instead of bs= is what you'd do to
_preserve_ the posix behavior. :)

Rob


Re: [Toybox] dd tests for transaction size?

2017-07-10 Thread Robert Thompson
It's pretty heavily used in combination with noerror. I can personally
attest to its usefulness when working with both damaged optical media and
spinning rust (with the correct blocksize in each case).

Each block read either contains blocksize data bytes or blocksize null
bytes, so that the source and destination still have matching offsets.

Incidentally, I've seen this used to progressively recover all readable
sectors on a damaged hard disk. Use a large
multiple-of-hardware-sector-size blocksize to start. The destination file
will retain (large) holes wherever there are bad blocks. searching for
aligned blocks of nulls will give you lists of offsets to seek and skip to
to retry with the smaller blocksize.
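
The "search for aligned blocks of nulls" step could be sketched like
this, assuming the image (or a chunk of it) is already in memory.
find_zero_blocks is a hypothetical helper, not an existing tool:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan `len` bytes of image data in aligned `blocksize` chunks and record
 * the byte offset of each chunk that is entirely zero. Stores at most
 * `max` offsets in `out` and returns how many all-zero chunks were seen.
 */
static size_t find_zero_blocks(const unsigned char *img, size_t len,
                               size_t blocksize, size_t *out, size_t max)
{
  size_t found = 0, off, i;

  assert(blocksize > 0);
  for (off = 0; off + blocksize <= len; off += blocksize) {
    for (i = 0; i < blocksize && !img[off + i]; i++);
    if (i == blocksize) {           /* every byte in this chunk was zero */
      if (found < max) out[found] = off;
      found++;
    }
  }
  return found;
}
```

Each offset divided by the smaller retry blocksize gives the value to
pass as both skip= and seek= on the next pass. A chunk that really was
all zeroes on disk shows up here too, which only costs a redundant
reread.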

From what I've seen, if a blocksized read overlaps with a bad sector, once
the kernel gives up on the sector, it fails the read without attempting any
further sectors. Since (before the days of widespread block-remapping)
sector failures often come in clusters, and avoiding reading bad sectors
can improve the odds of getting all the not-yet-bad sectors, you can play
iterative blocksize games to delay repeated risky reads until after most of
the good data is recovered.

I *have* seen this used with non-blockdevices, although at the moment I
can't recall the details (it wasn't recent).

Oddly enough, sync does what I expect. It's fsync and fdatasync that I was
surprised by. Guess I'm showing my age ;)




On Sun, Jul 9, 2017 at 5:39 PM, Samuel Holland  wrote:

> On 07/09/17 17:07, Rob Landley wrote:
>
>> Has anybody actually used the conv=sync option in the past 20 years? It
>> doesn't do what you think (that's conv=fsync), instead it pads short reads
>> with zeroes so the input block size is always the same.
>>
>> How is that useful?
>>
>
> It's very useful when trying to image dying hard disks, so bad sectors
> (that cause short reads or read failure) do not affect the alignment of
> the rest of the data in the image file.
>
> Rob
>>
>
> Samuel
>


Re: [Toybox] dd tests for transaction size?

2017-07-10 Thread enh
On Mon, Jul 10, 2017 at 10:57 AM, enh  wrote:
> On Mon, Jul 10, 2017 at 10:46 AM, Rob Landley  wrote:
>> On 07/10/2017 12:15 AM, enh wrote:
>>> On Sun, Jul 9, 2017 at 4:23 PM, Rob Landley  wrote:
>>>>> It's used in various boot disk generation scripts in the Android tree.
>>>>> (Whether it's needed is a question I can't answer as easily!)
>>>>
>>>> Hang on, do you use sync= or fsync=?
>>>
>>> the only code owned by either of _my_ teams uses conv=fsync. but a
>>> search for conv=sync turned up _more_ references to conv=sync than
>>> conv=fsync.
>>
>> Is there a Gratuitous Gnu Extension for this...?
>>
>> Hmmm, conv=sparse seems useful.
>>
>> How is conv=excl _not_ oflag=excl? In what world did that make sense?
>> (Alright, nocreat and notrunc were already there. I guess that one's on
>> posix.)
>>
>> WHY does ubuntu's dd parse zetta or yottabyte extensions? A signed 64 bit
>> number tops out at 8 exabytes (2^63 bytes), do they expect 128 bit math
>> here? (Do they expect _filesystems_ that can field that much data? Most
>> of the distributed filesystems work on top of fuse which enforces a
>> _signed_ 64 bit off_t, again 8 exabytes...)
>>
>> Nope, there isn't even a gnu extension for this. Lovely.
>>
>>>> Because sync= pads _all_ short reads to block size with zeroes, not just
>>>> the last one. I can see this being used to pad the _last_ block with
>>>> zeroes, but it sounds like it could also corrupt the data if there are
>>>> short reads before that.
>>>
>>> the users that had comments said they wanted the last block padded.
>>> padding all short reads (including EINTR ones) seems insane.
>>
>> I thought so, yes. Alas, implementing this thing to only pad the _last_
>> block seems...
>>
>> Hmmm. Maybe I could tie this to the already fiddly posix bs= behavior.
>> If you specify bs= instead of ibs= and obs=, then it does the normal
>> "write out what we got" and only pads the last block. If you specify
>> ibs= then it pads each input block.
>>
>> I'd poke the posix committee about this but after
>> http://article.gmane.org/gmane.comp.standards.posix.austin.general/12203
>> I've stopped expecting that to accomplish anything.
>>
>>>> Sigh. I suppose android builds are always happening from local disk (not
>>>> network filesystems) and are never suspended during a build, so you
>>>> don't need to worry about it?
>>>
>>> i suspect that's why no-one notices the insanity. (when we first
>>> started using the gold linker elsewhere in google i had to fix its
>>> EINTR behavior. happens all the time on a FUSE file system, but folks
>>> had been using it for years on regular file systems without noticing.)
>>
>> Yeah, I have readall() and writeall() in lib to loop for me. (Back in
>> busybox I had code checking for EAGAIN but the kernel guys convinced me
>> we shouldn't get _zero_ length reads except at EOF now? Alas googling
>> for this discussion is not finding it, and it's once again one of those
>> "but did I test on _v9fs_?" sort of things trying to prove a negative
>> myself.)
>>
>> Would the above bs= behavior tie sound like it would work for the users
>> you found?
>
> for what the comments claim, yes, but i'll track one of the clueful
> users down and check with them...

actually, only one user seems to be setting bs=. everyone else has ibs=.

>> Rob


Re: [Toybox] dd tests for transaction size?

2017-07-10 Thread enh
On Mon, Jul 10, 2017 at 10:46 AM, Rob Landley  wrote:
> On 07/10/2017 12:15 AM, enh wrote:
>> On Sun, Jul 9, 2017 at 4:23 PM, Rob Landley  wrote:
>>>> It's used in various boot disk generation scripts in the Android tree.
>>>> (Whether it's needed is a question I can't answer as easily!)
>>>
>>> Hang on, do you use sync= or fsync=?
>>
>> the only code owned by either of _my_ teams uses conv=fsync. but a
>> search for conv=sync turned up _more_ references to conv=sync than
>> conv=fsync.
>
> Is there a Gratuitous Gnu Extension for this...?
>
> Hmmm, conv=sparse seems useful.
>
> How is conv=excl _not_ oflag=excl? In what world did that make sense?
> (Alright, nocreat and notrunc were already there. I guess that one's on
> posix.)
>
> WHY does ubuntu's dd parse zetta or yottabyte extensions? A signed 64 bit
> number tops out at 8 exabytes (2^63 bytes), do they expect 128 bit math
> here? (Do they expect _filesystems_ that can field that much data? Most
> of the distributed filesystems work on top of fuse which enforces a
> _signed_ 64 bit off_t, again 8 exabytes...)
>
> Nope, there isn't even a gnu extension for this. Lovely.
>
>>> Because sync= pads _all_ short reads to block size with zeroes, not just
>>> the last one. I can see this being used to pad the _last_ block with
>>> zeroes, but it sounds like it could also corrupt the data if there are
>>> short reads before that.
>>
>> the users that had comments said they wanted the last block padded.
>> padding all short reads (including EINTR ones) seems insane.
>
> I thought so, yes. Alas, implementing this thing to only pad the _last_
> block seems...
>
> Hmmm. Maybe I could tie this to the already fiddly posix bs= behavior.
> If you specify bs= instead of ibs= and obs=, then it does the normal
> "write out what we got" and only pads the last block. If you specify
> ibs= then it pads each input block.
>
> I'd poke the posix committee about this but after
> http://article.gmane.org/gmane.comp.standards.posix.austin.general/12203
> I've stopped expecting that to accomplish anything.
>
>>> Sigh. I suppose android builds are always happening from local disk (not
>>> network filesystems) and are never suspended during a build, so you
>>> don't need to worry about it?
>>
>> i suspect that's why no-one notices the insanity. (when we first
>> started using the gold linker elsewhere in google i had to fix its
>> EINTR behavior. happens all the time on a FUSE file system, but folks
>> had been using it for years on regular file systems without noticing.)
>
> Yeah, I have readall() and writeall() in lib to loop for me. (Back in
> busybox I had code checking for EAGAIN but the kernel guys convinced me
> we shouldn't get _zero_ length reads except at EOF now? Alas googling
> for this discussion is not finding it, and it's once again one of those
> "but did I test on _v9fs_?" sort of things trying to prove a negative
> myself.)
>
> Would the above bs= behavior tie sound like it would work for the users
> you found?

for what the comments claim, yes, but i'll track one of the clueful
users down and check with them...

> Rob





Re: [Toybox] dd tests for transaction size?

2017-07-10 Thread Rob Landley
On 07/10/2017 12:15 AM, enh wrote:
> On Sun, Jul 9, 2017 at 4:23 PM, Rob Landley  wrote:
>>> It's used in various boot disk generation scripts in the Android tree.
>>> (Whether it's needed is a question I can't answer as easily!)
>>
>> Hang on, do you use sync= or fsync=?
> 
> the only code owned by either of _my_ teams uses conv=fsync. but a
> search for conv=sync turned up _more_ references to conv=sync than
> conv=fsync.

Is there a Gratuitous Gnu Extension for this...?

Hmmm, conv=sparse seems useful.

How is conv=excl _not_ oflag=excl? In what world did that make sense?
(Alright, nocreat and notrunc were already there. I guess that one's on
posix.)

WHY does ubuntu's dd parse zetta or yottabyte extensions? A signed 64 bit
number tops out at 8 exabytes (2^63 bytes), do they expect 128 bit math
here? (Do they expect _filesystems_ that can field that much data? Most
of the distributed filesystems work on top of fuse which enforces a
_signed_ 64 bit off_t, again 8 exabytes...)
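
Working the limit out: a signed 64 bit off_t holds at most 2^63 - 1
bytes, just under 8 binary exabytes (8 * 2^60). A rough sketch of which
size suffixes can be represented at all, assuming each suffix is a power
of 1024 (both function names are hypothetical):

```c
#include <assert.h>

/*
 * Power-of-two exponent for a binary size suffix, or -1 if unknown.
 * K=2^10, M=2^20, G=2^30, T=2^40, P=2^50, E=2^60, Z=2^70, Y=2^80.
 */
static int suffix_shift(char suffix)
{
  const char *order = "KMGTPEZY";
  int i;

  for (i = 0; order[i]; i++) if (order[i] == suffix) return 10 * (i + 1);
  return -1;
}

/* Does a count of 1 with this suffix fit in a signed 64 bit offset? */
static int suffix_fits_off_t(char suffix)
{
  int shift = suffix_shift(suffix);

  assert(shift > 0);
  return shift < 63;  /* 2^shift must not exceed 2^63 - 1 */
}
```

E is the last suffix that fits; even 1Z (2^70) overflows, so a parser
limited to 64 bit math can only reject Z and Y.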

Nope, there isn't even a gnu extension for this. Lovely.

>> Because sync= pads _all_ short reads to block size with zeroes, not just
>> the last one. I can see this being used to pad the _last_ block with
>> zeroes, but it sounds like it could also corrupt the data if there are
>> short reads before that.
> 
> the users that had comments said they wanted the last block padded.
> padding all short reads (including EINTR ones) seems insane.

I thought so, yes. Alas, implementing this thing to only pad the _last_
block seems...

Hmmm. Maybe I could tie this to the already fiddly posix bs= behavior.
If you specify bs= instead of ibs= and obs=, then it does the normal
"write out what we got" and only pads the last block. If you specify
ibs= then it pads each input block.
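
To make the difference concrete, here is a small model of that proposed
rule (a proposal only, not POSIX and not what toybox does today). It
takes the sizes returned by successive reads and totals what conv=sync
would write in each mode; synced_output_size is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Total bytes written under the proposed rule, given the sizes returned
 * by successive reads (the last entry is the final block before EOF).
 * separate_ibs != 0: ibs= was given, so every short read is padded.
 * separate_ibs == 0: only bs= was given, so just the last block is padded.
 */
static size_t synced_output_size(const size_t *reads, size_t count,
                                 size_t blocksize, int separate_ibs)
{
  size_t total = 0, i;

  for (i = 0; i < count; i++) {
    int last = (i == count - 1);

    assert(reads[i] <= blocksize);
    total += (separate_ibs || last) ? blocksize : reads[i];
  }
  return total;
}
```

For reads of 512, 100, 512 and 200 bytes with a 512 byte block, the
ibs= mode writes 2048 bytes (every block padded) while the bs= mode
writes 1636 (only the final 200 byte read padded to 512).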

I'd poke the posix committee about this but after
http://article.gmane.org/gmane.comp.standards.posix.austin.general/12203
I've stopped expecting that to accomplish anything.

>> Sigh. I suppose android builds are always happening from local disk (not
>> network filesystems) and are never suspended during a build, so you
>> don't need to worry about it?
> 
> i suspect that's why no-one notices the insanity. (when we first
> started using the gold linker elsewhere in google i had to fix its
> EINTR behavior. happens all the time on a FUSE file system, but folks
> had been using it for years on regular file systems without noticing.)

Yeah, I have readall() and writeall() in lib to loop for me. (Back in
busybox I had code checking for EAGAIN but the kernel guys convinced me
we shouldn't get _zero_ length reads except at EOF now? Alas googling
for this discussion is not finding it, and it's once again one of those
"but did I test on _v9fs_?" sort of things trying to prove a negative
myself.)
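
For anyone following along, the general shape of such a loop is roughly
the following. This is a sketch under the assumption stated above (a
zero length read means EOF), with a hypothetical name, and is not the
actual readall() from toybox's lib:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keep calling read() until `len` bytes arrive, EOF is hit, or a real
 * error occurs. Returns bytes read (short only at EOF), or -1 on error.
 */
static ssize_t read_fully(int fd, void *buf, size_t len)
{
  size_t total = 0;

  assert(buf);
  while (total < len) {
    ssize_t n = read(fd, (char *)buf + total, len - total);

    if (n < 0) {
      if (errno == EINTR) continue;  /* interrupted by a signal: retry */
      return -1;
    }
    if (!n) break;                   /* zero-length read treated as EOF */
    total += n;
  }
  return (ssize_t)total;
}
```

A loop like this hides short reads from the caller, which is the point
for most commands but also means the caller never learns where the
individual read boundaries fell.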

Would the above bs= behavior tie sound like it would work for the users
you found?

Rob


Re: [Toybox] dd tests for transaction size?

2017-07-09 Thread enh
On Sun, Jul 9, 2017 at 4:23 PM, Rob Landley  wrote:
>
> On 07/09/2017 05:18 PM, enh wrote:
>> > On 07/09/2017 02:41 AM, Rob Landley wrote:
>> > > does anybody have a decent strategy for testing the ibs
>> > > and obs options?
>> >
>> > Has anybody actually used the conv=sync option in the past 20 years? It
>> > doesn't do what you think (that's conv=fsync), instead it pads short
>> > reads with zeroes so the input block size is always the same.
>> >
>> > How is that useful?
>>
>> It's used in various boot disk generation scripts in the Android tree.
>> (Whether it's needed is a question I can't answer as easily!)
>
> Hang on, do you use sync= or fsync=?

the only code owned by either of _my_ teams uses conv=fsync. but a
search for conv=sync turned up _more_ references to conv=sync than
conv=fsync.

> Because sync= pads _all_ short reads to block size with zeroes, not just
> the last one. I can see this being used to pad the _last_ block with
> zeroes, but it sounds like it could also corrupt the data if there are
> short reads before that.

the users that had comments said they wanted the last block padded.
padding all short reads (including EINTR ones) seems insane.

> I say this because SIGSTOP/SIGCONT used to cause short reads, and the
> interrupted read returned the data it had gotten when you continued.
> (Yes, I had builds break because I suspended and resumed them, and root
> caused it.)
>
> The really annoying part was it would cause _zero_ length reads which
> didn't mean end of file, you had to check errno for EAGAIN to
> distinguish that. I think the kernel guys fixed it so it won't do that,
> but if you SIGSTOP something that's doing a 1G single read() and then
> SIGCONT what happens these days?
>
> Hmmm...
>
>   $ cat > richards.c << EOF
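>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <unistd.h>
>
>   int main(int argc, char *argv[])
>   {
>     // One big 1G read() from stdin, to see what suspend/resume does to it.
>     printf("len=%zd\n", read(0, malloc(1<<30), 1<<30));
>   }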
>   EOF
>   $ gcc richards.c
>   $ ./a.out < coderush.mp4
>   ^Z
>   [1]+  Stopped
>   $ fg
>   len=318818698
>
> Oh that's just spectacular. The SIGSTOP doesn't take effect until the
> read system call returns, even if it's reading 300 megs from rotating
> media. Thank you SO much linux guys. (Having done a couple runs with
> drop_caches I can confirm that ctrl-C doesn't work until the syscall
> returns either. That's just beautiful. Obviously that's never going to
> cause a problem ever, driving a system deep into swap thrashing... Reads
> from local disk block in D state now?)
>
>   $ ./a.out < /dev/zero
>   ^Z
>   [1]+  Stopped
>   landley@driftwood:~$ fg
>   ./a.out < /dev/zero
>   len=96399360
>
> Right, at least THAT still behaves like I expected. So SOMETIMES reads
> will be shortened by suspend/resume cycles. And other times they block
> in D state. (Yay inconsistent behavior!)
>
> Sigh. I suppose android builds are always happening from local disk (not
> network filesystems) and are never suspended during a build, so you
> don't need to worry about it?

i suspect that's why no-one notices the insanity. (when we first
started using the gold linker elsewhere in google i had to fix its
EINTR behavior. happens all the time on a FUSE file system, but folks
had been using it for years on regular file systems without noticing.)

> Rob





Re: [Toybox] dd tests for transaction size?

2017-07-09 Thread Rob Landley
On 07/09/2017 02:41 AM, Rob Landley wrote:
> does anybody have a decent strategy for testing the ibs
> and obs options?

Has anybody actually used the conv=sync option in the past 20 years? It
doesn't do what you think (that's conv=fsync), instead it pads short
reads with zeroes so the input block size is always the same.

How is that useful?

Rob