On Tuesday 26 October 2010 16:26:39 Denys Vlasenko wrote:
> On Tuesday 26 October 2010 19:36, Rob Landley wrote:
> > > The only piece of software of the 1024 CPU machine which
> > > _has to be_ threaded is the kernel, everything else is easier to
> > > parallelize on a task basis: don't waste time developing, say, 1024-CPU
> > > parallelized gzip - instead, run thousands of gzip copies!
> >
> > Which then wind up talking to each other through pipes or fifos or
> > sockets, and you have a scalability bottleneck in a select statement
> > sending the data back. Less so now with the new pipe infrastructure, but
> > still, you have to copy data between process contexts because fiddling
> > with page tables for non-persistent shared memory is _more_ expensive
> > than copying, and it's a cache flush either way...
>
> No, not _those_ tasks. Bigger tasks. When you run gzip, you run it as a
> part of a larger "task to accomplish something". E.g. maybe you are
> processing Mars Reconnaissance Orbiter photos, and the archiving step in
> the pipeline compresses them.
You mean the way the aboriginal build scripts often take a FORK=1 environment
variable that will, for example, make ./download.sh download and extract each
tarball in parallel?
There are a bunch of ways to achieve parallelism. Threading is one of them.
Threading doesn't work on beowulf clusters. But some programs really don't
break down well to beowulf clusters...
> Then, do not bother creating insanely parallelized gzip (and insanely
> parallelized image analysis software, and insanely parallelized
> database...); instead, process in parallel *insane number of photos* using
> run of the mill, simple, *single-threaded* tools.
If you have a task that breaks down that way, sure.
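For a task that does break down that way, task-level parallelism can be as small as one fork() per job. The following is a minimal sketch under stated assumptions, not the aboriginal build scripts' actual mechanism: run_jobs_in_parallel(), job_fn and example_job() are hypothetical names, and a real job would more likely exec a single-threaded tool such as gzip than call a C function.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job type: a plain function that runs in its own process. */
typedef int (*job_fn)(int job_index);

/* Hypothetical stand-in job: even-numbered jobs succeed, odd ones fail. */
int example_job(int job_index)
{
    return job_index % 2;
}

/*
 * Fork one child per job, then wait for all of them.
 * Returns how many children exited with status 0, or -1 if a fork() failed.
 */
int run_jobs_in_parallel(job_fn job, int njobs)
{
    int started = 0, ok = 0;

    assert(job && njobs >= 0);
    for (int i = 0; i < njobs; i++) {
        pid_t pid = fork();

        if (pid < 0) break;           /* fork failed: stop launching */
        if (pid == 0) _exit(job(i));  /* child: run the job, exit with its result */
        started++;
    }

    /* Parent: reap every child that was actually started. */
    for (int i = 0; i < started; i++) {
        int status;

        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return started == njobs ? ok : -1;
}
```

No locks and no shared state: each job lives in its own address space, which is exactly why this only helps when the jobs don't need to talk to each other.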
> This may even be faster, since you do not need to bother with locks,
> cacheline bouncing and such.
If you have a task that breaks down that way. Not all tasks do.
> But more importantly, you have so much less complexity and
> fewer bugs to fix!
If you get to pick which problems you want to solve, you can avoid the messy
ones.
> (If you do not have insanely many photos to process, but just a few,
> in most cases today's CPUs are fast enough to not optimize for this case.)
MPEG video compression in realtime. Each frame is a delta from the previous
frame, _after_ any changes to the data due to "lossy" compression (which can't
be allowed to accumulate or the image quality goes into the toilet extremely
rapidly and you have to re-keyframe multiple times per second.)
A lot of signal processing issues are like that. You can chop the signal at
each keyframe and distribute across a cluster that way, but if your keyframes
are every 2 seconds then you guarantee that much latency, which sucks for
videoconferencing and other types of live broadcasts.
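To make the dependency concrete, here is a toy sketch of lossy delta coding on a one-dimensional signal. It is not MPEG: STEP, quantize() and both functions are hypothetical, and a single int stands in for a whole frame. The closed-loop version takes each delta against what the decoder will actually reconstruct, so the quantization error stays bounded; the open-loop version takes deltas against the original input and lets the error pile up.

```c
#include <assert.h>
#include <stdlib.h>

#define STEP 8  /* hypothetical quantizer step: the "lossy" part */

/* Round a delta to the nearest multiple of STEP. */
static int quantize(int delta)
{
    int half = STEP / 2;

    if (delta >= 0) return ((delta + half) / STEP) * STEP;
    return -(((-delta) + half) / STEP) * STEP;
}

/* Closed loop: each delta is taken against the decoder's reconstruction. */
int closed_loop_max_error(const int *in, int n)
{
    int recon = 0, worst = 0;

    assert(in && n >= 0);
    for (int i = 0; i < n; i++) {
        recon += quantize(in[i] - recon);  /* needs recon from step i-1 */
        if (abs(in[i] - recon) > worst) worst = abs(in[i] - recon);
    }
    return worst;
}

/* Open loop: each delta is taken against the previous original sample. */
int open_loop_max_error(const int *in, int n)
{
    int recon = 0, prev = 0, worst = 0;

    assert(in && n >= 0);
    for (int i = 0; i < n; i++) {
        recon += quantize(in[i] - prev);   /* rounding error is never corrected */
        prev = in[i];
        if (abs(in[i] - recon) > worst) worst = abs(in[i] - recon);
    }
    return worst;
}
```

The closed loop is the one you want, and it is also the one where step i cannot start until step i-1 has finished, which is what makes this kind of work awkward to hand out to independent processes.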
That's one program domain out of hundreds for which "redefine the problem into
something I feel like solving" turns out to be hard.
> > Threads are a tool, just like object orientation. There are times when
> > it's appropriate and very helpful, and times when shoehorning it onto a
> > problem makes things worse.
>
> Right. Use them only when you must.
No, use them when they're appropriate.
There's never a time when you _must_ use object orientation, and yet the linux
kernel's VFS layer is object oriented, with each individual filesystem being a
subclass of the VFS. (They did it in C rather than in C++ but the principle
is there and they happily admit it if you ask 'em.)
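The general shape of object orientation in C is a struct of function pointers that each "subclass" fills in. The sketch below only illustrates that principle: struct fs_type, toyfs, bigfs and blocks_needed() are made-up names, not the kernel's real structures.

```c
#include <assert.h>

/* Hypothetical "base class": operations every filesystem must provide. */
struct fs_type {
    const char *name;
    long (*block_size)(void);
};

/* Two hypothetical "subclasses" supplying their own implementation. */
static long toyfs_block_size(void) { return 512; }
static long bigfs_block_size(void) { return 4096; }

const struct fs_type toyfs = { "toyfs", toyfs_block_size };
const struct fs_type bigfs = { "bigfs", bigfs_block_size };

/* Generic code: calls through the table without knowing which fs it got. */
long blocks_needed(const struct fs_type *fs, long bytes)
{
    long bs;

    assert(fs && fs->block_size && bytes >= 0);
    bs = fs->block_size();
    return (bytes + bs - 1) / bs;  /* round up to whole blocks */
}
```

The caller never switches on the filesystem type; adding a new one means filling in another table, not editing the generic code.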
Same with threads, lots of times you can do a horrible non-threaded
implementation or a less horrible threaded implementation. I've seen cases
where a non-threaded implementation of something not only scaled less well
than the threaded one, but was _more_ brittle, hard to follow, and prone to
deadlocks.
> > > Readability. !strcmp() reads as "not strcmp" - ?
> >
> > In the shell, 0 is success and nonzero is failure. In C, 0 is failure
> > and nonzero is success. In strcmp() there's greater than, less than, and
> > zero. Anybody who can't keep all these straight is going to have to be
> > very good at writing test suites.
> >
> > > strcmp() != 0 reads as "not equal" - much closer to what it actually
> > > does.
> >
> > The linux kernel style guys actually edit out those kind of useless
> > appendages as part of their style checking during code review. So
> > you're adding stuff to make the code look less like the kernel does.
> >
> > Do you similarly bloat if (x) to say "if (x != FALSE)"? I see that kind of
> > thing in people who are new to C, but not much in people who've been
> > doing it for a while.
>
> I think that even though we do understand what "if (strcmp(...))" means,
> it doesn't follow that this is the best style. With a style which
> encourages use of == 0 or != 0 with strcmp, I started making fewer
> bugs with inverted logic in string compares...
*shrug* It's your baby now. I'm the other way, myself.
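For reference, the two spellings under discussion compile to the same test; the only thing to remember is that strcmp() returns 0 for "equal". A small sketch, with same_terse(), same_verbose() and order() as hypothetical helper names:

```c
#include <string.h>

/* Terse spelling: true (1) when the strings match. */
int same_terse(const char *a, const char *b)
{
    return !strcmp(a, b);
}

/* Verbose spelling: exactly the same result. */
int same_verbose(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* strcmp() is three-way: collapse its result to -1, 0 or 1. */
int order(const char *a, const char *b)
{
    int r = strcmp(a, b);

    return (r > 0) - (r < 0);
}
```

Which of the first two reads better is the style argument; the behavior is identical.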
> if ([!]x) is definitely ok for bool and pointers. For ints,
> it is sometimes good to write if (x != 0), especially if the code
> around that place is complex-ish and it's hard to figure out
> the type of x. Not a hard rule.
*blink* *blink*
You see a significant difference between scalar and pointer types?
It's X bytes of information living on either the stack or the heap, which is a
question of which base register it's offset against. If it's a struct, a
second offset gets applied. If it's an array, an offset is calculated and
applied...
An undimensioned array of pointers to pointers to a function pointer is
basically a const long, except in how it's used. But you could take the long
and typecast it.
A pointer is a scalar type. You can do math on the suckers (admittedly in
base sizeof(*)).
    char *fred="123";
    printf("%s", 2+fred);
Prints 3. Nothing special about "fred+2" or "&fred[2]"...
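The same point as a checkable sketch (spellings_agree() and int_stride() are hypothetical names): the three spellings give the same address, and "base sizeof(*)" means that adding 1 to a pointer advances it by the size of the pointed-to type.

```c
#include <stddef.h>

/*
 * s + 2, 2 + s and &s[2] are the same address.
 * Assumes s points at an array of at least 3 chars.
 */
int spellings_agree(const char *s)
{
    return (s + 2 == 2 + s) && (s + 2 == &s[2]);
}

/* Adding 1 to an int pointer moves it sizeof(int) bytes, not 1 byte. */
size_t int_stride(void)
{
    int pair[2] = { 0, 0 };
    int *p = pair;

    return (size_t)((char *)(p + 1) - (char *)p);
}
```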
> > There are some studies out there that say there's an optimal module size,
> > above which defect density increases because you can't keep all the code
> > in your head at once,
>
> Of course, no one can keep infinite amount of structure in the head.
>
> Readability simply moves that limit a bit farther, by making every
> individual sub-fragment of code less taxing on the brain.
It's only taxing on the brain if you don't do a lot of C coding. If you do,
it's more taxing to see the more verbose (and less common) ways of phrasing
the exact same thing. Worse, you start glossing over the verbosity and miss
when they change things in the useless bits (sometimes via typo).
Personally I find:

    if (strcmp(walrus) == 0);
    {
        thingy()
    }

Harder to spot than:

    if (!strcmp(walrus));
        thingy();

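As I read them, the thing to spot in both snippets is the stray ';' after the condition, which ends the if statement on the spot so that whatever follows runs unconditionally. A runnable sketch of that effect, with count_buggy() and count_fixed() as hypothetical names (the bug is deliberate, and compilers such as gcc and clang can warn about the empty body when warnings are enabled):

```c
#include <string.h>

/* Buggy on purpose: the ';' after the condition is the entire if body. */
int count_buggy(const char *a, const char *b)
{
    int calls = 0;

    if (!strcmp(a, b));
        calls++;        /* runs whether or not the strings match */
    return calls;
}

/* Fixed: the increment really is conditional. */
int count_fixed(const char *a, const char *b)
{
    int calls = 0;

    if (!strcmp(a, b))
        calls++;
    return calls;
}
```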
I've seldom found adding extra characters helps me parse code. But then I
didn't even put a lot of spaces in each line until Erik complained. (And that
wasn't because I thought it was better, that was just for consistency.)
> > Even comments need to pull their own weight, there are times when
> > deleting unnecessary comments
>
> Yes. Good comments are an art (and quite distinct from "verbose comments").
>
> > I vaguely wonder if there should be a place we list busybox features that
> > other implementations don't have. For example, busybox mount never
> > needs to say "-o loop", because you can trivially autodetect when you're
> > being asked to mount a file on a directory. (And directory on directory
> > or file on file are bind mounts, although I don't remember if I made it
> > autodetect that.)
>
> IIRC no, "file on file" bind mounts aren't working.
I've used them in aboriginal linux, which is using busybox 1.17.2 defconfig, so
I know if you specify --bind it works. Sounds like I never got around to
autodetecting the "source and target are both files" or "source and target are
both directories" cases. Fairly straightforward to do. Probably I should add
a config entry for all three loopback/bind autodetection cases. That acts as a
bit of documentation that the feature exists, too...
And while we're at it, I should add the magic probes for "tar xvf
thing.tar.gz" and such, since there are easily identifiable headers for gz,
bz2, and uncompressed file formats all in the first couple dozen bytes. (The
lack of any lzma identifying magic is lzma's problem, but we can always fall
back to attempting that one for unknown types...)
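The gzip and bzip2 probes are a couple of byte compares. A minimal sketch, not busybox tar's actual code (enum comp_type and probe_magic() are hypothetical names): gzip files start with the bytes 0x1f 0x8b and bzip2 files start with the ASCII characters "BZh". Anything else comes back as unknown, which is where the fallback would kick in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for the probe. */
enum comp_type { COMP_UNKNOWN, COMP_GZIP, COMP_BZIP2 };

/* Look at the first bytes of a file and guess the compression format. */
enum comp_type probe_magic(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return COMP_GZIP;
    if (len >= 3 && memcmp(buf, "BZh", 3) == 0)
        return COMP_BZIP2;
    return COMP_UNKNOWN;  /* caller decides what to try next */
}
```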
But again, these days my open source work gets crammed into the snippets of
time when I'm too tired and unfocused to do much else, which isn't exactly
ideal. :P
Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.