Linux-Development-Sys Digest #851, Volume #6 Sun, 20 Jun 99 20:14:16 EDT
Contents:
Re: Nagel algorithm?? (Joe Doupnik)
Mainframes, Filesystems, Databases... Re: TAO: the ultimate OS (Christopher B.
Browne)
Re: TAO: the ultimate OS (Christopher B. Browne)
Re: using C++ for linux device drivers ([EMAIL PROTECTED])
----------------------------------------------------------------------------
From: [EMAIL PROTECTED] (Joe Doupnik)
Crossposted-To: comp.os.linux.networking
Subject: Re: Nagel algorithm??
Date: 20 Jun 99 16:07:45 MDT
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Mike
Jagdis) writes:
> In article <7kbfh8$esi$[EMAIL PROTECTED]>, bill davidsen wrote:
>>To repeat the original question, short of editing the config file by
>>hand, does this option appear in any of the "make *config" menus? I
>>said I was willing to do it by hand, but I thought it was originally in
>>a config menu. The "say N here" doesn't make much sense unless you get
>>the chance to say anything.
>>[...]
>>If I hadn't done enough network hacking to know there was such a thing
>>and why I wanted to use it, I probably would never have found it.
>
> The no-Nagle config option no longer exists, and probably the rogue
> ifdef check should no longer exist either. Nagle is automatically
> turned off on a per-socket basis if you set the TCP_NODELAY
> socket option on the socket. Turning Nagle off globally is a
> bit drastic in most cases.
>
> Mike
============
It turns out there is plenty of misunderstanding about Nagle mode,
its control, and its proper use. It has a famous problem while trying to do
the good deed of grouping up data. The problem is that it can hold back a
small segment (one less than a full MSS in size) while it awaits ACKs for
all previous data. Those ACKs may themselves be delayed (delayed ACKs in
use by the receiver). In that case the two sides deadlock: the sender waits
on ACKs, and the receiver waits on more data that won't come, hoping to
piggyback the ACK on it. The deadlock is broken only by the arrival of new
application data to fill out the segment, or by the receiver's delayed ACK
timer firing. Often it is the latter, and things progress at the rate of
that timer (frequently 200 ms, for five exchanges per second).
I have looked into the problem and have solved it. You may wish
to read about it in the draft RFC
draft-doupnik-tcpimpl-nagle-mode-00.txt
whose title is "A new TCP transmission policy replacing Nagle mode."
It's easiest to grab this from
directory pub/misc on netlab1.usu.edu
Also present in the parallel directory
pub/misc/newpolicy.sources
are source code changes for FreeBSD 3.2, Linux 2.2.5-15, and Solaris 7. The
code changes are tiny and all in one spot. The doc has much more insight
into Nagle modes (yes, there is more than one kind) than any networking book.
For a thumbnail sketch of the solution, the story is like this.
Nagle mode can hold back a small segment until ACKs arrive, as noted
above. That's a rather strange policy: it attempts to predict that there
will be more app data, so that waiting is worthwhile and avoids tinygrams.
Meanwhile the receiver is also holding back its ACKs, expecting more data
and thus reducing the count of tinygrams. The two sides can deadlock
waiting on each other. Eventually the very slow delayed ACK timer fires
and breaks the deadlock.
There is absolutely no way the TCP stack can predict that more
application data will follow immediately, nor can the receiver predict
that no more data will follow immediately. The TCP transmitter
cannot know what the application will do next; that is fundamentally
impossible. Only the app knows what it will do next. Thus attempts at
prediction will fail and create the deadlock, no matter how clever the
heuristics. This paragraph needs to be writ large.
The solution is to hold back small segments until the application plus
o/s _tell_ the transmitter's TCP that there is no more data to follow.
Apps do this already, in cooperation with the operating system. That's
how the PUSH (PSH) bit gets set properly. There is no extra programming
effort needed; it already happens naturally. The draft RFC above explains
this in detail. There is no fruitless predicting, no deadlock is possible,
things go fast, full segments are made whenever the app supplies enough
data, and no app-level controls are necessary or wanted. The above
draft also contains some measurements on the matter.
To avoid sending small segments when the app writes small quantities
of data, avoid handing small quantities to the protocol stack unless
needed. Put simply: buffer app data into larger chunks. Below are
some hints on how.
Unix folks can use fwrite() and fflush() in cooperation with
setvbuf()/setbuffer(). This works very well indeed. It's just as fast as
immediate-mode write(), but it buffers as you wish. It's not wimpy. To use
the f-functions on a socket, use fdopen() to create a FILE pointer from a
file descriptor. A plain write() causes only its data to be sent to the
TCP stack as-is and right now, and what's delivered is what the network
will try sending right now. fwrite() writes to a large app buffer, and
that buffer is flushed to the TCP stack when it fills or when fflush() is
called. Very simple buffering is the _fast_ way, avoiding many tinygrams
and many slow system calls. It's faster than write().
I will say this in public. The teaching of network programming
has almost totally focused on using immediate-action i/o statements, like
write(), rather than buffered-action i/o statements such as fwrite().
This is a serious blunder, a bit of blindness by those who should know
better. I was in this group until I figured out the deadlock situation.
My new TCP transmission policy causes the network to do what the
application requests, as quickly as it can, automatically, and that is a
welcome change from today's behavior. To ensure we don't overconsume
resources we output data in big chunks. This saves host cpu cycles at both
ends, and it saves network resources. It is faster with and without a
network, and it takes almost the same programming effort as write(). We
need to educate programmers to buffer and conserve, and as a benefit
things will go faster than otherwise, even to a local disk drive. Stay
away from write(). Yes I know, old habits die hard; but now we have no
excuse to behave as of old.
Turning off Nagle mode, the alternative to date, leads to more
tinygrams. My "new policy" groups strongly when it can, to send full
segments. The app can help that a lot, at basically no cost, by buffering
well. We get the benefits of full segments and no deadlock. No network
programming is required. Turning off Nagle mode today is done via a socket
option, TCP_NODELAY, applied with setsockopt(). (The FIONBIO ioctl(),
sometimes mentioned in this context, controls non-blocking i/o, not Nagle
mode.) My new policy needs none of this sockets control code. We should
keep in mind that not all TCP/IP work uses sockets.
If you have comments on my new TCP transmission policy replacing
Nagle mode please feel free to contact me directly at [EMAIL PROTECTED]
Joe D.
------------------------------
From: [EMAIL PROTECTED] (Christopher B. Browne)
Crossposted-To: alt.os.linux,comp.os.linux.advocacy,comp.os.misc,comp.unix.advocacy
Subject: Mainframes, Filesystems, Databases... Re: TAO: the ultimate OS
Reply-To: [EMAIL PROTECTED]
Date: Sun, 20 Jun 1999 22:26:37 GMT
On 20 Jun 1999 17:04:10 GMT, Terry Murphy <[EMAIL PROTECTED]> posted:
>In article <7kg38f$kbc$[EMAIL PROTECTED]>,
>Stefaan A Eeckels <[EMAIL PROTECTED]> wrote:
>
>>If you read something disparaging in my reference to mainframes,
>>then I urge you to read the quote again. Revisiting existing
>>concepts is not a step forward. It also is not necessarily a
>>bad thing.
>
>The issue of whether an OS should be mainframe-like or Unix-like
>is actually pretty interesting. Mainframe OS'es had lots of features,
>but they were big and slow for their time. Unix came in and showed
>the world that an OS did not have to be big, and stripped away all
>"extraneous" functionality. Meanwhile, hardware grew, big time. In
>hindsight, though, it can be argued that the mainframe OS'es, with their
>insistence on a sheer number of features rather than just elegance of
>simplicity (i.e. Unix), were ahead of their time. It was not until
>recently that they could be implemented to perform well. Windows NT is
>much more of a mainframe-class OS than Unix is -- however, it has been
>very influenced by Unix (e.g. files are streams of bytes), so it ends
>up falling into a somewhat in-between class, which is probably why a lot
>of people do not like it.
There's another view on mainframes that I'd suggest considering...
The *first* thing to *ignore* is the fact that they were big and slow.
Remember that the early UNIX boxes were also quite big and quite slow
too. The latter puts the former into a bit better perspective.
The thing about mainframes that is (arguably) better than UNIX is the
fact that mainframe systems make *intensive* use of block I/O, with
minimal usage of streaming.
This has the effect that mainframe apps are encouraged to be aware of
the physical characteristics of such notable things as the sizing of
disk cylinders. Files *tend* to represent groupings of blocks, and when
these are nearly natural physical sizes, this means that block-oriented
algorithms work very well.
Contrast this with the UNIX approach of "everything is a stream of bytes."
Optimizations to turn streams into blocks are possible, and even are often
implemented, but the processing of "streams of bytes" does come at a cost.
The other major effect seen on mainframes is that processing is
encouraged to turn into "block operations." The good ol' 3270 terminals
would collect up a bunch of data updates (perhaps typing in a page of
code), and would ship it all off to the mainframe as a set of blocks.
Contrast this with using curses, or, worse still, X, where practically
every time you touch the keyboard or mouse an interrupt is generated
and the "main host" has to do some processing work for that event.
On X, typing in a page of text could involve several thousand events,
each of which requires interaction between the terminal in front of
you and the host. Over Ethernet, this adds up to thousands of packets,
and tens of thousands of bytes of data transferred and processed by the
various layers.
In contrast, while XEDIT may be pretty grotty in many ways, typing in a
page of text *probably* involves a single transmission of 80x24 bytes plus
a little bit of overhead as one block of data. Less than 2000 bytes, all
bounced across the network as a single block.
UNIX equivalents include things like:
- Marshalling data locally for transmission to a TP monitor (e.g., Tuxedo),
- Marshalling data in a web browser that gets sent via CGI as a response
to <FORM> </FORM>,
- Marshalling data in an ORB.
CGI gets accused of being slow; this happens only when the design
continually respawns Perl processes that then load in bags of libraries
and do lots of repetitive initialization.
>>It was easier to port UNIX than to write an OS from scratch. The
>>fact that the principals of companies such as Sun cut their teeth
>>on UNIX at Berkeley (and liked it, I guess) was also a significant
>>factor. BTW, the people who did BSD and SunOS *are* professional
>>software engineers (that's at least what their degree says).
>
>>Today, hardware companies do not even dream of developing an OS, and
>>no longer consider porting UNIX. They design their hardware to be
>>compatible with Windows, and write drivers.
>
>Perhaps in your Microsoft-dominated fantasy world. The fact is there are
>no fewer than five Unixes slated to run on IA-64 (off the top of
>my head, I can think of Tru64, Solaris, IRIX, HP-UX, and SCO...plus
>Linux...I forget if AIX is in the works). I'm not even sure there
>are that many major architectures which do NOT run Unix, besides some
>of the mainframe systems.
The contention has a *grain* of truth; MSFT has pushed big bucks into
discouraging hardware vendors from considering anything other than
Windows. The more gullible vendors have gone into "joint ventures"
that appear to benefit MSFT vastly more than the hardware partner.
(Witness the thrashing about of Digital and Sequent...)
>>NFS is not a great protocol, granted. Pretending that SMB is any
>>better displays your profound lack of knowledge.
>
>I am not saying SMB is a great protocol. I am merely saying fundamental
>filesystem features such as mandatory file locking work in it (as well
>as DECnet), but not in NFS.
Probably substantial efforts should be going into newer protocols like
Coda or AFS; it is quite unfortunate that SMB and NFS both have nearly
crippling flaws as a result of needing to be backwards compatible with
MS-DOS (and I'm thinking back to the days before Windows here).
It is quite possible that they are flawed due to UNIX-related issues,
but it is more demonstrable that they have problems that result from
having been designed with a need to interoperate with MS-DOS, PC-DOS, and
other variants of the CP/M "clone."
>>So what? The basic UNIX file system is designed to store lots of small
>>files. When your average mainframe OS was designed, disk space was so
>>expensive small text files were stored on paper
>
>I understand that, but this was 30 years ago. Today, most users (whatever
>that means) work with large, structured files such as Word documents,
>Access databases, Excel Spreadsheets, etc. The era of the small file
>is over. And, as I noted above, we now have the hardware to support such
>use.
The folks working on Reiserfs are looking at an absolutely opposite tack
to this; they are trying to provide two things:
a) The ability to efficiently create and access very small files.
This means that it makes sense to create a tiny file for every value
that you might want to fiddle with.
Rather than having /etc/hosts, one might do something like:
# mkdir /etc/hosts
# echo 192.168.1.1 > /etc/hosts/dantzig
# echo 192.168.1.2 > /etc/hosts/wolfe
# echo 192.168.1.3 > /etc/hosts/tarjan
# cd /etc/hosts
# ln dantzig dantzig.home.org
# ln wolfe wolfe.home.org
# ln dantzig cache
# ln wolfe nfsserver
which allows you to look up names as if they were files.
On a filesystem where the minimum space consumed by a file is 8K, this
would be a ludicrously stupid approach. (Which suggests that the FS
implementation may have taken a ludicrously stupid approach...)
b) The ability to treat a directory of files as a file that may be
treated, when suitably "locked," as an atomic unit.
There are discussions going on with the SAMBA folks so as to try to
create this in a way that may allow a clearly superior API to be
associated with the notion of "file forks."
>>There's no guarantee that an OS provided index-sequential file will
>>be the best support for a relational DBMS (in fact, it isn't).
>>Similarly, the locking features of the OS might not be the best
>>choice for a DBMS (in fact, they aren't).
>
>If the OS provides optimized record services, then they should be good
>for most tasks. I think that is a much cleaner solution than the Unix way,
>where everybody who needs a database implements it on their own. Each
>program has its own data, and the code to read that data is unique to
>the program, which IMHO, is very sloppy. The same is true of the Unix
>idea of config files (vs. various database configuration setups in most
>other OS'es).
It is fair to say that it would be a valuable thing to have some
highly optimized "record services" available.
On UNIX, people *tend* to think of DBM or Berkeley DB when this issue
comes up. It is not an outrageous idea to assume that it is a
reasonable idea to do this "as a library."
It would be Really Neat to have something like what the Reiserfs folk
are working on where the library would provide a clear isomorphism
between the "record services API" and the "record services" that the
OS natively supports. That way you take advantage of the FS services,
rather than hiding from them.
>>Do you really mean turning Oracle into a kernel module? Or do
>>you want to limit yourself to fixed-length records and index-
>>sequential? Oh yes, and if someone needs a structuring not
>>provided for by the OS, will you tell him to go and suck eggs?
>
>I don't necessarily mean putting it into a kernel, i.e. running in ring
>0. I am less concerned with the details of implementation than the
>overall point: there should be a certain shared service to resolve these
>types of problems. In the case of RMS (Record Management Services) on VMS,
>it is not running in ring 0, but it is a service running lower down.
Agreed.
On UNIX, the appropriate location for such services would *clearly* be
a library.
The "integrate with Reiserfs" idea encourages there to be a tight
integration between the library and the FS services provided at the
kernel level; if the API is decent, and implementation is competent,
hopefully coping with the situation where the particular FS is not
available would merely result in a minor degradation of Quality of
Service.
Note that given such an API of sufficiently good quality, it might
well make sense to implement an RDBMS as an application that uses
that library. That appears to be the story of where MySQL came from;
they had a mature "data storage system," and built an SQL layer atop
it. Given an even better "data storage system," it could well make
implementing a highly reliable/efficient RDBMS an easier task.
--
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer <http://www.hex.net/~cbbrowne/lsf.html>
[EMAIL PROTECTED] - "What have you contributed to free software today?..."
------------------------------
From: [EMAIL PROTECTED] (Christopher B. Browne)
Crossposted-To: alt.os.linux,comp.os.linux.advocacy,comp.os.misc,comp.unix.advocacy
Subject: Re: TAO: the ultimate OS
Reply-To: [EMAIL PROTECTED]
Date: Sun, 20 Jun 1999 22:57:58 GMT
On 20 Jun 1999 17:56:50 GMT, Terry Murphy <[EMAIL PROTECTED]>
posted:
>>the new priority: THE END USER. everything needs to revolve around
>>the end user, rather than the programmer. EASY TO PORT== favoring
>>the programmer.
>
>I totally agree, of course. One of the problems you are going to
>encounter with this philosophy, I'm afraid, is management.
This is an intractable problem.
You have programmers that don't understand the way the users think;
you also have users that aren't capable of framing what they want/need
in a way that is conceivably implementable. And there are people that
control purse-strings that are a third group that understand neither
what is implementable nor what users truly want.
Combine these three perspectives together and it is remarkable that *any*
program ever gets released to the general public.
The "Linux phenomenon" represents a fourth situation: it represents
software created by programmers for their own use. As such, those
who write the software *do* know what they want/need, as well as how
they can implement it.
The notion that this can forcibly scale into a model that will provide
useful software for "dumb users" is questionable at best.
The approach of "programmers creating software for programmers" has
clearly been extremely successful. There are ludicrous quantities of
language compilers and interpreters, scripting languages, web servers,
and 'utilities in general.'
On the other hand, many attempts to create a "user-friendly, powerful
word processor" have failed.
>However, the RISC wars of the '80's and '90's are over. Right now there
>is basically one architecture that really matters, and it will be
>really, really interesting to see what happens in the next 10 years to
>IA-64, which basically all systems makers (from desktop to enterprise)
>plan to embrace. If it is as successful as some predict, portability
>may hardly be a concern at all anymore.
This assumes that the Gartner reports indicating forcible dominance
of IA-64 will actually turn out to be true.
There is a severe problem with IA-64; it is only sold by Intel, and
Intel is evidently working hard to keep it proprietary, unlike the
situation with IA-32. IA-32 became widespread because there were a
bunch of independent designs, which kept competition alive, discouraged
complacency, kept prices down, and such.
In contrast, if IA-64 is the only 64-bit design that survives, this
makes Intel an arguably more powerful organization than people accuse
MSFT of being.
Do Sun, Compaq, HP, IBM, and others want to be dependent on the good
graces of Intel to provide CPUs, in the way that many of 'em have become
forcibly dependent on MSFT to provide OSes?
>You have been criticized for not creating a design document (and not
>having code, but that just emphasizes my point above). The formal
>software process states that there is one document before the design
>document, called the requirements document, which is precisely what you
>wrote. My advice to you is to gather the people who are interested in
>the project, and come up with a design document. The formal software
>design process is difficult, especially if you don't have a manager
>hanging over you to produce it. I do think it will be difficult to
>produce a well designed system in the free software world, but it
>certainly can be done (GNOME, for example, seems pretty well designed).
Fair comment...
--
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer <http://www.hex.net/~cbbrowne/lsf.html>
[EMAIL PROTECTED] - "What have you contributed to free software today?..."
------------------------------
From: [EMAIL PROTECTED]
Subject: Re: using C++ for linux device drivers
Date: Sun, 20 Jun 1999 23:45:14 GMT
Justin Vallon <[EMAIL PROTECTED]> wrote:
: [EMAIL PROTECTED] (Alexander Viro) writes:
:> In article <7kdqj9$l1o$[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> wrote:
:> >Hi all,
:> >
:> > I am working on a sound driver for Linux (I will probably use OSS). I
:> >am planning to use C++, instead of C. Has anyone used C++ before for
:> >kernel/device driver programming for linux. If so what are the
:> >complications with using C++. I heard that C++ needs some OS support,
:> >especially for calls like "new", "delete" and stuff like that.
:>
:> It will not get it. This has been beaten to death many, many times. Oh,
:> and forget about try and catch - they are not going to work. Ditto for
:> standard classes - the runtime environment is not available either.
: Too bad. All you should need is:
: void *operator new(size_t s) { return malloc(s); /* kmalloc, etc */ }
: void operator delete(void *p) { free(p); }
: Compile with -nostdinc++ -fno-exceptions.
: Static constructors may need a C++ link phase, or you could warn that
: C++ static constructors will not be executed.
The main argument against C++ is still overhead. Even when you turn off
many of the features, you still have overhead greater than C's.
Check out EROS, currently being discussed on linux-kernel. It is an OS
that _was_ written in C++, and the author thinks this was a bad
decision.
Jeff
------------------------------
** FOR YOUR REFERENCE **
The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:
Internet: [EMAIL PROTECTED]
You can send mail to the entire list (and comp.os.linux.development.system) via:
Internet: [EMAIL PROTECTED]
Linux may be obtained via one of these FTP sites:
ftp.funet.fi pub/Linux
tsx-11.mit.edu pub/linux
sunsite.unc.edu pub/Linux
End of Linux-Development-System Digest
******************************