Linux-Development-Sys Digest #956, Volume #6 Tue, 13 Jul 99 12:13:55 EDT
Contents:
Re: when will Linux support > 2GB file size??? (Byron A Jeff)
Re: Kernel version 2.3.9+ (Mark Tranchant)
Re: NT to Linux port questions (Matthew Carl Schumaker)
Re: performance of memcpy on Linux (Dale Pontius)
Re: Dos4GW ->Linux (Mats Liljegren)
Re: NT to Linux port questions (Peter Samuelson)
Re: NT to Linux port questions (Jan Wielemaker)
Re: when will Linux support > 2GB file size??? (Malcolm Beattie)
Re: performance of memcpy on Linux (Maciej Golebiewski)
----------------------------------------------------------------------------
From: [EMAIL PROTECTED] (Byron A Jeff)
Crossposted-To: comp.os.linux.advocacy
Subject: Re: when will Linux support > 2GB file size???
Date: 13 Jul 1999 08:08:09 -0400
In article <[EMAIL PROTECTED]>,
Robert Krawitz <[EMAIL PROTECTED]> wrote:
[EMAIL PROTECTED] (Byron A Jeff) writes:
-
-> In article <[EMAIL PROTECTED]>,
-> Rowan Hughes <[EMAIL PROTECTED]> wrote:
-> -In article <7m8qtt$[EMAIL PROTECTED]>, Byron A Jeff wrote:
-> - [snip]
-> ->So BTW why exactly do you need 2GB+ files?
-> -
-> -They're needed more and more. In 6-12 months 50GB IDE disks
-> -with media speeds of 30MB/sec will be the norm. I'm using
-> -this sort of H/W already at work (GIS type stuff) and Linux
-> -would get a lot more use in this field if it could do >2GB files.
->
-> I see your point, to a point. I still see this as a somewhat minor
-> inconvenience to one segment of the developer population at the risk
-> of destabilizing everything for everyone. The limit is only at the
-> file level and for good reason which is that 32 bit machines
-> naturally only represent ints up to 2G (with another 2G for negative
-> indices).
-
-I don't consider this a good reason. CP/M was never limited to
-256-byte files (it ran on 8-bit processors), and for that matter, it
-wasn't limited to 64K files either. DOS certainly wasn't. GCC
-supports 64-bit integers, generating the necessary instructions for
-working with them.
True. However, a 2G limit handles a great percentage of the files on the
system, whereas 8-bit or 16-bit limits did not.
-
-> If we move everyone to an unnatural size to simplify
-> things for a few applications, we risk reducing the
-> stability and performance of a filesystem that handles a great
-> majority of the file/application population in its current state.
-
-Modifying the existing filesystem layout is indeed risky, but how is
-creating a new filesystem risky in this regard?
It's not. Every argument I've heard so far is for a retrofit. I think a new
FS is in order for this type of application.
-
-> Has anyone thought of writing a large file interface or class for
-> this type of activity? It seems to me it's a relatively minor
-> adjustment to map a set of files, each up to 2G in size, onto a
-> larger logical array. Or maybe a filesystem specifically designed
-> for handling large files?
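Such a wrapper might look roughly like the sketch below. (Illustrative
only: bigfile_read() and CHUNK are made-up names, error handling is
minimal, and the underlying FILE*s are assumed to be open already.)

/* Present a set of ordinary files, each well under 2G, as one big
 * logical array addressed by a 64-bit offset. */
#include <stdio.h>

#define CHUNK (1UL << 30)   /* 1GB per underlying file, safely < 2GB */

/* Read len bytes at 64-bit logical offset off; a read may span files.
 * Returns bytes read, or -1 on a seek error. */
long bigfile_read(FILE **files, long long off, char *buf, long len)
{
    long total = 0;
    while (len > 0) {
        int idx = (int)(off / CHUNK);        /* which chunk file */
        long within = (long)(off % CHUNK);   /* offset inside it */
        long n = len;
        size_t got;

        if (n > (long)CHUNK - within)        /* clip at chunk end */
            n = (long)CHUNK - within;
        if (fseek(files[idx], within, SEEK_SET) != 0)
            return -1;
        got = fread(buf + total, 1, (size_t)n, files[idx]);
        total += (long)got;
        off += (long long)got;
        len -= (long)got;
        if ((long)got < n)
            break;                           /* short read: hit EOF */
    }
    return total;
}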
-
-There already is a de facto standard for handling large files, which
-Solaris and AIX support. It's backward compatible, but not forward
-compatible: the normal filesystem operations cannot operate on a file
-larger than 2 GB, but there are 64-bit versions of all of the
-file-related system calls (or at least open() and friends) that can.
Excellent! That's exactly what I wanted to see.
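For reference, a sketch of what using that interface looks like on such
systems (exact headers and feature-test macros vary by platform --
_LARGEFILE64_SOURCE is the glibc spelling -- so treat this as
illustrative):

/* 64-bit variants of the file calls live alongside the ordinary
 * 32-bit ones; existing code is untouched, and code that needs big
 * files opts in. */
#define _LARGEFILE64_SOURCE   /* glibc; Solaris uses its own LF env */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int peek_past_2g(const char *path)
{
    off64_t pos;
    int fd = open64(path, O_RDONLY);         /* 64-bit-capable open */
    if (fd == -1)
        return -1;
    pos = lseek64(fd, (off64_t)3 << 30, SEEK_SET);  /* seek to 3GB */
    if (pos == (off64_t)-1) {
        close(fd);
        return -1;
    }
    /* ... read() and friends work as usual from here ... */
    close(fd);
    return 0;
}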
-
-> I'd just hate to see the elegance and power of the existing
-> filesystem be compromised by fulfilling the needs of a few when
-> other relatively simple avenues exists to solve the problem.
-
-How would the filesystem be compromised?
Performance. See my other post for details.
-
-> Of all the examples I've seen only databases seem to warrant such a
-> significant change. And databases could in fact get better
-> performance by implementing a rudimentary filesystem for the data on
-> top of a raw partition.
-
-Actually, it's more an issue with processing large flat files than
-with databases, for precisely this reason.
-
-> Consider this: If a switch to a 64 bit filesystem occurs then every
-> application that does a seek on that filesystem must be recompiled.
-
-More precisely, every application that tries to access a file greater
-than 2 GB. That's true, but the change isn't difficult, and there's
-already a standard way to do it.
-
-> Every data block pointer in every inode will double in size. Every data
-> block computation will require 64 bit arithmetic.
-
-Sure, but so what?
It impacts the performance of the overall filesystem. One of ext2's greatest
features is that it is fast. It just doesn't make sense to slow down every
process accessing the filesystem so that a couple of programs can get 2G+
access.
Simply create a new FS that supports 2G+ files.
BAJ
------------------------------
From: Mark Tranchant <[EMAIL PROTECTED]>
Subject: Re: Kernel version 2.3.9+
Date: Tue, 13 Jul 1999 07:50:50 +0100
Reply-To: [EMAIL PROTECTED]
How about putting a *big* warning in the makefile?
belshazzar:> make bzImage
*******************************************
* *
* WARNING! 2.3.x kernels are development *
* kernels. They may not compile correctly *
* or may fail when booted, possibly *
* causing severe data corruption - and *
* no-one will have any sympathy! *
* *
* YOU HAVE BEEN WARNED! PRESS CTRL-C TO *
* STOP OR ANY OTHER KEY IF YOU THINK *
* YOU'RE HARD ENOUGH TO CONTINUE! *
* *
*******************************************
Mark.
Philipp Thomas wrote:
>
> On Sun, 11 Jul 1999 21:35:59 -0400, "Zachary Kuznia"
> <[EMAIL PROTECTED]> wrote:
>
> >In the 2.3.9 and 2.3.10 kernels I have found an error compiling the fat
> >module used to mount fat16 and fat32 drives. It finds an unresolved
>
> Jeez, has all the discussion about the changes in the buffer cache that
> happened in *2.3.7* run past you? FAT and other filesystems will
> continue not to work for some time because they haven't been adapted
> yet to the new scheme. If you need FAT, stick to 2.3.6 and wait for
> the changes to take place.
>
> <SOAPBOX>
> I just can't understand why anybody would try to use a development
> kernel without *first* trying to get info on what works and what
> doesn't. Or at least try searching on dejanews *before* asking on the
> lists or newsgroups.
>
> After all, it's not called a development kernel for nothing. Why do
> some people still think that everything in a development kernel should
> work?
> </SOAPBOX>
>
> Philipp
>
> --
> Close the windows! The penguin is freezing.
------------------------------
From: Matthew Carl Schumaker <[EMAIL PROTECTED]>
Subject: Re: NT to Linux port questions
Date: Tue, 13 Jul 1999 09:06:30 -0400
Matthew Carl Schumaker
UPAC Lights Administrative Chairperson
[EMAIL PROTECTED]
veni, vidi, velcro
I came, I saw, I stuck around
> > Precisely my point. When I was working on the project, the company didn't
> > want to change their source, so I had to implement the Windows calls under
> > Linux. Not pretty at all.
>
> So it started life tied to Windows, and you expect that it's suddenly
> going to trivially run under Linux without modification? Use winelib
> if you must, or just do the port properly in the first place.
Not all aspects of Windows are supported under Wine; they even admit it.
The app I was porting made use of the Winsock2 library, which isn't
supported. In the end I convinced them to let me do a rewrite of the app.
The new code was about 3/4 the size and could handle almost twice the
number of connections.
> > True, but not all of these handles are universal. In MS there is a data
> > type HANDLE that is used for any kind of handle, be it a file, window,
> > socket, device, etc.
>
> I don't know what "universal" means here, but a file descriptor
> referring to a disk file is no different from one referring to a
> socket or a device. That seems pretty generic to me.
True, but in MS the HANDLE type can refer to anything:
files, devices, events, semaphores, windows, ...
> > The MS WaitForMultipleObjects has the ability to be called so that it
> > doesn't return unless ALL handles become signaled SIMULTANEOUSLY;
> > select can't do this.
>
> So you want something that won't return until ALL of the objects
> become ready? I would think that you'd want something to return as
> soon as ANYTHING became ready. It's not too hard to emulate this in
> any event: keep calling select() on everything that isn't yet ready
> until everything has finally reported in. Put it in a library
> routine, and nobody will be any the wiser.
My above statement describes a valid setting of WaitForMultipleObjects;
however, you can also set it to return when any of the events becomes true.
>
> > Not to mention, using either select() or poll() heavily in a program
> > can have very adverse effects on the speed of the program, and btw
> > select() and poll() are both blocking calls.
>
> This is simply false. select() can take a timeout, and you can
> specify a timeout of 0 or anything less than 1E8 seconds (I have no
> idea where that silly number came from). As for efficiency, try
> measuring it before claiming that it will hurt performance.
But doesn't execution of your routine halt while you call select or poll?
That is blocking in my book. The timeout just specifies a maximum amount
of time that it will block.
Don't believe me? Look at the paper from this site (is June 1999 recent
enough?) titled "A scalable and explicit event delivery mechanism for UNIX":
http://www.cs.rice.edu/~gaurav/papers/index.html
------------------------------
From: [EMAIL PROTECTED] (Dale Pontius)
Subject: Re: performance of memcpy on Linux
Date: 13 Jul 1999 13:51:13 GMT
In article <[EMAIL PROTECTED]>,
Maciej Golebiewski <[EMAIL PROTECTED]> writes:
> Dear All,
>
> Recently I have noticed strange behaviour: calling memcpy
> with longer chunks of data actually delivers worse
> bandwidth.
...
>
> The big question is:
> - is it caused by the way Linux kernel manages memory?
> - is this caused just by cache effects?
This looks to me like cache effects. You're effectively probing
the difference between L2 and main memory bandwidth. It also
indicates that your PPro has more L2 than your PII. Look at the
small block transfers on the PII and you'll see the whole problem
starting to fit into L1.
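If you want to see the plateaus directly, a rough probe like this (a
sketch only -- the sizes are arbitrary and clock() timing is crude)
shows the bandwidth stepping down as the block size outgrows L1 and
then L2:

/* memcpy bandwidth vs. block size. Copies TOTAL bytes per block size
 * and reports MB/s; expect steps near the L1 and L2 sizes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TOTAL (64UL * 1024 * 1024)   /* bytes copied per block size */
#define BUFSZ (8UL * 1024 * 1024)    /* larger than any L2 here */

int main(void)
{
    unsigned long size;
    char *src = malloc(BUFSZ);
    char *dst = malloc(BUFSZ);

    if (!src || !dst)
        return 1;
    memset(src, 1, BUFSZ);
    for (size = 4096; size <= BUFSZ; size *= 2) {
        long reps = (long)(TOTAL / size);
        clock_t t0 = clock();
        double secs;
        long i;

        for (i = 0; i < reps; i++)
            memcpy(dst, src, size);
        secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%8lu bytes: %7.1f MB/s\n", size,
               secs > 0 ? (TOTAL / (1024.0 * 1024.0)) / secs : 0.0);
    }
    free(src);
    free(dst);
    return 0;
}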
Dale Pontius
(NOT speaking for IBM)
------------------------------
From: Mats Liljegren <[EMAIL PROTECTED]>
Subject: Re: Dos4GW ->Linux
Date: Tue, 13 Jul 1999 16:42:14 +0200
> I do not know too much about Linux. We are looking to change platforms from
> DOS+Dos4GW (Watcom C++) to something else.
> The main requirements are:
> 1. Possibility to make small enough OS kernel
> 2. Disabled page swapping
> 3. OS binary size all: Max 5MB
>
> Our task is developing end-user navigational systems based on the x86
> platform, having a power-on/power-off button and little more (like a game
> machine or a TV).
>
> If anyone can give me information about the possibility of using Linux,
> please do. I have already found a very big Linux community and very
> interesting projects like RTLinux or GGI, but I still need advice from
> some Linux guru. If you have information, please send it directly to my
> email.
I'd advise you to look at single-floppy distributions; for example:
http://mulinux.nevalabs.org/
That is one such distribution. If you look at the page under the heading
"Related Projects", you will find loads of others. Most of them only
repackage existing utilities, but you can find inspiration in them:
which commands could be implemented as scripts instead, which utilities
do the most with as little space as possible, and so on.
Most of those distributions target a complete system size (when
compressed) of 1.44MB or 1.722MB. You should have 8MB RAM when using
them, but you can probably do with 4MB with some tweaking.
The kernel is usually around 400-600KB if you don't include too much fancy
stuff. In my experience, the big problem is libc... Most programs use
it, but the original library is something like a 7MB uncompressed binary.
Most of the distributions have some sort of minimized libc which offers
only the most used functions. Programs that use anything else will crash
with such a libc, so you probably want to see what is available in the
minimized libc and compare it with what your programs need.
Good luck!
/Mats
------------------------------
From: [EMAIL PROTECTED] (Peter Samuelson)
Subject: Re: NT to Linux port questions
Date: 13 Jul 1999 08:34:24 -0500
Reply-To: Peter Samuelson <[EMAIL PROTECTED]>
[Matthew Carl Schumaker <[EMAIL PROTECTED]>]
> In the end I convinced them to let me do a rewrite of the app. The
> code was about 3/4 the size and could handle almost 2x the number of
> connections
(:
> > So you want something that won't return until ALL of the objects
> > become ready? I would think that you'd want something to return as
> > soon as ANYTHING became ready.
> My above statement describes a valid setting of WaitForMultipleObjects;
> however, you can also set it to return when any of the events becomes true.
OK, but it's still easy to emulate "wait for all events" with select() if
you really need it -- but I can't think why you would.
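A minimal sketch of such a library routine, assuming the events of
interest are descriptors becoming readable and that n stays within
FD_SETSIZE:

/* Emulate the "wait until ALL are signaled" mode of
 * WaitForMultipleObjects: keep select()ing on the descriptors that
 * haven't reported readable yet until every one has. */
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>
#include <string.h>

int wait_for_all_readable(const int *fds, int n)
{
    char done[FD_SETSIZE];
    int remaining = n;

    memset(done, 0, sizeof(done));
    while (remaining > 0) {
        fd_set rset;
        int i, maxfd = -1;

        FD_ZERO(&rset);
        for (i = 0; i < n; i++) {
            if (!done[i]) {
                FD_SET(fds[i], &rset);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
        }
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
            return -1;
        for (i = 0; i < n; i++) {
            if (!done[i] && FD_ISSET(fds[i], &rset)) {
                done[i] = 1;     /* this one has reported in */
                remaining--;
            }
        }
    }
    return 0;
}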
> > select() can take a timeout, and you can specify a timeout of 0 or
> > anything less than 1E8 seconds
> But doesn't execution of your routine halt while you call select or
> poll? That is blocking in my book. The timeout just specifies a
> maximum amount of time that it will block.
*OF COURSE* execution of your routine halts while you call select or
poll. It halts whenever you call *any* system call. If *that's* what
you call blocking, there is *no* such thing as a non-blocking system
call.
If you call select() with a timeout of 0, that is non-blocking in *my*
book.
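For example, a polling check (a sketch):

/* Non-blocking readability test: a zeroed timeval makes select()
 * return immediately instead of waiting. */
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

int readable_now(int fd)
{
    fd_set rset;
    struct timeval tv;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = 0;    /* zero timeout: poll and return at once */
    tv.tv_usec = 0;
    return select(fd + 1, &rset, NULL, NULL, &tv) > 0;
}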
> titled: A scalable and explicit event delivery mechanism for UNIX,
It is true that select() and poll() have their scalability issues. I
can't remember what they all are, but they are known. VMS-style
completion ports are one proposed solution; I don't know what Linux has
by way of an equivalent, though NT is said to have them. ISTR someone
saying Linux had some sort of equivalent functionality (something to do
with POSIX RT signals, perhaps?).
--
Peter Samuelson
<sampo.creighton.edu!psamuels>
------------------------------
From: [EMAIL PROTECTED] (Jan Wielemaker)
Subject: Re: NT to Linux port questions
Date: 13 Jul 1999 13:47:39 GMT
On Tue, 13 Jul 1999 09:06:30 -0400, Matthew Carl Schumaker
<[EMAIL PROTECTED]> wrote:
>> > Precisely my point. When I was working on the project, the company didn't
>> > want to change their source, so I had to implement the Windows calls under
>> > Linux. Not pretty at all.
>>
>> So it started life tied to Windows, and you expect that it's suddenly
>> going to trivially run under Linux without modification? Use winelib
>> if you must, or just do the port properly in the first place.
>Not all aspects of Windows are supported under Wine; they even admit it.
>The app I was porting made use of the Winsock2 library, which isn't
>supported. In the end I convinced them to let me do a rewrite of the app.
>The new code was about 3/4 the size and could handle almost twice the
>number of connections.
Unless aiming at binary compatibility, as Wine is doing, I think
emulating one API in the other is not really a good idea. The
Win32 and Unix APIs are very different in spirit in many respects. In
general, however, what an application actually demands can be
implemented quite naturally in either.
Starting from this point of view, the correct way is to identify how and
where your application wants to talk to the OS. Concentrate this into
separate modules and reimplement these modules for the different OSes.
Consider the view on files as an example. If your app needs to know the
size of a file, define size_of_file() somewhere. It is easy to write in
both OSes, but emulating Win32 file handling in Unix is about as hard
(and still incomplete) as the other way around.
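A minimal sketch of one such module (the Win32 branch assumes files
under 2GB to match the long return type; error handling is spartan):

/* size_of_file(): one small OS-specific need behind one common
 * interface, reimplemented per platform. Returns the size in bytes,
 * or -1 on error. */
#ifdef _WIN32
#include <windows.h>

long size_of_file(const char *path)
{
    HANDLE h = CreateFile(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                          OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD size;

    if (h == INVALID_HANDLE_VALUE)
        return -1;
    size = GetFileSize(h, NULL);   /* low 32 bits; enough for < 2GB */
    CloseHandle(h);
    return (size == 0xFFFFFFFF) ? -1 : (long)size;
}
#else
#include <sys/stat.h>

long size_of_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_size;
}
#endif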
Cheers --- Jan
------------------------------
From: [EMAIL PROTECTED] (Malcolm Beattie)
Crossposted-To: comp.os.linux.advocacy
Subject: Re: when will Linux support > 2GB file size???
Date: 13 Jul 1999 13:13:42 GMT
In article <7mf9rt$[EMAIL PROTECTED]>,
Byron A Jeff <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>,
>Robert Krawitz <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] (Byron A Jeff) writes:
>-
[...]
>-
>-> ext2 is very good at what it does the most. Adding 2G+ support will make it
>-> less good at what it does the most. So why change it?
>-
>-Well, OK, why would it make it less good?
>
>Because switching all the file pointer computations to 64 bit will slow each
>and every reference to the file system down on a 32 bit machine.
>
>It was natural to do on 64-bit architectures because they naturally support
>64-bit arithmetic. But 32-bit architectures don't.
You're missing something. As far as an ordinary filesystem is
concerned, almost all the arithmetic is done on block numbers and
sector numbers. It maps between an (inode, block-within-object)
pair and a (block-device-object, sector-index) pair. There's a bit
of arithmetic keeping track of the file's size and each struct
file's position but that's *already* done as "long long" arithmetic
and isn't performance critical. (OK, so Stephen's just found and
fixed a load of bugs in bad 32-bit <-> 64-bit assumptions but that's
nothing fundamental to the filesystem design.) So as far as ext2 is
concerned (or any ordinary filesystem) the current design limit is
2TB, not 2GB (i.e. 32-bits worth of 512-byte sectors of the
underlying block device).
The 2GB restrictions come from two other places: the mm system and
the page cache. The page cache is indexed by (inode, byte-offset)
where byte-offset is an unsigned long. That's what makes the
restriction for ordinary filesystems on 32-bit systems: anything that
uses generic_file_read, so that it automatically uses the page cache,
inherits a 4GB filesize restriction when unsigned long is 32 bits wide.
The restriction to 2GB is presumably due to signed/unsigned issues in
various places. A reworking of the page cache to index by
(inode, page number) would get rid of that limit, but it's a central
feature and it's tied in with other support functions, which makes it
a difficult job to do safely and cleanly. If I recall correctly, there
was at least some intention at one time to do it, though.
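To put numbers on the limits above (a tiny stand-alone illustration,
assuming 512-byte sectors and the 4KB i386 page size):

#include <stdio.h>

int main(void)
{
    /* 32-bit sector index * 512-byte sectors = 2^41 bytes = 2TB:
     * the block device / ordinary filesystem design limit. */
    unsigned long long fs_limit = (1ULL << 32) * 512;

    /* 32-bit byte offset into the page cache = 2^32 bytes = 4GB
     * (2GB once a signed type sneaks in somewhere). */
    unsigned long long byte_idx = 1ULL << 32;

    /* Indexing by (inode, page number) instead:
     * 2^32 pages * 4KB = 2^44 bytes = 16TB. */
    unsigned long long page_idx = (1ULL << 32) * 4096;

    printf("block device limit: %llu TB\n", fs_limit >> 40);
    printf("byte-offset index:  %llu GB\n", byte_idx >> 30);
    printf("page-number index:  %llu TB\n", page_idx >> 40);
    return 0;
}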
There's also the problem that you certainly can't mmap() the whole
of a >4GB file into only 32 bits of address space (minus the 1GB or more
of address space that the kernel subtracts for its mapping). Only
*then* do the libc, binary compatibility and foo64() issues kick in
(and maybe the VFS though a quick look at 2.2 seems to show it's
mostly if not completely done) and they are far removed from the
particular filesystem itself. Having said that, IMHO the Large File
Summit foo64() "solution" based on ILP32+off64 is Evil and Bad and
Wrong but I think that battle has been lost already.
--Malcolm
--
Malcolm Beattie <[EMAIL PROTECTED]>
Oxford University Computing Services
"I permitted that as a demonstration of futility" --Grey Roger
------------------------------
From: Maciej Golebiewski <[EMAIL PROTECTED]>
Subject: Re: performance of memcpy on Linux
Date: Tue, 13 Jul 1999 17:24:09 +0200
Maciej Golebiewski wrote:
> I think that those 70 MB/s were caused by *some* of the data being in
> cache, and probably also by the caching algorithm that somehow
> performed better in the case of many short memcpys. I repeated the
> tests making sure that I used non-cached data, and this time the
> results were consistent with those reported by lmbench.
Shame on me!!!!!! I'm very sorry, guys, for wasting your time.
I just went once again through my original code that prompted
me to post on usenet, and to my horror I have discovered a bug
whose side effect was increased cache locality and a decreased
number of cache misses. The bug would kick in after the first chunk
(in multiple-chunk transfers), and the shorter the chunk, the sooner
the bug kicked in and the "better" the apparent performance.
I should keep a more careful eye on my pointers in the future... :(
Sorry again for wasting your time and bandwidth.
Maciej
------------------------------
** FOR YOUR REFERENCE **
The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:
Internet: [EMAIL PROTECTED]
You can send mail to the entire list (and comp.os.linux.development.system) via:
Internet: [EMAIL PROTECTED]
Linux may be obtained via one of these FTP sites:
ftp.funet.fi pub/Linux
tsx-11.mit.edu pub/linux
sunsite.unc.edu pub/Linux
End of Linux-Development-System Digest
******************************