Re: [Boston.pm] transposing rows and columns in a CSV file

Ben Tilly Sat, 13 Nov 2004 11:41:33 -0800

On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman <[EMAIL PROTECTED]> wrote:
> On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote:
[...]
> > Um, mmap does not (well should not - Windows may vary) use any
> > RAM
> 
> You are confusing two issues. "using RAM" is not the same as "allocating
> process address space". Allocating process address space is, of course,
> required for mmap (same way you allocate address space when you load a
> shared library, which is also mmap-based under Unix and Unix-like
> systems). All systems have to limit address space at some point. Linux
> does this at 3GB up to 2.6.x where it becomes more configurable and can
> be as large as 3.5, I think.


How was I confusing issues?  What I meant is that calling mmap does
not use significant amounts of RAM.  (The OS needs some to track
that the mapping exists, but that should be it.)  Once you actually
touch the data that you mmapped in, file contents will be paged in
and RAM will be taken, but not until then.
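
The point about mapping versus touching can be sketched in a few lines. This uses Python's standard mmap module rather than Perl, purely as an illustration; the function name and the 256 MB default size are made up for the example, and it does not measure RAM, it only shows that the whole file becomes addressable while a single byte is actually read.

```python
import mmap
import os
import tempfile


def map_sparse_file(size=256 * 1024 * 1024):
    """Create a file of `size` bytes without writing data, map it, touch one byte.

    Hypothetical helper for illustration only. Returns (mapped_length, last_byte).
    """
    fd, path = tempfile.mkstemp()
    try:
        # Set the file length without writing any data; on most filesystems
        # this produces a sparse file that reads back as zeros.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as mm:
            # The mapping only reserves address space at this point.
            mapped_len = len(mm)
            # First touch: the kernel faults this page in on demand
            # (possibly with some readahead around it).
            last_byte = mm[size - 1]
        return mapped_len, last_byte
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(map_sparse_file())
```

To see the effect on memory, one could watch the process's resident set size in an external tool before and after the touch; the numbers will vary by OS and filesystem.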

As for a 3 GB limit, now that you mention it, I heard something
about that.  But I didn't pay attention since I don't need it right now.
I've also heard about Intel's large addressing extensions (keep 2 GB
in normal address space, page around the top 2 GB, you get 64 GB
of addressable memory).  I'm curious about how (or if) the two can
cooperate.

> To be clear, though, if you had 10MB of RAM, you could still mmap a 3GB
> file, assuming you allowed for over-committed allocation in the kernel
> (assuming Linux... filthy habit, I know).

Exactly what I was referring to.

However, the comment about over-committed allocation confuses me.
Why would a single mmap result in overcommitting memory?

> > mmap should not cause any more or less disk accesses than
> > reading from the file in the same pattern should have.  It just lets
> > you do things like use Perl's RE engine directly on the file
> > contents.
> 
> Actually, no it doesn't as far as I know (unless the copy-on-write code
> got MUCH better recently).

Where does a write happen?  I was thinking in terms of using the
RE engine (with pos) as a tokenizer.

I was thinking that you'd use something like Sys::Mmap's mmap
call directly so that there is a Perl variable that Perl thinks is a
regular variable but which at a C level has its data at an mmapped
location.  Fragile, I know (because Perl doesn't know that it cannot
reallocate the variable), but as long as you are careful to not cause
it to be reallocated or copied, there should be no limitations on
what you can do.
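
As a rough sketch of the tokenizer idea, here is the same pattern in Python instead of Perl, because Python's standard mmap objects can be passed directly to the re module.  This is not the Sys::Mmap approach described above, only an analogue: matching at an explicit offset stands in for Perl's pos, and the pattern is a simplified, hypothetical CSV grammar (quoted fields with doubled quotes, no other escapes).  All function names are invented for the example.

```python
import mmap
import os
import re
import tempfile

# One token per match: a field (quoted or bare), then a separator or end of input.
# Simplified CSV grammar, assumed for illustration only.
TOKEN = re.compile(rb'("(?:[^"]|"")*"|[^,\r\n]*)(,|\r?\n|\Z)')


def tokenize(buf):
    """Yield (field, ends_row) pairs, matching at an advancing offset."""
    pos = 0
    end = len(buf)
    while pos < end:
        m = TOKEN.match(buf, pos)   # anchored at pos, like \G with pos in Perl
        if m is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        # The field comes back as a small bytes copy; the separator tells us
        # whether the row ends here (newline or end of input).
        yield m.group(1), m.group(2) != b","
        pos = m.end()


def rows(buf):
    """Group tokens into rows (lists of raw field bytes)."""
    row = []
    for field, ends_row in tokenize(buf):
        row.append(field)
        if ends_row:
            yield row
            row = []


def read_rows_mmap(path):
    """Map a non-empty file read-only and tokenize it in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return list(rows(mm))


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b'a,"b,1"\nc,d\n')
    finally:
        os.close(fd)
    try:
        print(read_rows_mmap(path))
    finally:
        os.remove(path)
```

Because the mapping is opened read-only and only small per-field copies are made, the file contents are never slurped into a single in-process string; pages are brought in as the match position advances.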

> Like I said, you probably won't get the win out of mmap in Perl that you
> would expect. In Parrot you would, but that's another story.

In Perl I'd expect it to be possible but fragile.  If Parrot could make
it possible and not fragile, that would be great.

Cheers,
Ben
