Racket can do this somewhat faster, but I suggest any effort be focused on improvements that are also relevant to substantial programs, and not on trying to compete on Perl one-liners and poor benchmarks.

Details follow...

Trying this 'benchmark' on a 700MB log file (just Linux "dmesg" output, duplicated many times), I saw numbers with Racket 5.x that were roughly comparable to those reported on StackOverflow. (This was on Linux on an old 2GHz laptop with no swap space, and the kernel had cached the 700MB in its RAM buffer, so it was just Racket pegging a CPU core at 100%.)

Using "regexp-match" was significantly faster than using "read-bytes-line", but I'm sure it was still slower than the other languages mentioned.
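
As a rough sketch of the two approaches being compared (this is not the code that was timed; the function names and the exact regexp are assumptions made for illustration), a line counter could be written either way:

```racket
#lang racket/base

;; Variant 1: read-bytes-line allocates a fresh byte string for every line.
(define (count-lines/read-bytes-line in)
  (let loop ([n 0])
    (if (eof-object? (read-bytes-line in))
        n
        (loop (add1 n)))))

;; Variant 2: let the regexp matcher consume the port directly.
;; Each successful match consumes input through the next newline.
;; Note: a final line with no trailing newline is NOT counted here,
;; unlike in variant 1.
(define (count-lines/regexp in)
  (let loop ([n 0])
    (if (regexp-match #rx#"[^\n]*\n" in)
        (loop (add1 n))
        n)))
```

Either function can be applied to a file with something like (call-with-input-file "some.log" count-lines/regexp), where "some.log" is a placeholder path. Any timing difference between the two would need to be measured on the data at hand.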

The process size stayed at 40MB total (shared libraries and everything). It looked like there were near-constant quick GC cycles. GC tuning might help?

This would be a more useful benchmark if it required actually doing something plausible with the allocations, rather than immediately throwing them away and doing no actual processing. I suspect Racket would perform relatively better on something closer to a real-world task.

Were I writing high-performance I/O code, I might use "read-bytes-avail!" to try to reduce allocations. Of course, sys-admins would not be doing this for quick, Perl-like scripting tasks. (Were we to max out what we can do with GC tuning and optimizations, we could always try making a minilanguage for this traditional Perl-like task, one that optimizes away some allocations, such as by allocating only the text that we actually use.)
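
A minimal sketch of the "read-bytes-avail!" idea, assuming the task is simply to count newlines: one buffer is allocated up front and refilled in place, so no byte string is allocated per line. The name count-newlines and the 64KB default buffer size are arbitrary choices for illustration, not anything measured in this thread.

```racket
#lang racket/base

;; Count newline bytes in `in` using a single reusable buffer.
(define (count-newlines in [buf-size 65536])
  (define buf (make-bytes buf-size))
  (let loop ([count 0])
    ;; Fills `buf` with whatever bytes are available (at least one,
    ;; blocking if necessary) and returns how many were read, or eof.
    (define n (read-bytes-avail! buf in))
    (cond
      [(eof-object? n) count]
      [(exact-nonnegative-integer? n)
       (loop (for/fold ([c count]) ([i (in-range n)])
               (if (eqv? (bytes-ref buf i) 10) ; 10 is the newline byte
                   (add1 c)
                   c)))]
      ;; A "special" (non-byte) result is not expected from an ordinary
      ;; file port; it is skipped here.
      [else (loop count)])))
```

For example, (count-newlines (open-input-bytes #"a\nb\nc")) scans the bytes without creating any per-line strings. Doing real work on each line this way means handling lines that straddle buffer boundaries, which is the kind of bookkeeping a quick script would not want to carry.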

Matching Perl I/O performance would be nice, but I won't be disappointed if nobody manages it. Perl was originally developed for pretty much this exact task (i.e., going through a line-oriented text-ish data file, applying a regexp to each line) and to be fast even on a 16MHz 4MB Sun 3/50 of over 20 years ago.

Also, I think we discussed this a while ago (perhaps when making the few-liner examples for the new Web site), but I think that nobody will win over any Perl programmers by trying to get their language to do 20-year-old Perl one-liners. This program is a handful of characters in Perl, and telling people that they could be typing "lambda" and parentheses and such instead, and wouldn't that be so much better, makes one look like a crazy person. Focus on things that are *not* Perl one-liners, but are substantial programs -- especially ones that benefit from syntactic extension, functional-ish programming, and maintainability -- since that's where Racket becomes a smart tool of smart people, and where Perl becomes a burden of crazy people.

With that in mind, from a PR perspective, if a Perl-type person asks you, "What does this Perl one-liner look like in Racket?", the preferred responses are: (1) "That task looks like what Perl is good at"; (2) do as politicians do, and answer the question that you wish you had been asked; (3) pretend to speak only Swahili and to not understand the question.


Sam Tobin-Hochstadt wrote at 11/02/2011 07:14 PM:
On StackOverflow [1], someone reported that Racket's I/O performance
on large files was substantially worse than other languages for a
simple task.  I haven't yet tried it on a similarly large volume of
data, but I did see a performance difference relative to Chicken for
large but not huge files, and Ryan seems to have gotten similar
results.

[1] http://stackoverflow.com/questions/7946745/i-o-performance-in-mzscheme

--
http://www.neilvandyke.org/
_________________________________________________
 For list-related administrative tasks:
 http://lists.racket-lang.org/listinfo/dev
