Ah, I see that I was mistaken about the timing. Sorry about that. After a lot of fiddling around, I came up with this faster form:

(defn countnl-lite
  [#^bytes buf]
  (areduce buf idx count (int 0)
           (if (= (clojure.lang.RT/aget buf idx) 10)
             (unchecked-add count 1)
             count)))

Key points are initializing count to a primitive integer and directly
calling clojure's aget to avoid an unnecessary integer cast.

On my system:
The unmodified countnl function takes ~180 msecs
Without AOT compilation countnl-lite takes around 66 msecs
With AOT compilation countnl-lite takes ~46 msecs
The java method takes ~19 msecs.

I've lost a factor of 2.25 somewhere and it makes me sad that I can't find it.

I would be very interested if anyone could improve countnl-lite.

--Robert McIntyre

On Mon, Aug 30, 2010 at 8:41 PM, Alan <a...@malloys.org> wrote:
> I think this misses the point. Of course java, c, and clojure will all
> have roughly the same wall-clock time for this program, since it is
> dominated by the I/O. You can even see that in the output from $ time
> java Iterate: less than 0.5s was spent in user space, the rest was
> spent in system code - that is, mostly doing I/O.
>
> The java version is a second faster as counted by the wall clock, and
> this is unlikely to be a coincidence: tsuraan's timing data suggests
> that the clojure program takes 80ms longer in each loop, and loops 10
> times. That comes out to 0.8 seconds, which is quite close to the
> differential you observed when timing from the command line.
>
> On Aug 30, 1:38 pm, Robert McIntyre <r...@mit.edu> wrote:
>> I don't know what the heck is going on here, but ignore the time the
>> program is reporting and just pay attention to how long it actually
>> takes wall-clock style and you'll see that your clojure and java
>> programs already take the same time.
>>
>> Here are my findings:
>>
>> I saved Iterate.java into my rlm package and ran:
>> time java -server rlm.Iterate
>>
>> results:
>> time java -server rlm.Iterate
>> Wanted 16777216 got 16777216 bytes
>> counted 65341 nls in 27 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65310 nls in 27 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 66026 nls in 21 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65473 nls in 19 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65679 nls in 19 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65739 nls in 19 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65310 nls in 21 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65810 nls in 18 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65531 nls in 21 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65418 nls in 21 msec
>>
>> real 0m27.469s
>> user 0m0.472s
>> sys 0m26.638s
>>
>> I wrapped the last bunch of commands in your clojure script into a
>> (run) function:
>> (defn run []
>>   (let [ifs (FileInputStream. "/dev/urandom")
>>         buf (make-array Byte/TYPE *numbytes*)]
>>     (dotimes [_ 10]
>>       (let [sz (.read ifs buf)]
>>         (println "Wanted" *numbytes* "got" sz "bytes")
>>         (let [count (time (countnl buf))]
>>           (println "Got" count "nls"))))))
>>
>> and ran
>> (time (run)) at the repl:
>>
>> (time (run))
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.081975 msecs"
>> Got 65894 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.001814 msecs"
>> Got 65949 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.061934 msecs"
>> Got 65603 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.031131 msecs"
>> Got 65563 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.122567 msecs"
>> Got 65696 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 182.968066 msecs"
>> Got 65546 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.058508 msecs"
>> Got 65468 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 182.932395 msecs"
>> Got 65872 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.074646 msecs"
>> Got 65498 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 187.733636 msecs"
>> Got 65434 nls
>> "Elapsed time: 28510.331507 msecs"
>> nil
>>
>> Total running time for both programs is around 28 seconds.
>> The java program seems to be incorrectly reporting its time.
>>
>> --Robert McIntyre
>>
>> On Mon, Aug 30, 2010 at 4:03 PM, tsuraan <tsur...@gmail.com> wrote:
>> > Just to try to see if clojure is a practical language for doing
>> > byte-level work (parsing files, network streams, etc), I wrote a
>> > trivial function to iterate through a buffer of bytes and count all
>> > the newlines that it sees. For my testing, I've written a C version,
>> > a Java version, and a Clojure version. I'm running each routine 10
>> > times over a 16MB buffer read from /dev/urandom (the buffer is
>> > refreshed between each call to the newline counting function).
>> > With gcc -O0, I get about 80ms per 16MB buffer. With gcc -O3, I get ~14ms
>> > per buffer. With javac (and java -server) I get 20ms per 16MB buffer.
>> > With clojure, I get 105ms per buffer (after the jvm warms up). I'm
>> > guessing that the huge boost that java and gcc -O3 get is from
>> > converting per-byte operations to per-int ops; at least that ~4x boost
>> > looks like it would come from something like that. Is that an
>> > optimization that is unavailable to clojure? The java_interop doc
>> > makes it sound like java and clojure get the exact same bytecode when
>> > using areduce correctly, so maybe there's something I could be doing
>> > better. Here are my small programs; if somebody could suggest
>> > improvements, I'd appreciate them.
>>
>> > iterate.clj:
>>
>> > (set! *warn-on-reflection* true)
>> > (import java.io.FileInputStream)
>>
>> > (def *numbytes* (* 16 1024 1024))
>>
>> > (defn countnl
>> >   [#^bytes buf]
>> >   (let [nl (byte 10)]
>> >     (areduce buf idx count 0
>> >              (if (= (aget buf idx) nl)
>> >                (inc count)
>> >                count))))
>>
>> > (let [ifs (FileInputStream. "/dev/urandom")
>> >       buf (make-array Byte/TYPE *numbytes*)]
>> >   (dotimes [_ 10]
>> >     (let [sz (.read ifs buf)]
>> >       (println "Wanted" *numbytes* "got" sz "bytes")
>> >       (let [count (time (countnl buf))]
>> >         (println "Got" count "nls")))))
>>
>> > Iterate.java:
>>
>> > import java.io.FileInputStream;
>>
>> > class Iterate
>> > {
>> >   static final int NUMBYTES = 16*1024*1024;
>>
>> >   static int countnl(byte[] buf)
>> >   {
>> >     int count = 0;
>> >     for(int i = 0; i < buf.length; i++) {
>> >       if(buf[i] == '\n') {
>> >         count++;
>> >       }
>> >     }
>> >     return count;
>> >   }
>>
>> >   public static final void main(String[] args)
>> >     throws Throwable
>> >   {
>> >     FileInputStream input = new FileInputStream("/dev/urandom");
>> >     byte[] buf = new byte[NUMBYTES];
>> >     int sz;
>> >     long start, end;
>>
>> >     for(int i = 0; i < 10; i++) {
>> >       sz = input.read(buf);
>> >       System.out.println("Wanted " + NUMBYTES + " got " + sz + " bytes");
>> >       start = System.currentTimeMillis();
>> >       int count = countnl(buf);
>> >       end = System.currentTimeMillis();
>> >       System.out.println("counted " + count + " nls in " +
>> >                          (end-start) + " msec");
>> >     }
>>
>> >     input.close();
>> >   }
>> > }
>>
>> > iterate.c:
>>
>> > #include<sys/types.h>
>> > #include<sys/stat.h>
>> > #include<sys/time.h>
>> > #include<stdlib.h>
>> > #include<unistd.h>
>> > #include<stdio.h>
>> > #include<fcntl.h>
>>
>> > int countnl(char *buf, int sz)
>> > {
>> >   int i;
>> >   int count = 0;
>> >   for(i = 0; i < sz; i++) {
>> >     if(buf[i] == '\n') {
>> >       count++;
>> >     }
>> >   }
>> >   return count;
>> > }
>>
>> > int main()
>> > {
>> >   int fd = open("/dev/urandom", O_RDONLY);
>> >   const int NUMBYTES = 16*1024*1024;
>> >   char *buf = (char*)malloc(NUMBYTES);
>>
>> >   int sz;
>> >   struct timeval start, end;
>>
>> >   int i;
>> >   for(i = 0; i < 10; i++) {
>> >     sz = read(fd, buf, NUMBYTES);
>> >     printf("Wanted %d bytes, got %d bytes\n", NUMBYTES, sz);
>> >     gettimeofday(&start, 0);
>> >     int count = countnl(buf, sz);
>> >     gettimeofday(&end, 0);
>> >     printf("counted %d nls in %f msec\n", count,
>> >            (float)(end.tv_sec-start.tv_sec)*1e3 +
>> >            (end.tv_usec-start.tv_usec)/1e3);
>> >   }
>>
>> >   free(buf);
>> >   close(fd);
>> >   return 0;
>> > }
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Clojure" group.
>> > To post to this group, send email to clojure@googlegroups.com
>> > Note that posts from new members are moderated - please be patient with
>> > your first post.
>> > To unsubscribe from this group, send email to
>> > clojure+unsubscr...@googlegroups.com
>> > For more options, visit this group at
>> > http://groups.google.com/group/clojure?hl=en
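
P.S. For anyone who wants to check Alan's point that the wall-clock time is mostly I/O, here is a rough sketch that times the read from /dev/urandom separately from the counting loop. The class name SplitTiming and the helpers readFully and timeOnce are hypothetical names made up for this illustration; only the countnl loop is taken from Iterate.java (with a length argument added). I have not measured anything with it, so treat it as a starting point rather than a result.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration; not part of the original programs.
class SplitTiming {
    // Same loop as countnl in Iterate.java, limited to the bytes actually read.
    static int countnl(byte[] buf, int len) {
        int count = 0;
        for (int i = 0; i < len; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    // Keep reading until buf is full or the stream ends; returns the bytes read.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    // One iteration: returns {bytesRead, newlines, readNanos, countNanos}.
    static long[] timeOnce(InputStream in, byte[] buf) throws IOException {
        long t0 = System.nanoTime();
        int got = readFully(in, buf);
        long t1 = System.nanoTime();
        int count = countnl(buf, got);
        long t2 = System.nanoTime();
        return new long[] {got, count, t1 - t0, t2 - t1};
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[16 * 1024 * 1024];
        try (InputStream in = new FileInputStream("/dev/urandom")) {
            for (int i = 0; i < 10; i++) {
                long[] r = timeOnce(in, buf);
                System.out.printf("got %d bytes, %d nls: read %.1f msec, count %.1f msec%n",
                        r[0], r[1], r[2] / 1e6, r[3] / 1e6);
            }
        }
    }
}
```

Run with java -server as before. If the read column dwarfs the count column, that would match the user/sys split from the time output quoted above.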