On 2008.04.25 16:32:53 -0700, Jason Dagit <[EMAIL PROTECTED]> scribbled 5.7K 
characters:
>    On Fri, Apr 25, 2008 at 3:53 PM, Gwern Branwen <[EMAIL PROTECTED]> wrote:
>
>      On 2008.04.25 13:42:04 -0700, Jason Dagit <[EMAIL PROTECTED]> scribbled 
> 1.9K characters:
>      >    On Fri, Apr 25, 2008 at 1:26 PM, Don Stewart <[EMAIL PROTECTED]> 
> wrote:
>      >
>      >      gwern0:
>      >      > Fri Apr 25 16:01:53 EDT 2008  [EMAIL PROTECTED]
>      >      >   * fpstring.c: switch a memchr for memrchr
>      >      >   See <http://bugs.darcs.net/issue814>; memrchr speeds up 
> is_funky quite a bit and
>      thus
>      >      helps whatsnew -s. It doesn't seem to break (any more) tests.
>      >
>      >      memrchr isn't available on BSDs.
>      >
>      >    Furthermore, I didn't see enough tests in Gwern's bug update to 
> convince me that the
>      >    difference in run-times is attributable to the use of memrchr.
>      >
>      >    Jason
>
>      I'm not sure how I could've tested it any more: I used a fresh darcs 
> repo, made the simple
>      change, compiled and installed, and tested it on a 9.5GB file 3 or 4 
> times, each time
>      getting a roughly 40 or 50 second result from 'time'; I then in the same 
> repo tested three
>      or four times using my system's darcs binary (or installing from one of 
> my other darcs darcs
>      repos) and get roughly 1m30s timings.
>
>      A single change, which leads to an expected difference multiple times; 
> what did I miss?
>
>    I was commenting on this:
>    http://bugs.darcs.net/msg4338
>
>    Which doesn't indicate (or if it does I'm overlooking it) that you ran 
> each test multiple
>    times, either to get the standard deviation or an intuitive sense of it.

Well, I went for the median, specifically.

>    Looking that over, I see this as the time it takes to run without changes:
>    (; darcs whats -s; ) 2.03s user 4.43s system 7% cpu 1:27.42 total
>
>    I'll just assume then that this timing is representative.
>
>    And then I see in your previous email that the times range from ~53 
> seconds to ~68 seconds.
>    That's a pretty good improvement in time although it's for a 9.5 GB file 
> so on smaller files
>    it may be negligible.  But, yeah I'm all for this change.  I think I would 
> prefer a change
>    where we don't read more than 1 page of memory (remembering that we mmap 
> files) looking for
>    the magic bytes.  If we look at, say 4k, of a file and don't see funky 
> bytes, probably darcs
>    should give up so that we don't cause the OS to swap everything out.
>
>    Jason

That's a workable solution, but I think I already suggested that we simply 
limit how much of the string we look at. This is fundamentally a trade-off that 
I think only Dr. Roundy can answer: is it alright to risk increasing the false 
negative rate for is_binary, for even a potentially significant space-time 
improvement?

Until we hear back from him, all we can do is tweak things and preserve the 
current behaviour of inspecting each byte in the file, and there our options 
are limited - basically changing a memchr to memrchr (or a rewrite of memrchr 
if we want to be portable), changing the two loops to a single loop (see my 
other email to dons), or possibly doing something funky like inlining some asm.

--
gwern
E.T. CDA SAI LEETAC Hague 50BMG Meade SIGDASYS guest 26

Attachment: pgpj9mWVWW3RK.pgp
Description: PGP signature

_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Reply via email to