Yet another victim of the dreaded reply-to
On 7/11/06, Jay Savage <[EMAIL PROTECTED]> wrote:
On 7/11/06, Mr. Shawn H. Corey <[EMAIL PROTECTED]> wrote:
> Jay Savage wrote:
>
> No thank you; you have changed the script from using seconds to fractions
> of seconds. If you had done this the first time around, my comments
> would be different (see below).
>
I did say that the first time around, or at least I tried to, but see
my follow-up.
>
> $id is no more likely to overflow than Time::HiRes::time. Perl is not C.
> In Perl, if an integer overflows, it is automatically converted to a
> float; no muss, no fuss. The problem you would have with both is the
> dropping of insignificant digits. Consider the following:
>
[snip]
> Note the output is zero. This is because 1 is too insignificant to
> affect a number as big as 1e138. There is a whole field of mathematics,
> called numerical analysis, devoted to dealing with such problems.
>
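(A minimal sketch of the kind of demo that was snipped, reconstructed
from the description above -- not the original code:)

    # 1 is too small to survive addition to 1e138 in a double:
    my $x = 1e138;
    printf "%g\n", ($x + 1) - $x;   # prints 0, not 1
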
There are also Math::BigInt and Math::BigFloat. But I think there are
some other drawbacks to autoincrementing: keeping order when an item
repeats becomes complicated, requiring linked lists, etc.
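(As a hedged aside, Math::BigFloat trades speed for exactness, so the
same arithmetic as above keeps every digit:)

    use Math::BigFloat;
    my $big = Math::BigFloat->new('1e138');
    my $sum = $big->copy->badd(1);   # 1e138 + 1, computed exactly
    print $sum->bsub($big), "\n";    # prints 1; nothing is dropped
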
> But you are talking really big numbers here; you are more likely to
> experience memory overflow than this.
>
Probably so, although overflowing anything with a list you expect to
hover in the neighborhood of 200 items is highly unlikely.
> >
> > Of course since OP didn't give any implementation details, it's
> > possible that the list could grow even larger between calls to
> > get_list(). Maybe the user will only want the list of 200 recent items
> > after an average of 1,000,000 elements are added. Then, sure, you'll
> > want to take a different approach. If that's the case, I'd recommend
> > getting a copy of _Mastering Algorithms With Perl_, and reading up on
> > linked lists and b-trees.
> >
>
> The solution I proposed was a doubly-linked list with a hash pointing to
> every element. The hash is used to remove elements from the middle of the list.
I see that now, although at the time I only saw Charles' reply, which
was what I was responding to. Your proposal is an elegant solution.
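To make it concrete for anyone following along, here is a minimal
sketch of that structure as I read it (my illustration, not your posted
code; touch() and remove_item() are names I invented): %node maps each
key to its list node, so any element can be unlinked from the middle in
O(1) and re-inserted at the head when it is used again.

    my (%node, $head, $tail);    # $head is most recent, $tail least recent

    sub touch {                  # record that $key was just used
        my $key = shift;
        remove_item($key);       # unlink it first if it's already listed
        my $n = { key => $key, prev => undef, next => $head };
        $head->{prev} = $n if $head;
        $head   = $n;
        $tail ||= $n;
        $node{$key} = $n;
    }

    sub remove_item {            # O(1) removal thanks to the hash
        my $key = shift;
        my $n = delete $node{$key} or return;
        $n->{prev} ? ( $n->{prev}{next} = $n->{next} ) : ( $head = $n->{next} );
        $n->{next} ? ( $n->{next}{prev} = $n->{prev} ) : ( $tail = $n->{prev} );
    }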
> B-trees are only effective when used on disk. Compared to other
> algorithms, b-trees are slow when used in RAM alone; even binary
> b-trees are slower than other algorithms you can use to balance a
> binary tree.
>
That call will be based on the data and the specific implementation,
but if running out of RAM is a concern, writing to disk will be the
solution. I just suggested that if managing an ordered data set large
enough to be of concern was the issue here, we were getting a bit
beyond the scope of this list, and that Orwant et al. was the logical
next step.
>
> > In my experience, though, it's standard to size the list slightly
> > larger than the number of elements you expect to add between accesses.
> > For most UI situations, the working definition of "recent items" is
> > "the most items we think a normal user will access between views of
> > 'recent items.'" The value of a recent items list decreases rapidly
> > when users use more items in a session than the list can hold, and
> > demonstrably recent items get shifted off the bottom.
> >
>
> In Perl, you don't have to pre-allocate memory; again you are thinking
> too C-ish.
>
I don't know where I mentioned allocating memory. My comments weren't
about memory allocation; they were about UI design. I was talking
about the size of the "recent items list" presented to the user. I
think that the concern about RAM is an extreme corner case. If OP is
sizing his list at 200, it's presumably because he expects that to be
a reasonable number for normal use. Items shouldn't fall off the list
while they're still fairly 'recent'. If the 200 slots are filling up
and the list is being pruned frequently enough to cause performance
degradation--or the list is growing large enough to cause
degradation before being pruned--OP probably needs to rethink things,
because anything that can reasonably be thought of as "recent" won't
appear on the list. Let me phrase that more succinctly: if system
resources become an issue for any sort of "recently used" list,
something is almost certainly horribly wrong with the spec.
> I think that keeping a recent-items list of 200 is silly. A normal
> user will not look beyond the first five (but will complain unless the
> list holds at least ten items).
>
Exactly.
But to get this back to Perl, let me offer this optimization to my
original suggestion:
sub get_list {
    # Sort keys newest-first by their timestamps in %recent.
    my @sorted = sort { $recent{$b} <=> $recent{$a} } keys %recent;

    # Prune the oldest entries until only $limit remain, deleting them
    # from %recent as well so the hash doesn't grow without bound.
    while (@sorted > $limit) {
        delete $recent{ pop @sorted };
    }
    return @sorted;
}
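For reference, the assumed context, pieced together from this thread
('foo.txt' is just a placeholder): %recent maps each item to its
last-access timestamp from Time::HiRes::time(), and $limit is the size
of the displayed list.

    use Time::HiRes ();
    our (%recent, $limit);
    $limit = 200;
    $recent{'foo.txt'} = Time::HiRes::time();   # stamp an item when used
    my @latest = get_list();                    # newest-first, at most $limit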
-- jay
--------------------------------------------------
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ]
private and confidential
daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.dpguru.com http://www.engatiki.org
values of β will give rise to dom!