> > > [Noam]
> > > > For example, iteration over a set which once had 1,000,000
> > > > members and now has 2 can take 1,000,000 operations every
> > > > time you traverse all the (2) elements.
> > >
> > > Do you find that to be a common or plausible use case?
> >
> > I don't have a concrete example at the moment, but a set which is
> > repeatedly filled with elements and then emptied by pop operations
> > doesn't seem to me that far-fetched.
> 
> It doesn't seem far-fetched, but I've not seen anything like it.

It's more far-fetched when fully spelled-out:

Build a VERY large list, ALMOST empty it with pop operations, then
iterate over it MANY times (enough to offset the cost of multiple
resize operations with their attendant memory-allocator interactions
and expensive block copies (cache misses are a certainty, and each
miss costs about as much as a floating-point divide)).

Also note, this example was not selected from a real-world use-case; it
was contrived for purposes of supporting an otherwise weak proposal.
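For the curious, the scenario is easy to reproduce. The sketch below
relies on a CPython implementation detail (a set's hash table is not
shrunk by pop(), so iteration cost tracks the peak size); the 1,000,000
figure comes from the thread, and everything else is illustrative:

```python
import sys
import time

# Build a large set, then pop it down to 2 elements.  In CPython the
# internal table keeps its million-entry capacity after the pops.
s = set(range(1_000_000))
while len(s) > 2:
    s.pop()

fresh = set(list(s))  # same two elements, but in a freshly-sized table

t0 = time.perf_counter()
for _ in s:           # scans the whole peak-sized table
    pass
big_scan = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in fresh:       # scans only a handful of slots
    pass
small_scan = time.perf_counter() - t0

print(len(s), sys.getsizeof(s) > sys.getsizeof(fresh))  # prints: 2 True
```

Comparing big_scan and small_scan shows the per-traversal penalty; the
question in the thread is whether that penalty is ever paid often enough
to matter.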




> > > Would the case be improved by incurring the time cost of 999,998
> > > tests for possible resizing (one for each pop) and some
> > > non-trivial number of resize operations along the way (each
> > > requiring a full iteration over the then-current size)?
> > >
> > I believe it would. It seems to me that those 999,998 tests take
> > not much more than a machine clock each, which means about 1
> > millisecond on today's computers. Those resize operations will take
> > some more milliseconds. It all doesn't really matter, since
> > probably all other things will take much more. I now run this code
> >
> Doing that while loop will take _longer_ with a constantly resizing
> set.  The only way that resizing a dict/set as it gets smaller will
> increase overall running speed is if iteration over the dict/set
> occurs anywhere between 2-100 times (depending on the resizing
> factor)

Josiah is exactly correct.  The resize operations are enormously
expensive compared to the cost of an iteration.  You would have to do
the latter many times to make up for the costs of repeatedly downsizing
the set.
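A break-even figure in that 2-100 range falls out of even a toy cost
model. All the constants below are assumptions chosen for illustration
(not measurements of CPython), and the shrink-at-quarter-load policy is
the textbook one, not anything CPython actually implements:

```python
# Toy cost model; both constants are assumptions, not measurements.
SCAN_COST = 1    # relative cost to scan one table slot while iterating
MOVE_COST = 10   # relative cost to rehash one entry during a resize

def resize_work(capacity):
    """Entries rehashed while popping a table down to near-empty,
    halving the table whenever load drops below one quarter."""
    work = 0
    while capacity > 8:
        work += capacity // 4   # live entries at each shrink point
        capacity //= 2
    return work

CAPACITY = 2_097_152            # ~2**21 slots backing ~1,000,000 entries
total_resize_cost = resize_work(CAPACITY) * MOVE_COST
saving_per_iteration = CAPACITY * SCAN_COST   # slots no longer scanned
break_even = total_resize_cost / saving_per_iteration
print(round(break_even, 1))     # under these assumptions: 5.0
```

With these numbers you need about five full traversals of the shrunken
set before downsizing pays for itself; different constants move the
figure, but it stays within the small range Josiah describes.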




> > > Even if this unique case could be improved, what is the impact on
> > > common cases?  Would a downsizing scheme risk thrashing with the
> > > over-allocation scheme in cases with mixed adds and pops?
> > >
> > I think that there shouldn't be additional damage beyond those clock
> > ticks. The simple method I took from "Introduction to Algorithms"
> > works no matter what sequence of adds and pops you have.
> 
> You may get more memory fragmentation, depending on the underlying
> memory manager.

There's more risk here than fragmentation.  Thrashing is a basic
concern.  There is no way around it -- some combination of adds and
pops will always trigger it when both upsizing and downsizing logic
are present.  The code in listobject.c works hard to avoid this, but
there are still patterns which would trigger horrid behavior, with a
resize occurring every few steps.
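The thrashing concern is easy to demonstrate with a toy simulation
(hypothetical code, not the actual resize logic in dictobject.c or
listobject.c). A naive policy that shrinks at half load sits exactly on
the grow boundary, so an alternating add/pop pattern resizes on every
single operation, while the textbook quarter-load hysteresis leaves the
same pattern resize-free:

```python
# Toy simulation of two downsizing policies (illustrative only; this
# is NOT the actual logic in CPython's dictobject.c/listobject.c).
class Table:
    def __init__(self, shrink_den):
        self.cap, self.n, self.resizes = 16, 8, 0
        self.shrink_den = shrink_den  # shrink when n * shrink_den < cap

    def add(self):
        self.n += 1
        if self.n >= self.cap:            # grow at 100% load
            self.cap *= 2
            self.resizes += 1

    def pop(self):
        self.n -= 1
        if self.cap > 8 and self.n * self.shrink_den < self.cap:
            self.cap //= 2                # shrink below 1/shrink_den load
            self.resizes += 1

naive = Table(2)     # shrink below 1/2 load: no hysteresis
careful = Table(4)   # shrink below 1/4 load: textbook hysteresis
for t in (naive, careful):
    for _ in range(500):  # hover right on the naive policy's boundary
        t.pop()
        t.add()
print(naive.resizes, careful.resizes)   # prints: 1000 0
```

Hysteresis bounds the amortized cost, though as noted above, a single
resize near a boundary is still expensive, and other access patterns
can force resizes uncomfortably often.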



> > > Is there any new information/research beyond what has been obvious
> > > from the moment the dict resizing scheme was born?
> >
> > I wanted to say that there isn't any new information, and yet I
> > don't think that I have to assume that everything in current Python
> > is the best that can be. All I did was find another reason why a
> > downsizing scheme might be good, and post it to ask if people have
> > thought about it. If you have a document listing all the design
> > decisions that went into the dict implementation, then please send
> > it to me and I won't ask about things that were already thought
> > about.
> 
> See the source for dictobject.c and dictnotes.txt:
>
> http://svn.python.org/view/python/trunk/Objects/dictobject.c?rev=39608&view=auto
> http://svn

Those are both good references.  

The code for general-purpose dicts has been fine-tuned, reviewed, and
field-tested to a highly polished level.  It is at a point where most
attempts to improve it will make it worse off.

There may be some room for development in special versions of
dictionaries for specific use cases.  For instance, it may be
worthwhile to create a version emphasizing size or access speed over
insertion time (using Brent's variation to optimize search order).



Raymond

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev