[Python-Dev] What to intern (e.g. func_code.co_filename)?
Has anyone come up with rules of thumb for what to intern and what the performance implications of interning are? I'm working on profiling App Engine again, and since they don't allow marshal I have to modify pstats to save the profile via pickle. While trying to get profiles under 1MB, I noticed that each function has its own copy of the filename in which it is defined, and sometimes these strings can be rather long. Creating a code object already interns a bunch of stuff: argument names, variable names, etc. Interning the filename will add some CPU overhead during function creation, should save a decent amount of memory, and ought to have minimal overall performance impact. I have a local patch, but wanted to see if anyone had ideas or experience weighing these tradeoffs. -jake ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
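A minimal sketch of the pickle-based save described above; the profiled workload and variable names are illustrative, not the actual pstats modification:

```python
import cProfile
import pickle
import pstats

# Profile a toy workload (stand-in for a real request handler).
pr = cProfile.Profile()
pr.enable()
sum(i * i for i in range(1000))
pr.disable()

# pstats normally persists via marshal (dump_stats); where marshal is
# unavailable, the underlying stats dict can be pickled instead.  Each
# key is a (filename, lineno, funcname) tuple, so long filenames repeat
# across every function defined in the same file.
stats = pstats.Stats(pr).stats
payload = pickle.dumps(stats, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
assert restored == stats
```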
Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython
On Fri, Jan 22, 2010 at 11:07 AM, Collin Winter collinwin...@google.com wrote:

Hey Jake,

On Thu, Jan 21, 2010 at 10:48 AM, Jake McGuire mcgu...@google.com wrote:

On Thu, Jan 21, 2010 at 10:19 AM, Reid Kleckner r...@mit.edu wrote:

On Thu, Jan 21, 2010 at 12:27 PM, Jake McGuire mcgu...@google.com wrote:

On Wed, Jan 20, 2010 at 2:27 PM, Collin Winter collinwin...@google.com wrote:

Profiling - Unladen Swallow integrates with oProfile 0.9.4 and newer [#oprofile]_ to support assembly-level profiling on Linux systems. This means that oProfile will correctly symbolize JIT-compiled functions in its reports.

Do the current python profiling tools (profile/cProfile/pstats) still work with Unladen Swallow?

Sort of. They disable the use of JITed code, so they don't quite work the way you would want them to. Checking tstate->c_tracefunc every line generated too much code. They still give you a rough idea of where your application hotspots are, though, which I think is acceptable.

Hmm. So cProfile doesn't break, but it causes code to run under a completely different execution model so the numbers it produces are not connected to reality? We've found the call graph and associated execution time information from cProfile to be extremely useful for understanding performance issues and tracking down regressions. Giving that up would be a huge blow.

FWIW, cProfile's call graph information is still perfectly accurate, but you're right: turning on cProfile does trigger execution under a different codepath. That's regrettable, but instrumentation-based profiling is always going to introduce skew into your numbers. That's why we opted to improve oProfile, since we believe sampling-based profiling to be a better model.

Sampling-based may be theoretically better, but we've gotten a lot of mileage out of profile, hotshot and especially cProfile. I know that other people at Google have also used cProfile (backported to 2.4) with great success.
The couple of times I tried to use oProfile it was less illuminating than I'd hoped, but that could just be inexperience.

Profiling was problematic to support in machine code because in Python, you can turn profiling on from user code at arbitrary points. To correctly support that, we would need to add lots of hooks to the generated code to check whether profiling is enabled, and if so, call out to the profiler. Those "is profiling enabled now?" checks are (almost) always going to be false, which means we spend cycles for no real benefit.

Well, we put the ability to profile on demand to good use - in particular by restricting profiling to one particular servlet (or a subset of servlets) and by skipping the first few executions of that servlet in a process to avoid startup noise. All of this gets kicked off by talking to the management process of our app server via http.

Can YouTube use oProfile for profiling, or is instrumented profiling critical? [snip]

I don't know that instrumented profiling is critical, but the level of insight we have now is very important for keeping our site happy. It seems like it'd be a fair bit of work to get oProfile to give us the same level of insight, and it's not clear who would be motivated to do that work.

- Add the necessary profiling hooks to JITted code to better support cProfile, but add a command-line flag (something explicit like -O3) that removes the hooks and activates the current behaviour (or something even more restrictive, possibly).

This would be workable albeit suboptimal; as I said we start and stop profiling on the fly, and while we currently fork a new process to do this, that's only because we don't have a good arbitrary RPC mechanism from parent to child. Having to start up a new python process from scratch would be a big step back.

- Initially compile Python code without the hooks, but have a trip-wire set to detect the installation of profiling hooks.
When profiling hooks are installed, purge all machine code from the system and recompile all hot functions to include the profiling hooks.

This would be the closest to the way we are doing things now. If Unladen Swallow is sufficiently faster, we would probably make oProfile work. But if it's a marginal improvement, we'd be more inclined to try for more incremental improvements (e.g. your excellent cPickle work). -jake
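The on-demand scheme described above can be sketched in pure Python with cProfile; `should_profile`, `handler`, and `collected` are hypothetical stand-ins for the app-server machinery, not YouTube's actual API:

```python
import cProfile
import pstats

def profiled_call(handler, should_profile, collected):
    # Wrap a request handler so profiling can be switched on at
    # runtime for a subset of requests, collecting pstats objects.
    def wrapper(*args, **kwargs):
        if not should_profile():
            return handler(*args, **kwargs)
        pr = cProfile.Profile()
        try:
            # runcall enables the profiler only around this one call
            return pr.runcall(handler, *args, **kwargs)
        finally:
            collected.append(pstats.Stats(pr))
    return wrapper

def handler(n):
    return sum(range(n))

stats = []
wrapped = profiled_call(handler, lambda: True, stats)
assert wrapped(10) == 45
assert len(stats) == 1
```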
Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython
On Wed, Jan 20, 2010 at 2:27 PM, Collin Winter collinwin...@google.com wrote: Profiling - Unladen Swallow integrates with oProfile 0.9.4 and newer [#oprofile]_ to support assembly-level profiling on Linux systems. This means that oProfile will correctly symbolize JIT-compiled functions in its reports. Do the current python profiling tools (profile/cProfile/pstats) still work with Unladen Swallow? -jake
Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython
On Thu, Jan 21, 2010 at 10:19 AM, Reid Kleckner r...@mit.edu wrote:

On Thu, Jan 21, 2010 at 12:27 PM, Jake McGuire mcgu...@google.com wrote:

On Wed, Jan 20, 2010 at 2:27 PM, Collin Winter collinwin...@google.com wrote:

Profiling - Unladen Swallow integrates with oProfile 0.9.4 and newer [#oprofile]_ to support assembly-level profiling on Linux systems. This means that oProfile will correctly symbolize JIT-compiled functions in its reports.

Do the current python profiling tools (profile/cProfile/pstats) still work with Unladen Swallow?

Sort of. They disable the use of JITed code, so they don't quite work the way you would want them to. Checking tstate->c_tracefunc every line generated too much code. They still give you a rough idea of where your application hotspots are, though, which I think is acceptable.

Hmm. So cProfile doesn't break, but it causes code to run under a completely different execution model so the numbers it produces are not connected to reality? We've found the call graph and associated execution time information from cProfile to be extremely useful for understanding performance issues and tracking down regressions. Giving that up would be a huge blow. -jake
Re: [Python-Dev] PEP 3144 review.
On Thursday, September 17, 2009, Daniel Fetchinson fetchin...@googlemail.com wrote:

188 (check that, 190) people have downloaded the 2.0 release in the last week (numbers publicly available from code.google.com). I can't tell you how many (if any) have downloaded it via svn.

Downloading and using are not the same thing.

Correct, but there is a strong positive correlation between the two. If you have a better method for determining what you would consider an appropriate level of usage, I'm all ears.

A good way of determining the level of usage would be pointing to open source projects that are popular in the python community and which incorporate your module.

Well, the 2.0 release is still new. http://codesearch.google.com shows some projects using the 1.x release; hopefully some of those 200 downloaders will put up some publicly indexable python code at some point.

I think one first needs to wait until this happens, I mean until a large user base is formed, before a meaningful discussion can be had on whether to include it in the stdlib or not. The long and largely academic thread here I think illustrates this point. Without a large user base it's up to anybody's gut feelings what is 'right' and what 'feels wrong'.

+1000

-jake
Re: [Python-Dev] PEP 3144 review.
On Tue, Sep 15, 2009 at 11:36 AM, Peter Moody pe...@hda3.com wrote:

On Tue, Sep 15, 2009 at 11:33 AM, Jake McGuire mcgu...@google.com wrote:

On Tue, Sep 15, 2009 at 10:36 AM, Peter Moody pe...@hda3.com wrote:

On Tue, Sep 15, 2009 at 10:16 AM, Jake McGuire mcgu...@google.com wrote:

On Mon, Sep 14, 2009 at 9:54 AM, Guido van Rossum gu...@python.org wrote:

What's the opinion of the other interested party or parties? I don't want a repeat of the events last time, where we had to pull it at the last minute because there hadn't been enough discussion.

How many other people are using this library? I think it's hard to give really useful feedback on an API without using it for some non-trivial task, but maybe other people don't have this problem. -jake

188 (check that, 190) people have downloaded the 2.0 release in the last week (numbers publicly available from code.google.com). I can't tell you how many (if any) have downloaded it via svn.

Downloading and using are not the same thing.

Correct, but there is a strong positive correlation between the two. If you have a better method for determining what you would consider an appropriate level of usage, I'm all ears.

Put something on the project page (or download page if possible) saying "ipaddr is being considered for inclusion in the Python standard library. We want to make sure it meets your needs, but we need you to tell us. If you use ipaddr and like it, please let us know on ip-addr-dev."

I dunno, maybe it's too much work. But no one else seems to have an opinion strong enough to share, at least not at this point. -jake
Re: [Python-Dev] PEP 3144 review.
On Mon, Sep 14, 2009 at 9:54 AM, Guido van Rossum gu...@python.org wrote:

What's the opinion of the other interested party or parties? I don't want a repeat of the events last time, where we had to pull it at the last minute because there hadn't been enough discussion.

How many other people are using this library? I think it's hard to give really useful feedback on an API without using it for some non-trivial task, but maybe other people don't have this problem. -jake
Re: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library
On Mon, Aug 24, 2009 at 9:54 PM, Peter Moody pe...@hda3.com wrote:

I personally hope that's not required; yours has been the only dissenting email and I believe I respond to all of your major points here.

Silence is not assent. ipaddr looks like a reasonable library from here, but AFAIK it's not widely used outside of google. I don't know if it's reasonable to want some amount of public usage before a brand-new API goes into the standard library, but such use is more likely to uncover API flaws or quirks than a PEP. -jake
Re: [Python-Dev] Google Wave as a developer communication tool
Google Wave is also still in tightly restricted beta. Gmail went through a long invite-only period. Until we have an idea of how long it will be until basically all python developers who want a Wave account can get one, it doesn't make sense to talk about using it for python development, IMO. -jake
Re: [Python-Dev] Issues with process and discussions (Re: Issues with Py3.1's new ipaddr)
On Wed, Jun 3, 2009 at 10:41 AM, gl...@divmod.com wrote:

On 02:39 am, gu...@python.org wrote:

I'm disappointed in the process -- it's as if nobody really reviewed the API until it was released with rc1, and this despite there being a significant discussion about its inclusion and alternatives months ago. (Don't look at me -- I wouldn't recognize a netmask if it bit me in the behind, and I can honestly say that I don't know whether /8 means to look only at the first 8 bits or whether it means to mask off the last 8 bits.) I hope we can learn from this.

As he pointed out to Martin, Jean-Paul voiced objections several months ago which are similar to the ones which are now being discussed. To be fair, he didn't unambiguously say "... and therefore don't include this library"; he simply suggested that netaddr was superior in some ways and that perhaps some documentation could illuminate why ipaddr was better.

The thing that stands out about the earlier tracker/mailing list discussions is how very few people affirmatively wanted ipaddr added to the standard library. Most people thought it sounded ok in principle, didn't care, or thought it was not a great idea but didn't feel like arguing about it. -jake
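For what it's worth, the parenthetical netmask question has a concrete answer that a few lines of arithmetic can check: a /8 prefix keeps the first 8 bits as the network part (mask 255.0.0.0), rather than masking off the last 8 bits:

```python
# Build the 32-bit mask for a /8 prefix: the top `prefix` bits are set.
prefix = 8
mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
assert mask == 0xFF000000

# Rendered in dotted-quad form, that is 255.0.0.0.
octets = [(mask >> s) & 0xFF for s in (24, 16, 8, 0)]
assert octets == [255, 0, 0, 0]
```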
Re: [Python-Dev] Issues with Py3.1's new ipaddr
On Tue, Jun 2, 2009 at 9:26 AM, Clay McClure c...@daemons.net wrote:

On Tue, Jun 2, 2009 at 2:08 AM, Martin v. Löwis mar...@v.loewis.de wrote:

That doesn't solve much. IPv4 objects still always use CIDR notation when coerced to strings, meaning that IP addresses will always be rendered with a trailing /32.

That's not true:

py> x = ipaddr.IP('30.40.50.60')
py> print(x.ip_ext_full)
30.40.50.60

Thankfully the authors have provided this obscure and strangely-named method to get at the correct string representation of an IP address, but sadly their __str__ method -- which is the Pythonic way to get string representations of objects -- fails in this regard because they have only one class representing two distinct concepts.

The minimal demonstration of the problem of representing networks and addresses using the same class:

>>> container = [1, 2, 3, 4]
>>> for item in container:
...     print "%s in %s: %s" % (item, container, item in container)
...
1 in [1, 2, 3, 4]: True
2 in [1, 2, 3, 4]: True
3 in [1, 2, 3, 4]: True
4 in [1, 2, 3, 4]: True
>>> import ipaddr
>>> container = ipaddr.IP('192.168.1.0/24')
>>> for item in container:
...     print "%s in %s: %s" % (item, container, item in container)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "ipaddr.py", line 438, in __contains__
    return self.network <= other.ip and self.broadcast >= other.broadcast
AttributeError: 'str' object has no attribute 'ip'

-jake
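For comparison, the ipaddress module that eventually grew out of ipaddr (PEP 3144, Python 3.3+) separates network and address types, so membership behaves like the list example; a quick sketch:

```python
import ipaddress

# Networks and addresses are distinct types, so __contains__ is
# well-defined: address in network.
net = ipaddress.ip_network("192.168.1.0/24")
for addr in list(net.hosts())[:4]:
    assert addr in net

# An address outside the network is correctly rejected.
assert ipaddress.ip_address("192.168.2.1") not in net
```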
Re: [Python-Dev] Issues with Py3.1's new ipaddr
On Mon, Jun 1, 2009 at 12:16 PM, Martin v. Löwis mar...@v.loewis.de wrote:

As for Clay McClure's issue: I feel it's primarily a matter of taste. I see nothing morally wrong in using the same class for hosts and networks, i.e. representing a host as a network of size 1. I can understand why people dislike that, but I don't see why it would stop people from doing with the library what they want to do.

To the extent that Clay is having issues, it's because ipaddr.py is poorly documented, has potentially confusing comments, and he became confused. Lesser issues are that ipaddr.py doesn't work the way he wants and that ip addressing is inherently subtle.

Looking at the code in detail shows that ipaddr.IP/IPv4/IPv6 objects always represent *networks*. He wants one particular address out of that network, and that requires using __getitem__ or using IP.ip_ext. Neither is particularly intuitive.

>>> import ipaddr
>>> ip = ipaddr.IPv4('10.33.11.17')
>>> ip
IPv4('10.33.11.17/32')
>>> ip[0]
'10.33.11.17'
>>> ip.ip_ext
'10.33.11.17'

This feels much more like poor documentation than wide-ranging conceptual flaws. I could put this in the tracker, but I'm not sure if that's appropriate. -jake
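In today's ipaddress module the address-with-mask concept is an explicit "interface" type, which replaces ipaddr's ip_ext spelling with named attributes; a sketch of the same example in modern terms:

```python
import ipaddress

# An interface bundles an address with its network, but exposes each
# part by name instead of via __getitem__ or an ip_ext attribute.
iface = ipaddress.ip_interface("10.33.11.17/32")
assert str(iface.ip) == "10.33.11.17"          # the bare address
assert str(iface.network) == "10.33.11.17/32"  # the enclosing network
```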
Re: [Python-Dev] Issues with Py3.1's new ipaddr
On Mon, Jun 1, 2009 at 6:54 PM, Jake McGuire mcgu...@google.com wrote:

On Mon, Jun 1, 2009 at 12:16 PM, Martin v. Löwis mar...@v.loewis.de wrote:

As for Clay McClure's issue: I feel it's primarily a matter of taste. I see nothing morally wrong in using the same class for hosts and networks, i.e. representing a host as a network of size 1. I can understand why people dislike that, but I don't see why it would stop people from doing with the library what they want to do.

To the extent that Clay is having issues, it's because ipaddr.py is poorly documented, has potentially confusing comments, and he became confused. Lesser issues are that ipaddr.py doesn't work the way he wants and that ip addressing is inherently subtle.

Sorry for the spam, I wrote my last message before reading the entire discussion surrounding the two libraries and trying to imagine using both ipaddr and netaddr. Clay is basically correct. The ipaddr.py API is missing important features, and it would probably be a mistake to add it to the python standard library if that means we'd have to maintain compatibility for the indefinite future.

Like all largeish python projects, we wrote a library to do IP address manipulation. In our case, it's a whopping 64 lines long. While I wasn't aware of ipaddr.py or netaddr at the time, the API we created is much closer to netaddr's. Migrating our code to ipaddr would require significant work and is unlikely to happen. In fact, if I were starting a new project from scratch with similar requirements, I'd probably write my own library instead of using ipaddr. -jake
Re: [Python-Dev] Rethinking intern() and its data structure
On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote:

Now that you brought up specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below. Up to 21800 interned strings, the dict takes (only) 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k interned strings is typical, I still don't know. Wrt. your proposed change, I would be worried about maintainability, in particular if it would copy parts of the set implementation.

I connected to a random one of our processes, which has been running for a typical amount of time and is currently at ~300MB RSS.

(gdb) p *(PyDictObject*)interned
$2 = {ob_refcnt = 1, ob_type = 0x8121240, ma_fill = 97239, ma_used = 95959,
      ma_mask = 262143, ma_table = 0xa493c008, }

Going from 3MB to 2.25MB isn't much, but it's not nothing, either. I'd be skeptical of cache performance arguments given that the strings used in any particular bit of code should be spread pretty much evenly throughout the hash table, and 3MB seems solidly bigger than any L2 cache I know of. You should be able to get meaningful numbers out of a C profiler, but I'd be surprised to see the act of interning taking a noticeable amount of time. -jake
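The arithmetic behind the "3MB" figure, assuming a 32-bit CPython 2.x build where each dict entry is 12 bytes (hash, key pointer, value pointer) and the table has ma_mask + 1 slots:

```python
# Sanity-check the gdb numbers quoted above.
slots = 262143 + 1        # ma_mask + 1 slots in the interned dict
entry_bytes = 12          # per-slot (hash, key, value) on a 32-bit build
dict_table = slots * entry_bytes
assert dict_table == 3 * 1024 * 1024   # exactly 3 MiB
```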
[Python-Dev] undesirable unpickle behavior, proposed fix
Instance attribute names are normally interned - this is done in PyObject_SetAttr (among other places). Unpickling (in pickle and cPickle) directly updates __dict__ on the instance object. This bypasses the interning so you end up with many copies of the strings representing your attribute names, which wastes a lot of space, both in RAM and in pickles of sequences of objects created from pickles. Note that the native python memcached client uses pickle to serialize objects.

>>> import pickle
>>> class C(object):
...     def __init__(self, x):
...         self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None), pickle.HIGHEST_PROTOCOL)) for i in range(100)], pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)], pickle.HIGHEST_PROTOCOL))
1441

Interning the strings on unpickling makes the pickles smaller, and at least for cPickle actually makes unpickling sequences of many objects slightly faster. I have included proposed patches to cPickle.c and pickle.py, and would appreciate any feedback.

dhcp-172-31-170-32:~ mcguire$ diff -u Downloads/Python-2.4.3/Modules/cPickle.c cPickle.c
--- Downloads/Python-2.4.3/Modules/cPickle.c    2004-07-26 22:22:33.0 -0700
+++ cPickle.c   2009-01-26 23:30:31.0 -0800
@@ -4258,6 +4258,8 @@
 	PyObject *state, *inst, *slotstate;
 	PyObject *__setstate__;
 	PyObject *d_key, *d_value;
+	PyObject *name;
+	char *key_str;
 	int i;
 	int res = -1;
@@ -4319,8 +4321,24 @@
 		i = 0;
 		while (PyDict_Next(state, &i, &d_key, &d_value)) {
-			if (PyObject_SetItem(dict, d_key, d_value) < 0)
-				goto finally;
+			/* normally the keys for instance attributes are
+			   interned. we should try to do that here. */
+			if (PyString_CheckExact(d_key)) {
+				key_str = PyString_AsString(d_key);
+				name = PyString_FromString(key_str);
+				if (!name)
+					goto finally;
+
+				PyString_InternInPlace(&name);
+				if (PyObject_SetItem(dict, name, d_value) < 0) {
+					Py_DECREF(name);
+					goto finally;
+				}
+				Py_DECREF(name);
+			} else {
+				if (PyObject_SetItem(dict, d_key, d_value) < 0)
+					goto finally;
+			}
 		}
 		Py_DECREF(dict);
 	}

dhcp-172-31-170-32:~ mcguire$ diff -u Downloads/Python-2.4.3/Lib/pickle.py pickle.py
--- Downloads/Python-2.4.3/Lib/pickle.py        2009-01-27 01:41:43.0 -0800
+++ pickle.py   2009-01-27 01:41:31.0 -0800
@@ -1241,7 +1241,15 @@
             state, slotstate = state
         if state:
             try:
-                inst.__dict__.update(state)
+                d = inst.__dict__
+                try:
+                    for k, v in state.items():
+                        d[intern(k)] = v
+                # keys in state don't have to be strings
+                # don't blow up, but don't go out of our way
+                except TypeError:
+                    d.update(state)
+
             except RuntimeError:
                 # XXX In restricted execution, the instance's __dict__
                 # is not accessible. Use the old way of unpickling
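A pure-Python sketch of what the pickle.py hunk does, in modern terms (sys.intern in place of the Python 2 intern builtin); the names and the way the distinct key strings are built here are illustrative:

```python
import sys

def set_state(inst, state):
    # Analogue of the patch: intern string keys while installing
    # unpickled state; non-string keys pass through untouched.
    d = inst.__dict__
    for k, v in state.items():
        try:
            d[sys.intern(k)] = v
        except TypeError:   # keys in state don't have to be strings
            d[k] = v

class C:
    pass

a, b = C(), C()
key = "long_attribute_name"
# Build two equal-but-distinct key strings, as unpickling separate
# pickles does.
set_state(a, {key[:1] + key[1:]: 1})
set_state(b, {key[:1] + key[1:]: 2})
ka, kb = next(iter(a.__dict__)), next(iter(b.__dict__))
assert ka == kb and ka is kb   # one shared interned string object
```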
Re: [Python-Dev] undesirable unpickle behavior, proposed fix
On Jan 27, 2009, at 11:40 AM, Martin v. Löwis wrote:

Hm. This would change the pickling format though. Wouldn't just interning (short) strings on unpickling be simpler?

Sure - that's what Jake had proposed. However, it is always difficult to select which strings to intern - his heuristic (IIUC) is to intern all strings that appear as dictionary keys. Whether this is good enough, I don't know. In particular, it might intern very large strings that aren't identifiers at all.

I may have misunderstood how unpickling works, but I believe that my patch only interns strings that are keys in a dictionary used to populate an instance. This is very similar to how instance creation and modification works in Python now. The only difference is if you set an attribute via

inst.__dict__['attribute_name'] = value

then 'attribute_name' will not be automatically interned, but if you pickle the instance, 'attribute_name' will be interned on unpickling. There may be cases where users specifically go through __dict__ to avoid interning attribute names, but I would be surprised to hear about it and very interested in talking to the person who did that. Creating a new pickle protocol to handle this case seems excessive... -jake
Re: [Python-Dev] undesirable unpickle behavior, proposed fix
On Jan 27, 2009, at 12:39 PM, Martin v. Löwis wrote:

I may have misunderstood how unpickling works

Perhaps I have misunderstood your patch. Posting it to Rietveld might also be useful.

It is not immediately clear to me how Rietveld works. But I have created an issue on the tracker: http://bugs.python.org/issue5084

Another vaguely related change would be to store string and unicode objects in the pickler memo keyed as themselves rather than their object ids. Depending on the data set, you can have many copies of the same string, e.g. "application/octet-stream". This may marginally increase memory usage during pickling, depending on the data being pickled and the way in which the code was written. I'm happy to write this up if people are interested... -jake
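A small sketch of why the id-keyed memo misses equal-but-distinct strings, which is the duplication the proposal above targets; runnable on modern Python:

```python
import pickle

s1 = "application/octet-stream"
s2 = s1[:1] + s1[1:]            # equal content, different object
assert s1 == s2 and s1 is not s2

# The memo is keyed by id(), so two distinct copies are each written
# out in full, while a repeated reference becomes a cheap memo lookup.
dup = pickle.dumps([s1, s2], pickle.HIGHEST_PROTOCOL)
ref = pickle.dumps([s1, s1], pickle.HIGHEST_PROTOCOL)
assert len(dup) > len(ref)
```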