Hey, Andi

We can almost certainly replace a lot of the initialValue uses with defaultValue, especially for values that aren't references. There's some risk there, because the return value of Item.hasLocalAttributeValue() is different in the two cases, and some code (ICalendar/sharing, maybe) could rely on that.

So far as reflists go, there are really two approaches in use right now:

1) Use defaultValue=None in your attribute. (Example: EventStamp.modifications or EventStamp.occurrences)

Then, always make sure you can deal with None when iterating over myAttr. If you have a biref, let the repository take care of setting up the biref where possible, by always assigning to the "other" end. This avoids a lot of checking for None, and initializing the attribute to [] (or set() or {} as appropriate).

2) Use initialValue=[] (Example: Reminder.reminderItems)

As you noticed, this leads to an extra attribute assignment every time you create the Item. It does, however, make for simpler code: you assume that the attribute always exists, and just iterate over it, call add()/remove()/whatever on it, and all's well. Unfortunately, it also has the downside that initialValues don't always get set (if items are imported via sharing, __init__ and initialValues are bypassed), which leads to bugs.

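To make the trade-off between 1) and 2) concrete, here is a rough toy model in plain Python. It is only a sketch: ToyItem, defaults, initial_factories, from_import, has_local_value and get are all made-up names for illustration, not the repository or schema API. It assumes that a defaultValue lives once at the schema level, while an initialValue is stored on each item when __init__ runs.

```python
class ToyItem:
    """Hypothetical stand-in for a repository item; not the Chandler API."""

    # Schema-level fallbacks shared by every instance (plays the role of defaultValue).
    defaults = {"read": False, "modifications": None}
    # Per-instance starting values created in __init__ (plays the role of initialValue).
    initial_factories = {"reminderItems": list}

    def __init__(self):
        self._values = {}
        for name, factory in self.initial_factories.items():
            # One extra stored value per item, even if it stays empty forever.
            self._values[name] = factory()

    @classmethod
    def from_import(cls):
        """Mimic an import path that bypasses __init__ (assumed behaviour)."""
        item = cls.__new__(cls)
        item._values = {}
        return item

    def has_local_value(self, name):
        # True only when the value is stored on this item itself.
        return name in self._values

    def get(self, name):
        if name in self._values:
            return self._values[name]
        if name in self.defaults:
            return self.defaults[name]
        raise AttributeError(name)


created = ToyItem()
imported = ToyItem.from_import()
```

In this model created.has_local_value("reminderItems") is true while created.has_local_value("read") is false, which is the difference code relying on the local-value check would notice. Calling imported.get("reminderItems") raises AttributeError, which mirrors the class of bug described in 2).
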
(As an aside, there's a movement afoot -- mainly PJE's work -- to have initialValues in Stamp subclasses only be set up when the stamps are actually added. This would probably cut down on some of the empty collection values in your report. cf. <http://bugzilla.osafoundation.org/show_bug.cgi?id=7322>)

Anyway, is there a way to have the best of 1) and 2)? That is, something that behaves like defaultValue=[] until you try to modify the collection? In code:

>>> class MyItem(schema.Item):
...     myAttr = schema.Sequence(schema.Item, magicValue=[])
...
>>> x = MyItem(...)
>>> list(x.myAttr)
[]
>>> x.hasLocalAttributeValue('myAttr')
False
>>> x.myAttr.add(x)
>>> x.hasLocalAttributeValue('myAttr')
True
>>> x.myAttr.first()
<MyItem ....>

In fact, shouldn't all many-valued attributes behave this way?

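For what it's worth, here is a minimal sketch of how that behaviour could be emulated in plain Python. Everything in it is hypothetical: LazyList and LazyItem are invented for illustration, and hasLocalAttributeValue here only borrows the name of the real method. The idea it assumes is that reading the attribute hands back a thin proxy, and only the first mutation stores a real, per-item list.

```python
class LazyList:
    """Hypothetical proxy: reads as empty until the first mutation."""

    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def _stored(self):
        # The real list, or None if nothing has been stored on the owner yet.
        return self._owner._values.get(self._name)

    def __iter__(self):
        return iter(self._stored() or ())

    def __len__(self):
        return len(self._stored() or ())

    def add(self, value):
        # The first write creates the local value; every owner gets its own list,
        # so no mutable default is shared between items.
        self._owner._values.setdefault(self._name, []).append(value)

    def first(self):
        stored = self._stored()
        return stored[0] if stored else None


class LazyItem:
    """Hypothetical item whose myAttr has no local value until modified."""

    def __init__(self):
        self._values = {}

    @property
    def myAttr(self):
        return LazyList(self, "myAttr")

    def hasLocalAttributeValue(self, name):
        return name in self._values


x = LazyItem()
before = (list(x.myAttr), x.hasLocalAttributeValue("myAttr"))
x.myAttr.add(x)
after = x.hasLocalAttributeValue("myAttr")
```

Here before is ([], False) and after is True, matching the session above. Because the list is created per item on first write, this also sidesteps the problem of a mutable value being shared as a schema-level default.
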
--Grant

On 18 Jan, 2007, at 17:49, Andi Vajda wrote:

A while ago I noticed that the Chandler 'Welcome Note' had 49 values out of the box. That number seemed a little high to me so I looked into this issue a little more. Below is what I found so far.

Using Katie's alpha4.ini file, which restores the collections she's normally using during dogfooding Chandler, I end up with a repository containing 1816 ContentItem instances. This includes the office calendar.

>>> l=list(ContentItem.iterItems(view))
>>> len(l)
1816

These 1816 instances contain a total of 58856 values or references, that is, 58856 named entries in their _values and _references dictionaries.

>>> sum(len(i._values) + len(i._references) for i in l)
58856

More precisely, 32274 literal values and 21852 references (bi-refs, ref collections, None, ...)

Focusing on literal values, how many false values, i.e. values that are None, False, empty lists, empty dicts, etc., are there:

>>> sum(sum(1 for v in i._values._dict.itervalues() if not v) for i in l)
17944

Hmm, that's a lot of false values. 56%! How many of these are empty dicts or empty lists:

>>> sum(sum(1 for v in i._values._dict.itervalues() if not v and isinstance(v, dict)) for i in l)
2848
>>> sum(sum(1 for v in i._values._dict.itervalues() if not v and isinstance(v, list)) for i in l)
1043

A lot of empty dicts it seems. Digging further and getting help from a little count function:

def count(d, s):
    for n in s:
        if n in d:
            d[n] += 1
        else:
            d[n] = 1

Then, I was interested in seeing which attributes and how many occurrences of them had empty dicts:

d = {}
for a in ((n for n,v in i._values._dict.iteritems()
           if not v and isinstance(v, dict)) for i in l):
    count(d,a)
for n,c in d.iteritems():
    print "%50s: %4d" %(n, c)

    osaf.pim.calendar.EventStamp.icalendarParameters: 450
    downloadedMessageUIDS: 2
    manifest: 4
    osaf.pim.mail.MailStamp.headers: 798
    osaf.pim.calendar.EventStamp.icalendarProperties: 796
    osaf.pim.mail.MailStamp.chandlerHeaders: 798

Indeed, a small number of attributes are set up with empty dicts but these add up.

Would there be a way to not do that? Using a defaultValue is not going to work as a defaultValue is a schema value that is shared by all attributes needing it. Using a mutable value as a defaultValue is not good.

Similarly, for empty lists, we have:

    messageQueue: 3
    filterClasses: 8
    exdates: 61
    osaf.pim.mail.MailStamp.referencesMID: 798
    rdates: 68
    bymonthday: 57
    invitees: 48

Looks like at least one candidate for some rethinking...

Looking at simpler values, such as True or False, easy to use with defaultValue since they're immutable, it looks like we have lots of attributes with a local False value:

    isActive: 3
    osaf.pim.calendar.EventStamp.anyTime: 451
    recursive: 1
    useSSL: 15
    osaf.usercollections.UserCollection.canAdd: 1
    read: 1815
    test: 7
    untilIsDate: 68
    osaf.pim.mail.MailStamp.toMe: 798
    private: 1816
    useAuth: 2
    osaf.usercollections.UserCollection.allowOverlay: 4
    osaf.usercollections.UserCollection.colorizeIcon: 4
    osaf.pim.calendar.EventStamp.isGenerated: 15
    osaf.usercollections.UserCollection.renameable: 4
    needsReply: 1816
    osaf.pim.calendar.EventStamp.allDay: 456
    osaf.pim.mail.MailStamp.fromMe: 798
    hidden: 8
    osaf.pim.mail.MailStamp.isOutbound: 798

Similarly, for True, we have:

    osaf.usercollections.UserCollection.dontDisplayAsCalendar: 7
    established: 8
    osaf.pim.calendar.EventStamp.anyTime: 347
    recursive: 8
    osaf.usercollections.UserCollection.outOfTheBoxCollection: 4
    test: 1
    useAuth: 1
    mine: 1816
    osaf.pim.calendar.EventStamp.allDay: 341
    leaveOnServer: 2
    useSSL: 4
    read: 1
    osaf.pim.calendar.EventStamp.isGenerated: 262
    osaf.usercollections.UserCollection.iconNameHasClassVariant: 1
    active: 8
    isActive: 7

How about making 'mine' be True by default?

Now, looking at the number of values and references per ContentItem instance, it seems that they have at least 15 and at the most 54.

>>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
>>> m.sort()
>>> m[0]
(15, <UUID: cf01b286-a753-11db-b1d2-9e1578b66e66>)
>>> m[-1]
(54, <UUID: 05b7c14e-a754-11db-b1d3-9e1578b66e66>)

Breaking it up by number of items for a given number of values, I get:

>>> from itertools import groupby
>>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
>>> m.sort()
>>> [(n, len(list(g))) for n, g in groupby(m, lambda x: x[0])]
[(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
 (22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
 (29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
 (47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
 (54, 36)]

It looks like most ContentItem instances have at least 46 values!

Reducing these value counts can speed up many things such as item commit or item load.

Andi

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
