Hey, Andi

We can almost certainly replace a lot of the initialValue uses with defaultValue, especially for values that aren't references. There's some risk there, because the return value of Item.hasLocalAttributeValue() differs between the two cases, and some code (ICalendar/sharing, maybe) could rely on that.
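To make that risk concrete, here is a toy model in plain Python (Python 3). It is not the repository's Item class: `ToyItem`, its `initialValues`/`defaultValues` dicts and the flag names are hypothetical. The assumption modelled is that an initialValue is copied into each new item's local values, while a defaultValue stays at schema level and is only consulted on lookup.

```python
class ToyItem(object):
    """Toy model of an item; NOT the real repository Item class."""

    # Hypothetical schema: initialValues are copied into every new item,
    # defaultValues stay at class (schema) level and are shared.
    initialValues = {'initialFlag': False}
    defaultValues = {'defaultFlag': False}

    def __init__(self):
        # Every initialValue costs one local entry per item.
        self._values = dict(self.initialValues)

    def getAttributeValue(self, name):
        # A local value wins; otherwise fall back to the schema default.
        if name in self._values:
            return self._values[name]
        return self.defaultValues[name]

    def hasLocalAttributeValue(self, name):
        return name in self._values


item = ToyItem()
print(item.getAttributeValue('initialFlag'), item.hasLocalAttributeValue('initialFlag'))
print(item.getAttributeValue('defaultFlag'), item.hasLocalAttributeValue('defaultFlag'))
```

Both flags read back as False, but only `initialFlag` counts as a local value; code that branches on hasLocalAttributeValue() would see the two cases differently.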

So far as reflists go, there are really two approaches in use right now:

1) Use defaultValue=None in your attribute.

(Example: EventStamp.modifications or EventStamp.occurrences)

Then, always make sure you can deal with None when iterating over myAttr. If you have a biref, let the repository set up the biref where possible, by always assigning to the "other" end. This avoids a lot of checking for None and initializing the attribute to [] (or set() or {} as appropriate).

2) Use initialValue=[]

(Example: Reminder.reminderItems)

As you noticed, this leads to an extra attribute assignment every time you create the Item. It does, however, make for simpler code: you assume that the attribute always exists and just iterate over it, call add()/remove()/whatever on it, and all's well. Unfortunately, it also has the downside that initialValues don't always get set (if items are imported via sharing, __init__ and initialValues are bypassed), which leads to bugs.
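A plain-Python toy of the two approaches (Python 3). Everything here is a hypothetical stand-in, not the repository's implementation: `Master`/`Occurrence` are loosely modelled on EventStamp.occurrences, the property setter imitates the repository keeping a biref in sync, and `WithInitialValue.__new__` is used to mimic an import path that skips `__init__`.

```python
# Approach 1: defaultValue=None, letting the "other" end of a biref do the work.
class Master(object):
    def __init__(self):
        self.occurrences = None  # like defaultValue=None: nothing stored yet


class Occurrence(object):
    def __init__(self):
        self._master = None

    @property
    def master(self):
        return self._master

    @master.setter
    def master(self, value):
        # Toy stand-in for the repository maintaining a biref: assigning
        # this end creates/updates the collection on the other end.
        if self._master is not None:
            self._master.occurrences.remove(self)
        self._master = value
        if value is not None:
            if value.occurrences is None:
                value.occurrences = []
            value.occurrences.append(self)


def iterOccurrences(master):
    # Readers must tolerate None when nothing was ever added.
    return iter(master.occurrences or ())


# Approach 2: initialValue=[] set up in __init__.
class WithInitialValue(object):
    def __init__(self):
        self.reminderItems = []  # an extra assignment for every new item


created = WithInitialValue()
# Simulates an import path (like sharing) that bypasses __init__.
imported = WithInitialValue.__new__(WithInitialValue)
```

Here `hasattr(created, 'reminderItems')` is True but `hasattr(imported, 'reminderItems')` is False, so code that assumes the attribute always exists fails on `imported` with AttributeError.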

(As an aside, there's a movement afoot -- mainly PJE's work -- to have initialValues in Stamp subclasses only be set up when the stamps are actually added. This would probably cut down on some of the empty collection values in your report. cf.

<http://bugzilla.osafoundation.org/show_bug.cgi?id=7322>)


Anyway, is there a way to get the best of 1) and 2)? That is, something that behaves like defaultValue=[] until you try to modify the collection? In code:

>>> class MyItem(schema.Item):
...     myAttr = schema.Sequence(schema.Item, magicValue=[])
...
>>> x = MyItem(...)
>>> list(x.myAttr)
[]
>>> x.hasLocalAttributeValue('myAttr')
False
>>> x.myAttr.add(x)
>>> x.hasLocalAttributeValue('myAttr')
True
>>> x.myAttr.first()
<MyItem ....>
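One way the behaviour in the session above could be approximated in plain Python (3.6+, for `__set_name__`) is a non-data descriptor that hands out an empty view until the first `add()`, at which point a real per-item collection is stored. `LazyCollection`, `RefList`, `_EmptyView` and `ToyBase` are hypothetical names; this is a sketch of the idea, not how the repository implements attributes.

```python
class RefList(list):
    """Toy ref collection with the add()/first() calls used above (hypothetical)."""

    def add(self, item):
        self.append(item)

    def first(self):
        return self[0] if self else None


class _EmptyView(object):
    """Empty stand-in; materializes a real RefList on the first add()."""

    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def _local(self):
        # The locally stored collection, or None if there isn't one yet.
        return self._owner.__dict__.get(self._name)

    def __iter__(self):
        return iter(self._local() or ())

    def __len__(self):
        return len(self._local() or ())

    def first(self):
        local = self._local()
        return local.first() if local else None

    def add(self, item):
        # Copy-on-write: only now does the item get its own local value.
        value = self._local()
        if value is None:
            value = RefList()
            self._owner.__dict__[self._name] = value
        value.add(item)


class LazyCollection(object):
    """Non-data descriptor: a value in the instance __dict__ shadows it."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return _EmptyView(instance, self.name)


class ToyBase(object):
    def hasLocalAttributeValue(self, name):
        return name in self.__dict__


class MyItem(ToyBase):
    myAttr = LazyCollection()
```

Because `LazyCollection` defines only `__get__`, a value stored in the instance `__dict__` takes precedence on later lookups, so after the first `add()` the descriptor is out of the picture and `x.myAttr` is the real `RefList`.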

In fact, shouldn't all many-valued attributes behave this way?

--Grant

On 18 Jan, 2007, at 17:49, Andi Vajda wrote:


A while ago I noticed that the Chandler 'Welcome Note' had 49 values out of the box. That number seemed a little high to me, so I looked into it
a little more. Below is what I found so far.

Using Katie's alpha4.ini file, which restores the collections she normally uses when dogfooding Chandler, I end up with a repository containing 1816 ContentItem instances. This includes the office calendar.

   >>> l=list(ContentItem.iterItems(view))
   >>> len(l)
   1816

These 1816 instances contain a total of 58856 values or references, that is, 58856 named entries in their _values and _references dictionaries.

   >>> sum(len(i._values) + len(i._references) for i in l)
   58856

More precisely, 32274 literal values and 21852 references (bi-refs, ref collections, None, ...).

Focusing on literal values, how many of them are false, i.e. None, False, empty lists, empty dicts, etc.?

   >>> sum(sum(1 for v in i._values._dict.itervalues() if not v) for i in l)
   17944

Hmm, that's a lot of false values: 56%!
How many of these are empty dicts or empty lists?

   >>> sum(sum(1 for v in i._values._dict.itervalues()
   ...         if not v and isinstance(v, dict)) for i in l)
   2848

   >>> sum(sum(1 for v in i._values._dict.itervalues()
   ...         if not v and isinstance(v, list)) for i in l)
   1043

It seems there are a lot of empty dicts.
Digging further, with help from a little count function:

    def count(d, s):
        # Tally each name in s into the dict d (name -> occurrences).
        for n in s:
            if n in d:
                d[n] += 1
            else:
                d[n] = 1

Then I wanted to see which attributes had empty dicts, and how many
occurrences of each there were:

d = {}
for a in ((n for n, v in i._values._dict.iteritems()
           if not v and isinstance(v, dict)) for i in l):
    count(d, a)
for n, c in d.iteritems():
    print "%50s: %4d" % (n, c)

  osaf.pim.calendar.EventStamp.icalendarParameters:  450
                             downloadedMessageUIDS:  2
                                          manifest:  4
                   osaf.pim.mail.MailStamp.headers:  798
  osaf.pim.calendar.EventStamp.icalendarProperties:  796
           osaf.pim.mail.MailStamp.chandlerHeaders:  798

Indeed, only a small number of attributes are set up with empty dicts, but these add up. Is there a way to avoid that? Using a defaultValue is not going to work, because a defaultValue is a schema value that is shared by all the items relying on it, so using a mutable value as a defaultValue is not good.
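The problem with a mutable shared default can be shown with an ordinary Python class attribute, used here as an assumed analogue of a schema-level defaultValue (the class and the `headers` name are hypothetical): mutating the shared object changes it for every item that has no local value.

```python
class SharedDefaultItem(object):
    # Toy stand-in for a schema-level defaultValue: one dict object,
    # shared by every instance that has no local value of its own.
    headers = {}


a = SharedDefaultItem()
b = SharedDefaultItem()
a.headers['Subject'] = 'hello'   # mutates the shared default in place...
print(b.headers)                 # ...so b sees the new entry too
```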

Similarly, for empty lists, we have:

                                      messageQueue:  3
                                     filterClasses:  8
                                           exdates:  61
             osaf.pim.mail.MailStamp.referencesMID:  798
                                            rdates:  68
                                        bymonthday:  57
                                          invitees:  48

Looks like at least one candidate for some rethinking...
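On Python 2.7 and later, the dict and list tallies above can also be written with `collections.Counter` instead of the hand-rolled count() helper. This sketch runs on made-up sample dicts standing in for each item's `_values._dict` (the sample data and `countEmptyValues` are hypothetical), so it does not reproduce the figures above.

```python
from collections import Counter


def countEmptyValues(valueDicts, kind):
    """Per attribute name, count the items whose value is an empty `kind`."""
    counts = Counter()
    for values in valueDicts:
        # Counter.update() with an iterable of keys adds 1 per key.
        counts.update(name for name, v in values.items()
                      if not v and isinstance(v, kind))
    return counts


# Hypothetical stand-ins for i._values._dict on three items.
sample = [
    {'headers': {}, 'exdates': [], 'read': False},
    {'headers': {}, 'exdates': [1], 'read': True},
    {'headers': {'To': 'x'}, 'exdates': []},
]

for kind in (dict, list):
    for name, n in sorted(countEmptyValues(sample, kind).items()):
        print("%50s: %4d" % (name, n))
```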

Looking at simpler values such as True or False, which are easy to use with
defaultValue since they're immutable, it looks like we have lots of
attributes with a local False value:

                                          isActive:    3
              osaf.pim.calendar.EventStamp.anyTime:  451
                                         recursive:    1
                                            useSSL:   15
        osaf.usercollections.UserCollection.canAdd:    1
                                              read: 1815
                                              test:    7
                                       untilIsDate:   68
                      osaf.pim.mail.MailStamp.toMe:  798
                                           private: 1816
                                           useAuth:    2
  osaf.usercollections.UserCollection.allowOverlay:    4
  osaf.usercollections.UserCollection.colorizeIcon:    4
          osaf.pim.calendar.EventStamp.isGenerated:   15
    osaf.usercollections.UserCollection.renameable:    4
                                        needsReply: 1816
               osaf.pim.calendar.EventStamp.allDay:  456
                    osaf.pim.mail.MailStamp.fromMe:  798
                                            hidden:    8
                osaf.pim.mail.MailStamp.isOutbound:  798


Similarly, for True, we have:

   osaf.usercollections.UserCollection.dontDisplayAsCalendar:    7
                                                 established:    8
                        osaf.pim.calendar.EventStamp.anyTime:  347
                                                   recursive:    8
   osaf.usercollections.UserCollection.outOfTheBoxCollection:    4
                                                        test:    1
                                                     useAuth:    1
                                                        mine: 1816
                         osaf.pim.calendar.EventStamp.allDay:  341
                                               leaveOnServer:    2
                                                      useSSL:    4
                                                        read:    1
                    osaf.pim.calendar.EventStamp.isGenerated:  262
 osaf.usercollections.UserCollection.iconNameHasClassVariant:    1
                                                      active:    8
                                                    isActive:    7

How about making 'mine' True by default?

Now, looking at the number of values and references per ContentItem
instance, it seems that each has at least 15 and at most 54.

   >>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
   >>> m.sort()
   >>> m[0]
   (15, <UUID: cf01b286-a753-11db-b1d2-9e1578b66e66>)
   >>> m[-1]
   (54, <UUID: 05b7c14e-a754-11db-b1d3-9e1578b66e66>)

Counting how many items have each given number of values, I get:

   >>> from itertools import groupby
   >>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
   >>> m.sort()
   >>> [(n, len(list(g))) for n, g in groupby(m, lambda x: x[0])]
   [(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
    (22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
    (29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
    (47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
    (54, 36)]
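As a quick cross-check, the histogram above can be tallied directly. The pairs below are copied verbatim from that output; summing values-per-item times item count should give back the 58856 total found earlier.

```python
# Histogram copied from the groupby output above: (values per item, item count).
histogram = [(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35),
             (21, 173), (22, 18), (23, 21), (24, 6), (25, 1), (26, 5),
             (27, 4), (28, 4), (29, 9), (30, 11), (31, 3), (32, 2),
             (37, 1), (39, 1), (46, 300), (47, 152), (48, 26), (49, 35),
             (50, 43), (51, 21), (52, 55), (53, 129), (54, 36)]

total = sum(count for _, count in histogram)                 # items
entries = sum(n * count for n, count in histogram)           # values + refs
heavy = sum(count for n, count in histogram if n >= 46)      # items with 46+
print(total, entries, heavy, "%.0f%%" % (100.0 * heavy / total))
```

This prints `1816 58856 797 44%`: the histogram accounts for every item and every entry, and the items with 46 or more values form the second, heavier cluster.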

It looks like a large share of ContentItem instances, 797 of the 1816, have at least 46 values!

Reducing these value counts can speed up many things such as item commit or
item load.

Andi..
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev