Re: [Chandler-dev] empty value report (perf)

Grant Baillie Fri, 19 Jan 2007 10:43:58 -0800

Hey, Andi

We can almost certainly replace a lot of the initialValue uses with defaultValue, especially for values that aren't references. There's some risk there, because the return value of Item.hasLocalAttributeValue() is different in the two cases, and some code (ICalendar/sharing, maybe) could rely on that.
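To make the hasLocalAttributeValue() difference concrete, here is a toy model. It is not the repository's implementation; ToyAttribute, ToyItem and Note are hypothetical names. An initialValue is copied into the item's local values when the item is created, while a defaultValue is only consulted on lookup and never stored:

```python
import copy

_MISSING = object()  # sentinel meaning "no default/initial value declared"


class ToyAttribute:
    """Hypothetical stand-in for a schema attribute declaration."""

    def __init__(self, name, defaultValue=_MISSING, initialValue=_MISSING):
        self.name = name
        self.defaultValue = defaultValue
        self.initialValue = initialValue


class ToyItem:
    """Hypothetical item that keeps only its local values in a dict."""
    attributes = ()

    def __init__(self):
        self._values = {}
        for attr in self.attributes:
            if attr.initialValue is not _MISSING:
                # initialValue: a fresh copy is stored locally on every new item
                self._values[attr.name] = copy.copy(attr.initialValue)

    def getAttributeValue(self, name):
        if name in self._values:
            return self._values[name]
        for attr in self.attributes:
            if attr.name == name and attr.defaultValue is not _MISSING:
                # defaultValue: the shared schema value; nothing stored locally
                return attr.defaultValue
        raise AttributeError(name)

    def hasLocalAttributeValue(self, name):
        return name in self._values


class Note(ToyItem):
    attributes = (ToyAttribute('tags', initialValue=[]),
                  ToyAttribute('mine', defaultValue=True))


note = Note()
print(note.hasLocalAttributeValue('tags'))  # True: initialValue was stored
print(note.hasLocalAttributeValue('mine'))  # False: lookup falls back on defaultValue
print(note.getAttributeValue('mine'))       # True
```

Any code that branches on hasLocalAttributeValue() would see a different answer for the same apparent value, which is the risk mentioned above.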


So far as reflists go, there are really two approaches in use right now:

1) Use defaultValue=None in your attribute.

(Example: EventStamp.modifications or EventStamp.occurrences)

Then, always make sure you can deal with None when iterating over myAttr. If you have a biref, let the repository take care of setting up the biref where possible, by always assigning to the "other" end. This avoids a lot of checking for None, and of initializing the attribute to [] (or set() or {} as appropriate).
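A plain-Python sketch of approach 1, assuming simplified stand-ins (FakeEvent, FakeModification and modification_titles are hypothetical, not Chandler classes). Readers tolerate None with an `or ()` idiom, and assigning the "other" end of the biref is what creates the collection:

```python
class FakeEvent:
    """Hypothetical stand-in: `modifications` behaves like defaultValue=None."""
    modifications = None


class FakeModification:
    """Hypothetical stand-in for the "other" end of the biref."""

    def __init__(self, title):
        self.title = title
        self._modificationFor = None

    @property
    def modificationFor(self):
        return self._modificationFor

    @modificationFor.setter
    def modificationFor(self, event):
        # Simplified version of what the repository does for a biref:
        # assigning this end updates the other end, creating the collection
        # on first use (reassignment/removal is not handled here)
        self._modificationFor = event
        if event.modifications is None:
            event.modifications = []
        event.modifications.append(self)


def modification_titles(event):
    # `or ()` turns the None default into an empty iterable
    return [m.title for m in (event.modifications or ())]


event = FakeEvent()
print(modification_titles(event))   # []
FakeModification('moved to 3pm').modificationFor = event
print(modification_titles(event))   # ['moved to 3pm']
```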


2) Use initialValue=[]

(Example: Reminder.reminderItems)

As you noticed, this leads to an extra attribute assignment every time you create the Item. It does, however, make for simpler code: you assume that the attribute always exists, and just iterate over it, call add()/remove()/whatever on it, and all's well. Unfortunately, it also has the downside that initialValues don't always get set (if items are imported via sharing, __init__ and initialValues are bypassed), which leads to bugs.
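The failure mode can be reproduced in plain Python. In this sketch, `cls.__new__(cls)` stands in for any import path that constructs an object without running `__init__`; the Reminder class here is a hypothetical stand-in, not the real one:

```python
class Reminder:
    """Hypothetical stand-in; reminderItems is set up only in __init__."""

    def __init__(self):
        self.reminderItems = []   # plays the role of initialValue=[]


def make_bypassing_init(cls):
    # Build an instance without calling __init__, as a deserializer might
    return cls.__new__(cls)


created = Reminder()
imported = make_bypassing_init(Reminder)
print(hasattr(created, 'reminderItems'))   # True
print(hasattr(imported, 'reminderItems'))  # False: code assuming it exists breaks
```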

(As an aside, there's a movement afoot, mainly PJE's work, to have initialValues in Stamp subclasses only be set up when the stamps are actually added. This would probably cut down on some of the empty collection values in your report. cf.
<http://bugzilla.osafoundation.org/show_bug.cgi?id=7322>)
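The idea of deferring stamp initial values can be sketched as follows. This is a hypothetical illustration (ToyStamp, ToyMailStamp and initialValues as a class-level dict are assumptions), not the actual implementation: nothing is stored until the stamp is added, and each item gets its own copies.

```python
import copy


class ToyStamp:
    """Hypothetical stamp whose initial values are applied only by add()."""
    initialValues = {}

    def __init__(self, item):
        self.item = item   # a plain dict stands in for the item's local values

    def add(self):
        for name, value in self.initialValues.items():
            # copy so that items never share one mutable collection
            self.item.setdefault(name, copy.copy(value))


class ToyMailStamp(ToyStamp):
    initialValues = {'headers': {}, 'chandlerHeaders': {}, 'referencesMID': []}


note = {}                     # an unstamped item: no empty collections stored
print(len(note))              # 0
ToyMailStamp(note).add()      # stamping is what creates the local values
print(sorted(note))           # ['chandlerHeaders', 'headers', 'referencesMID']
```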

Anyway, is there a way to have the best of 1) and 2)? That is, to have something that behaves like defaultValue=[] until you try to modify the collection? In code:


>>> class MyItem(schema.Item):
...     myAttr = schema.Sequence(schema.Item, magicValue=[])
...
>>> x = MyItem(...)
>>> list(x.myAttr)
[]
>>> x.hasLocalAttributeValue('myAttr')
False
>>> x.myAttr.add(x)
>>> x.hasLocalAttributeValue('myAttr')
True
>>> x.myAttr.first()
<MyItem ....>

In fact, shouldn't all many-valued attributes behave this way?
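One way to get the behavior in the doctest above, sketched under assumptions: `magicValue` is the hypothetical option from that example, and RefList, LazyRefList and LazyItem below are illustrative classes, not repository API. The attribute reads as an empty view until the first add(), which creates a real collection and stores it as a local value:

```python
class RefList(list):
    """Minimal local collection offering the add()/first() calls used above."""

    def add(self, value):
        self.append(value)

    def first(self):
        return self[0] if self else None


class LazyRefList:
    """Empty view that creates the real collection on first add()."""

    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def _local(self):
        # The owner's local collection if one exists yet, else an empty tuple
        return vars(self._owner).get(self._name, ())

    def __iter__(self):
        return iter(self._local())

    def __len__(self):
        return len(self._local())

    def first(self):
        items = self._local()
        return items[0] if items else None

    def add(self, value):
        # First mutation materializes and stores a local collection;
        # later calls (even through a stale view) reuse the same one
        self._owner._materialize(self._name).add(value)


class LazyItem:
    """Hypothetical item; attributes named in lazyAttributes use LazyRefList."""
    lazyAttributes = ('myAttr',)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. no local value exists
        if name in type(self).lazyAttributes:
            return LazyRefList(self, name)
        raise AttributeError(name)

    def _materialize(self, name):
        local = vars(self).get(name)
        if local is None:
            local = RefList()
            setattr(self, name, local)
        return local

    def hasLocalAttributeValue(self, name):
        return name in vars(self)


x = LazyItem()
print(list(x.myAttr))                      # []
print(x.hasLocalAttributeValue('myAttr'))  # False
x.myAttr.add(x)
print(x.hasLocalAttributeValue('myAttr'))  # True
print(x.myAttr.first() is x)               # True
```

Reading never writes anything, so items that never touch the attribute carry no stored value. The cost is that every mutating method on the placeholder (add(), remove(), and so on) has to route through `_materialize`.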

--Grant

On 18 Jan, 2007, at 17:49, Andi Vajda wrote:

A while ago I noticed that the Chandler 'Welcome Note' had 49 values out of the box. That number seemed a little high to me, so I looked into this issue a little more. Below is what I found so far.

Using Katie's alpha4.ini file, which restores the collections she normally uses when dogfooding Chandler, I end up with a repository containing 1816 ContentItem instances. This includes the office calendar.

   >>> l=list(ContentItem.iterItems(view))
   >>> len(l)
   1816

These 1816 instances contain a total of 58856 values or references, that is, 58856 named entries in their _values and _references dictionaries.

   >>> sum(len(i._values) + len(i._references) for i in l)
   58856

More precisely, 32274 literal values and
                21852 references (bi-refs, ref collections, None, ...)

Focusing on literal values, how many false values, i.e., values that are None, False, empty lists, empty dicts, etc., are there?

   >>> sum(sum(1 for v in i._values._dict.itervalues() if not v)
               for i in l)
   17944

Hmm, that's a lot of false values: 56% of the literal values!
How many of these are empty dicts or empty lists:

   >>> sum(sum(1 for v in i._values._dict.itervalues()
               if not v and isinstance(v, dict)) for i in l)
   2848

   >>> sum(sum(1 for v in i._values._dict.itervalues()
               if not v and isinstance(v, list)) for i in l)
   1043

A lot of empty dicts it seems.
Digging further and getting help from a little count function:

    def count(d, s):
        for n in s:
            if n in d:
                d[n] += 1
            else:
                d[n] = 1

Then, I was interested in seeing which attributes had empty dicts, and how many occurrences of each there were:

    d = {}
    for a in ((n for n, v in i._values._dict.iteritems()
               if not v and isinstance(v, dict)) for i in l):
        count(d, a)

    for n, c in d.iteritems():
        print "%50s: %4d" % (n, c)

  osaf.pim.calendar.EventStamp.icalendarParameters:  450
                             downloadedMessageUIDS:  2
                                          manifest:  4
                   osaf.pim.mail.MailStamp.headers:  798
  osaf.pim.calendar.EventStamp.icalendarProperties:  796
           osaf.pim.mail.MailStamp.chandlerHeaders:  798
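For reference, the count() helper and loop above do the same job as collections.Counter, which later Python versions (2.7 and 3.1 onward) provide in the standard library. A sketch of the empty-dict tally written that way; the `sample` dicts are made-up stand-ins for `i._values._dict`:

```python
from collections import Counter


def empty_dict_attributes(value_dicts):
    """Tally, per attribute name, how many items hold an empty dict."""
    tally = Counter()
    for values in value_dicts:
        tally.update(name for name, v in values.items()
                     if not v and isinstance(v, dict))
    return tally


# Hypothetical sample data standing in for i._values._dict of three items
sample = [
    {'headers': {}, 'read': False, 'icalendarParameters': {}},
    {'headers': {}, 'subject': 'hi'},
    {'headers': {'To': 'x'}},
]
for name, n in sorted(empty_dict_attributes(sample).items()):
    print("%50s: %4d" % (name, n))
```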

Indeed, a small number of attributes are set up with empty dicts, but these add up. Would there be a way to not do that? Using a defaultValue is not going to work, as a defaultValue is a schema value that is shared by all items needing it, and using a mutable value as a defaultValue is not good.
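Why a mutable defaultValue is unsafe is the usual shared-mutable-default problem. A plain-Python sketch, with a class attribute playing the role of a hypothetical defaultValue={} (this illustrates the sharing, not the repository's own mechanism):

```python
class MailLike:
    headers = {}   # one dict object shared by every instance, like a schema default


a = MailLike()
b = MailLike()
a.headers['Subject'] = 'lunch'   # mutates the shared default in place
print(b.headers)                 # {'Subject': 'lunch'}: b sees a's change
```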


Similarly, for empty lists, we have:

                                      messageQueue:  3
                                     filterClasses:  8
                                           exdates:  61
             osaf.pim.mail.MailStamp.referencesMID:  798
                                            rdates:  68
                                        bymonthday:  57
                                          invitees:  48

Looks like at least one candidate for some rethinking...

Looking at simpler values such as True or False, which are easy to use with
defaultValue since they're immutable, it looks like we have lots of
attributes with a local False value:

                                          isActive:    3
              osaf.pim.calendar.EventStamp.anyTime:  451
                                         recursive:    1
                                            useSSL:   15
        osaf.usercollections.UserCollection.canAdd:    1
                                              read: 1815
                                              test:    7
                                       untilIsDate:   68
                      osaf.pim.mail.MailStamp.toMe:  798
                                           private: 1816
                                           useAuth:    2
  osaf.usercollections.UserCollection.allowOverlay:    4
  osaf.usercollections.UserCollection.colorizeIcon:    4
          osaf.pim.calendar.EventStamp.isGenerated:   15
    osaf.usercollections.UserCollection.renameable:    4
                                        needsReply: 1816
               osaf.pim.calendar.EventStamp.allDay:  456
                    osaf.pim.mail.MailStamp.fromMe:  798
                                            hidden:    8
                osaf.pim.mail.MailStamp.isOutbound:  798


Similarly, for True, we have:

   osaf.usercollections.UserCollection.dontDisplayAsCalendar:    7
                                                 established:    8
                        osaf.pim.calendar.EventStamp.anyTime:  347
                                                   recursive:    8
   osaf.usercollections.UserCollection.outOfTheBoxCollection:    4
                                                        test:    1
                                                     useAuth:    1
                                                        mine: 1816
                         osaf.pim.calendar.EventStamp.allDay:  341
                                               leaveOnServer:    2
                                                      useSSL:    4
                                                        read:    1
                    osaf.pim.calendar.EventStamp.isGenerated:  262
 osaf.usercollections.UserCollection.iconNameHasClassVariant:    1
                                                      active:    8
                                                    isActive:    7

How about making 'mine' be True by default ?
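Before changing a default, one could estimate how many stored values it would make redundant. A hypothetical helper (the name, the proposed-defaults mapping and the sample data are all assumptions) counting local values that already equal a proposed defaultValue:

```python
from collections import Counter


def redundant_locals(value_dicts, proposed_defaults):
    """Count, per attribute, local values equal to a proposed defaultValue."""
    hits = Counter()
    for values in value_dicts:
        for name, default in proposed_defaults.items():
            if name in values and values[name] == default:
                hits[name] += 1
    return hits


# Made-up sample standing in for i._values._dict of three items
sample = [{'mine': True, 'read': False},
          {'mine': True, 'read': True},
          {'read': False}]
print(redundant_locals(sample, {'mine': True, 'read': False}))
```

Applied to the report above, 'mine' (True on all 1816 items) and 'private' and 'needsReply' (False on all 1816) would be the obvious first candidates.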

Now, looking at the number of values and references per ContentItem
instance, it seems that they have at least 15 and at most 54.

   >>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
   >>> m.sort()
   >>> m[0]
   (15, <UUID: cf01b286-a753-11db-b1d2-9e1578b66e66>)
   >>> m[-1]
   (54, <UUID: 05b7c14e-a754-11db-b1d3-9e1578b66e66>)

Counting how many items there are for each number of values, I get:

   >>> from itertools import groupby
   >>> m=[(len(i._values) + len(i._references), i.itsUUID) for i in l]
   >>> m.sort()
   >>> [(n, len(list(g))) for n, g in groupby(m, lambda x: x[0])]
   [(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35), (21, 173),
    (22, 18), (23, 21), (24, 6), (25, 1), (26, 5), (27, 4), (28, 4),
    (29, 9), (30, 11), (31, 3), (32, 2), (37, 1), (39, 1), (46, 300),
    (47, 152), (48, 26), (49, 35), (50, 43), (51, 21), (52, 55), (53, 129),
    (54, 36)]

It looks like a large share of ContentItem instances (797 of 1816) have at least 46 values!
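The histogram can be summarized directly. This sketch re-enters the (number of values, number of items) pairs printed above; the helper name is hypothetical:

```python
histogram = [(15, 5), (16, 65), (17, 7), (18, 14), (19, 634), (20, 35),
             (21, 173), (22, 18), (23, 21), (24, 6), (25, 1), (26, 5),
             (27, 4), (28, 4), (29, 9), (30, 11), (31, 3), (32, 2), (37, 1),
             (39, 1), (46, 300), (47, 152), (48, 26), (49, 35), (50, 43),
             (51, 21), (52, 55), (53, 129), (54, 36)]


def items_with_at_least(hist, threshold):
    """Number of items holding at least `threshold` values/references."""
    return sum(count for size, count in hist if size >= threshold)


total = sum(count for _, count in histogram)
print(total)                                # 1816
print(items_with_at_least(histogram, 46))   # 797
```

The totals match the 1816 ContentItem instances counted earlier.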

Reducing these value counts can speed up many things, such as item commit or item load.

Andi..
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev

