Re: [Design] Determining that collections behave like categories

Philippe Bossut Mon, 30 Jan 2006 17:36:57 -0800

Hi,

Alec Flett wrote:

> I personally wish that I could just have a delicious-like interface for all my mail - I mean ultimately, I don't need to see all my mail folders in my sidebar - I need some subset of collections that give me quick access to important messages.
>
> Part of that dream is that someone has gone and already tagged my thousands of e-mail messages with relevant tags, and so forth. But I always wondered - say we fast forward some number of months/years to when Chandler is usable as a mail client and we have this world of tags and collections, how does my mail get from hierarchical mail folders to this neat folksonomy?
>
> One thought I had is that there is actually a lot of meta-information stored in mail that we could sort of auto-create collections from... essentially, what if Chandler sort of auto-tagged your e-mail and then also made some special collections based on data collected during the auto-tagging?
>
> For instance:
>
> 1) Folder name. I have some folders in hierarchies (I'm using the delimiter '.' as we do on the OSAF server): "MailingLists.dev", "bugs", "Sent", "Sent.2005" and so forth. I think we could more or less tag each message with all the components of the path. For instance, a message in "MailingLists.dev" would get tagged with "MailingLists" and "dev" automatically. Messages in "bugs" would get tagged as "bugs". We may be able to develop heuristics for making a good guess at which tags to create collections for in the sidebar. For instance, if all messages in the "bugs" folder are also from "[EMAIL PROTECTED]" then maybe that tells us something else about what the "bugs" tag means.
>
> 2) Sender/Recipients. We wouldn't need to tag anything based on this, but we could auto-create collections for the "top people" - i.e. I have a few specific friends who I probably exchange at least 30% of my personal e-mail with. We could create collections for these "top people" - I think we could probably come up with some pretty decent heuristics for determining who those people are.
>
> 3) Mailing lists - there are multiple ways to identify mailing lists, from well-known RFC822 headers, to looking for [xxxxx] in subject lines, to noticing that MY e-mail address doesn't appear in the To or CC headers. We could again develop some heuristics to filter out the noise so that if I have 1000 messages with [....] in the subject line, but only 2 of them have [ACLFW], then I'm probably not on a mailing list called ACLFW and might just have some messages from them, or a friend used [ACLFW] in a subject line when sending a message.
>
> 4) Date - we could easily auto-create some collections for "more than 1 year old" or "last month" - because honestly, I don't care where my really old messages are, as long as I can find them by searching.
>
> Anyway, you get the idea. I really like the idea of at least giving the user a "best guess" when transitioning from another mail system into Chandler. The same principle could apply for importing calendars from iCal, RSS feeds from BlogLines, or whatever.
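Heuristic 2 (top people) could start as a plain frequency count. The sketch below is illustrative only: it assumes each message has already been reduced to the address of the other party (the sender for incoming mail, the recipients for outgoing mail). The 30% target share comes from Alec's example; the cap of five people and the function name `top_people` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def top_people(correspondents: Iterable[str],
               target_share: float = 0.3,
               max_people: int = 5) -> List[str]:
    """Return the most frequent correspondents, stopping as soon as they
    jointly account for at least target_share of all exchanged messages
    (or once max_people have been picked)."""
    counts = Counter(addr.lower() for addr in correspondents)
    total = sum(counts.values())
    if total == 0:
        return []
    chosen: List[str] = []
    covered = 0
    for person, n in counts.most_common(max_people):
        chosen.append(person)
        covered += n
        if covered / total >= target_share:
            break
    return chosen


# Hypothetical mailbox: one entry per (message, correspondent) pair.
mail = ["ann@example.org"] * 4 + ["bob@example.org"] * 3 + [
    "cat@example.org", "dan@example.org", "eve@example.org"]
print(top_people(mail))                    # ['ann@example.org']
print(top_people(mail, target_share=0.6))  # ['ann@example.org', 'bob@example.org']
```

A greedy cut on cumulative share tracks Alec's framing ("a few friends who account for ~30% of my mail") more closely than a fixed per-person threshold would.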
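Heuristic 3 (mailing lists) combines three signals. The sketch below assumes messages have already been parsed with Python's standard `email` package. It treats a `List-Id` header (defined in RFC 2919) as conclusive. It accepts a `[tag]` in the subject only when the tag is frequent and most tagged messages are not addressed to the user directly. The thresholds `min_count` and `min_indirect` are hypothetical values chosen to reproduce the [ACLFW] example.

```python
import re
from collections import Counter
from email import message_from_string
from email.message import Message
from email.utils import getaddresses
from typing import Iterable, Set

SUBJECT_TAG = re.compile(r"\[([^\]]+)\]")


def addressed_to_me(msg: Message, me: str) -> bool:
    """True if my address appears in the To or Cc headers."""
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return any(addr.lower() == me.lower() for _, addr in getaddresses(headers))


def likely_lists(msgs: Iterable[Message], me: str,
                 min_count: int = 20, min_indirect: float = 0.5) -> Set[str]:
    """Guess mailing lists from a List-Id header, or from a [tag] in the
    subject that is frequent and mostly not addressed to me."""
    lists: Set[str] = set()
    tag_total: Counter = Counter()
    tag_indirect: Counter = Counter()
    for msg in msgs:
        list_id = msg.get("List-Id")
        if list_id:
            lists.add(str(list_id).strip())
        indirect = not addressed_to_me(msg, me)
        for tag in set(SUBJECT_TAG.findall(str(msg.get("Subject", "")))):
            tag_total[tag] += 1
            tag_indirect[tag] += int(indirect)
    # Noise filter: a rare tag such as [ACLFW] is probably not a list.
    for tag, n in tag_total.items():
        if n >= min_count and tag_indirect[tag] / n >= min_indirect:
            lists.add(f"[{tag}]")
    return lists


def make(subject: str, to: str, list_id: str = "") -> Message:
    """Build a tiny test message (hypothetical addresses)."""
    extra = f"List-Id: {list_id}\n" if list_id else ""
    return message_from_string(
        f"From: x@example.org\nTo: {to}\nSubject: {subject}\n{extra}\nbody\n")


me = "me@example.org"
msgs = ([make("[dev] build broken", "dev@lists.example.org")] * 25
        + [make("[ACLFW] hi", me)] * 2
        + [make("design review", "design@lists.example.org",
                "<design.lists.example.org>")])
print(sorted(likely_lists(msgs, me)))
# ['<design.lists.example.org>', '[dev]']
```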
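Heuristic 4 (date) amounts to computing, for each message, which date collections it belongs to. This is a minimal sketch under two stated assumptions: "more than 1 year old" is approximated as older than 365 days, and "last month" means the previous calendar month. The collection names are hypothetical.

```python
from datetime import date, timedelta
from typing import Set


def date_collections(sent: date, today: date) -> Set[str]:
    """Return the names of the date-based collections a message belongs to."""
    names: Set[str] = set()
    # "More than 1 year old": approximated as older than 365 days.
    if today - sent > timedelta(days=365):
        names.add("More than 1 year old")
    # "Last month": the calendar month before today's month.
    last_month_end = today.replace(day=1) - timedelta(days=1)
    if (sent.year, sent.month) == (last_month_end.year, last_month_end.month):
        names.add("Last month")
    return names


today = date(2006, 1, 30)
print(date_collections(date(2005, 12, 15), today))  # {'Last month'}
print(date_collections(date(2004, 6, 1), today))    # {'More than 1 year old'}
print(date_collections(date(2006, 1, 10), today))   # set()
```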

Those are all great ideas. There is indeed a lot of low-hanging fruit as far as auto-tagging goes. The decomposition of the IMAP hierarchy into a set of non-hierarchical tags is right on, IMO.
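The IMAP-hierarchy decomposition from Alec's point 1 takes only a few lines. This sketch assumes '.' as the delimiter, as on the OSAF server. It also includes his "all messages in `bugs` come from one sender" check, using a hypothetical 90% threshold in place of "all".

```python
from collections import Counter
from typing import Iterable, Optional, Set


def folder_tags(folder_path: str, delimiter: str = ".") -> Set[str]:
    """Flatten a hierarchical folder path into one tag per path component."""
    # Empty components (e.g. from a trailing delimiter) are ignored.
    return {part for part in folder_path.split(delimiter) if part}


def dominant_sender(senders: Iterable[str],
                    threshold: float = 0.9) -> Optional[str]:
    """Return the sender of at least `threshold` of a folder's messages, if
    any; a hit hints at what the folder's tag means (e.g. a bug tracker)."""
    counts = Counter(s.lower() for s in senders)
    total = sum(counts.values())
    if total == 0:
        return None
    sender, n = counts.most_common(1)[0]
    return sender if n / total >= threshold else None


print(sorted(folder_tags("MailingLists.dev")))     # ['MailingLists', 'dev']
print(folder_tags("bugs"))                         # {'bugs'}
print(dominant_sender(["bugs@example.org"] * 10))  # bugs@example.org
```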

Note that having tags does not mean that you have to do away with displaying a hierarchy altogether. When displaying a graph (a tree being a special case of a general graph), you can imagine pulling on one vertex and, using appropriate heuristics based on weighting the edges of the graph, identifying a subtree in its vicinity. Of course, that requires that we can identify a graph structure in the data, and that's where segmentation algorithms would help.
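One way to approximate "pulling on a vertex" is to grow a heavy-edge tree outward from the chosen vertex in the style of Prim's algorithm, ignoring weak edges and stopping at a depth limit. This is only one possible heuristic. Because of the depth cut-off it is greedy rather than an optimal constrained spanning tree. The graph, its edge weights, and the parameters `max_depth` and `min_weight` are hypothetical, and how the weights would be computed is left open.

```python
import heapq
from typing import Dict, Optional

Graph = Dict[str, Dict[str, float]]  # vertex -> {neighbour: edge weight}


def local_tree(graph: Graph, root: str, max_depth: int = 2,
               min_weight: float = 0.0) -> Dict[str, Optional[str]]:
    """Greedily grow a heavy-edge tree around `root` (Prim-style): always
    attach the strongest remaining edge, skip edges weaker than min_weight,
    and stop expanding at max_depth hops. Returns child -> parent."""
    parent: Dict[str, Optional[str]] = {root: None}
    depth = {root: 0}
    # Max-heap via negated weights: (-weight, from_vertex, to_vertex).
    heap = [(-w, root, nbr) for nbr, w in graph.get(root, {}).items()]
    heapq.heapify(heap)
    while heap:
        neg_w, u, v = heapq.heappop(heap)
        if v in parent or -neg_w < min_weight or depth[u] >= max_depth:
            continue
        parent[v] = u
        depth[v] = depth[u] + 1
        for nbr, w in graph.get(v, {}).items():
            if nbr not in parent:
                heapq.heappush(heap, (-w, v, nbr))
    return parent


g = {
    "chandler": {"dev": 0.9, "design": 0.8, "bugs": 0.2},
    "dev": {"chandler": 0.9, "build": 0.7},
    "design": {"chandler": 0.8},
    "bugs": {"chandler": 0.2},
    "build": {"dev": 0.7, "tinderbox": 0.6},
    "tinderbox": {"build": 0.6},
}
print(local_tree(g, "chandler", max_depth=2, min_weight=0.5))
# {'chandler': None, 'dev': 'chandler', 'design': 'chandler', 'build': 'dev'}
```

The returned parent map can be shown as an indented tree or as a flat list, matching the "hierarchical or not" display choice.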

On auto-tagging now: there are statistical techniques to segment and map big data sets that do not require some hocus-pocus semantic understanding of text. For tags created by users, one can imagine that once a significant seed set has been tagged by the user, the system can detect patterns automatically and tag as appropriate.
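As a concrete stand-in for "detect patterns from a user-tagged seed set", a multinomial naive Bayes classifier over words is one of the simplest statistical options and needs no semantic understanding. The sketch assumes one tag per seed message and a crude lowercase word tokenizer. The class name `SeedTagger` is hypothetical, and this is one choice among many rather than the technique the post has in mind.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class SeedTagger:
    """Multinomial naive Bayes over words, trained on user-tagged seed mail."""

    def __init__(self) -> None:
        self.docs_per_tag: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def train(self, text: str, tag: str) -> None:
        tokens = tokenize(text)
        self.docs_per_tag[tag] += 1
        self.word_counts[tag].update(tokens)
        self.vocab.update(tokens)

    def posteriors(self, text: str) -> Dict[str, float]:
        """P(tag | text) for every seed tag, with add-one (Laplace) smoothing."""
        if not self.docs_per_tag:
            return {}
        tokens = tokenize(text)
        total_docs = sum(self.docs_per_tag.values())
        log_scores = {}
        for tag, ndocs in self.docs_per_tag.items():
            counts = self.word_counts[tag]
            denom = (sum(counts.values()) + len(self.vocab)) or 1
            score = math.log(ndocs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            log_scores[tag] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {tag: math.exp(s - top) / norm for tag, s in log_scores.items()}


tagger = SeedTagger()
tagger.train("build broken on tinderbox", "dev")
tagger.train("compile error in build", "dev")
tagger.train("lunch on friday", "personal")
tagger.train("dinner friday night", "personal")
probs = tagger.posteriors("build error")
print(max(probs, key=probs.get))  # dev
```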


Without going into the details, things would look like:

- segment data (a big algorithm run on the whole data set from time to time) and identify groups and the structure of groups

- map tags to groups

- choose a root vertex (automatically or based on some context given by the user)

- prune the local graph to identify a small, shallow tree

- display the local tree (hierarchical or not, at the user's choice)
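The first two steps above (segment, then map tags to groups) can be illustrated with a deliberately simple stand-in. Here single-linkage grouping via union-find over Jaccard similarity of word sets replaces the unspecified "big algorithm", and each group is labelled with its most common user tag. The similarity measure, the threshold, and all names are assumptions. The O(n²) pairwise loop makes this a toy, not something to run over a whole mailbox.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Set, TypeVar

T = TypeVar("T")


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment(items: Sequence[T], similarity: Callable[[T, T], float],
            threshold: float) -> List[List[int]]:
    """Single-linkage grouping: items joined by any chain of pairs with
    similarity >= threshold end up in the same group (union-find)."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(items[i], items[j]) >= threshold:
                parent[find(i)] = find(j)
    groups: dict = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def map_tags_to_groups(groups: List[List[int]],
                       tags_of: Callable[[int], List[str]]) -> List[Optional[str]]:
    """Label each group with the most common user tag among its members."""
    labels: List[Optional[str]] = []
    for group in groups:
        counts = Counter(tag for i in group for tag in tags_of(i))
        labels.append(counts.most_common(1)[0][0] if counts else None)
    return labels


messages = [{"build", "tinderbox"}, {"build", "compile"},
            {"lunch", "friday"}, {"dinner", "friday"}]
user_tags = {0: ["dev"], 1: [], 2: ["personal"], 3: []}
groups = segment(messages, jaccard, threshold=0.3)
print(groups)                                              # [[0, 1], [2, 3]]
print(map_tags_to_groups(groups, lambda i: user_tags[i]))  # ['dev', 'personal']
```

Note how the untagged messages 1 and 3 inherit a tag through their group, which is the point of mapping tags to groups rather than to individual messages.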

For new data trickling in:

- project and map the new data onto the existing group structure (i.e. don't rerun the analysis for each email coming in...)

- tag the new data according to where it falls (one can attach a Bayesian probability to each tag, which can be used to reduce the number of false positives)
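Projecting incoming mail onto the existing groups can be as cheap as a nearest-profile lookup. The sketch assumes each group is summarized by a word-count profile computed during the batch step. It uses cosine similarity with a minimum score as a simple stand-in for the Bayesian confidence mentioned above: below `min_score` the message is left untagged, which is one way to cut false positives. Profiles, tags, and the threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(n * b[t] for t, n in a.items() if t in b)
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project(tokens: Iterable[str], profiles: Dict[int, Counter],
            group_tags: Dict[int, str],
            min_score: float = 0.3) -> Tuple[Optional[str], float]:
    """Tag a new message by its closest existing group without re-running
    the segmentation. Returns (tag, score); tag is None when the best score
    is below min_score, trading recall for fewer false positives."""
    vec = Counter(tokens)
    best_group, best_score = None, 0.0
    for group_id, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best_group, best_score = group_id, score
    if best_group is None or best_score < min_score:
        return None, best_score
    return group_tags[best_group], best_score


profiles = {0: Counter({"build": 2, "tinderbox": 1, "compile": 1}),
            1: Counter({"friday": 2, "lunch": 1, "dinner": 1})}
group_tags = {0: "dev", 1: "personal"}
tag, score = project(["build", "broken"], profiles, group_tags)
print(tag, round(score, 2))                      # dev 0.58
print(project(["hello"], profiles, group_tags))  # (None, 0.0)
```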

OK, I went super fast (each point above would deserve a substantial explanation) because it's really a Chandler v2.0 idea, but the point is that auto-tagging is indeed important and that there are techniques out there that can do just that without requiring deep semantic analysis of free-form text.


Cheers,
- Philippe

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "Design" mailing list
http://lists.osafoundation.org/mailman/listinfo/design
