Hi,

Alec Flett wrote:

I personally wish I could just have a del.icio.us-like interface for all my mail. Ultimately, I don't need to see all my mail folders in my sidebar; I need some subset of collections that give me quick access to important messages.

Part of that dream is that someone has already gone and tagged my thousands of e-mail messages with relevant tags, and so forth. But I always wondered: say we fast-forward some number of months or years to when Chandler is usable as a mail client and we have this world of tags and collections. How does my mail get from hierarchical mail folders to this neat folksonomy?

One thought I had is that a lot of meta-information is already stored in mail that we could auto-create collections from. Essentially, what if Chandler auto-tagged your e-mail and then also made some special collections based on data collected during the auto-tagging?

For instance:
1) Folder name. I have some folders in hierarchies (I'm using the delimiter '.' as we do on the OSAF server): "MailingLists.dev", "bugs", "Sent", "Sent.2005" and so forth. I think we could more or less tag each message with all the components of its path. For instance, a message in "MailingLists.dev" would automatically get tagged with "MailingLists" and "dev". Messages in "bugs" would get tagged as "bugs". We may be able to develop heuristics for making a good guess at which tags to create sidebar collections for. For instance, if all messages in the "bugs" folder are also from "[EMAIL PROTECTED]", then maybe that tells us something else about what the "bugs" tag means.

2) Sender/Recipients. We wouldn't need to tag anything based on this, but we could auto-create collections for the "top people": for example, I have a few specific friends with whom I probably exchange at least 30% of my personal e-mail. I think we could come up with some pretty decent heuristics for determining who those people are.

3) Mailing lists - there are multiple ways to identify mailing lists, from well-known RFC822 headers, to looking for [xxxxx] in subject lines, to noticing that MY e-mail address doesn't appear in the To or CC headers. We could again develop some heuristics to filter out the noise. For example, if I have 1000 messages with [....] in the subject line, but only 2 of them have [ACLFW], then I'm probably not on a mailing list called ACLFW; I might just have a couple of messages from that list, or a friend used [ACLFW] in a subject line.

4) Date - we could easily auto-create some collections for "more than 1 year old" or "last month" - because honestly, I don't care where my really old messages are, as long as I can find them by searching.
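The four heuristics above can be sketched in a few lines of Python. This is a minimal illustration, not Chandler code: the `Message` record, the per-person 5% share cutoff (a simplification of the "top people" idea, which is about a few friends collectively), the `min_count` of 5 and the 31-day approximation of "last month" are all assumptions. Mailing lists are detected either from the `List-Id` header (defined in RFC 2919) or from a `[TAG]` subject prefix that recurs on mail not addressed to me, so a tag seen only twice, like [ACLFW], is ignored.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Message:
    # Hypothetical minimal message record; a real client would use its own model.
    folder: str
    sender: str
    to: list
    cc: list
    subject: str
    sent: datetime
    headers: dict = field(default_factory=dict)

def folder_tags(folder, delimiter="."):
    """Flatten an IMAP folder path into tags: 'MailingLists.dev' -> {'MailingLists', 'dev'}."""
    return {part for part in folder.split(delimiter) if part}

def top_people(messages, me, min_share=0.05):
    """Correspondents involved in at least `min_share` of all messages (cutoff is a guess)."""
    counts = Counter()
    for msg in messages:
        # Count each other participant at most once per message.
        for addr in {msg.sender, *msg.to, *msg.cc} - {me}:
            counts[addr] += 1
    total = len(messages) or 1
    return {addr for addr, n in counts.items() if n / total >= min_share}

BRACKET_TAG = re.compile(r"\[([^\]]+)\]")

def mailing_lists(messages, me, min_count=5):
    """List-Id headers always count; a [TAG] subject prefix only counts if it
    recurs at least `min_count` times on mail where I'm not in To or Cc."""
    lists = set()
    bracket_counts = Counter()
    for msg in messages:
        if "List-Id" in msg.headers:
            lists.add(msg.headers["List-Id"])
            continue
        match = BRACKET_TAG.search(msg.subject)
        if match and me not in msg.to and me not in msg.cc:
            bracket_counts[match.group(1)] += 1
    lists.update(tag for tag, n in bracket_counts.items() if n >= min_count)
    return lists

def age_tag(msg, now):
    """Coarse date collections; 'last month' is approximated as the last 31 days."""
    age = now - msg.sent
    if age > timedelta(days=365):
        return "more than 1 year old"
    if age <= timedelta(days=31):
        return "last month"
    return None
```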

Anyway, you get the idea. I really like the idea of at least giving the user a "best guess" when transitioning from another mail system into Chandler. The same principle could apply to importing calendars from iCal, RSS feeds from BlogLines, or whatever.

Those are all great ideas. There is indeed a lot of low-hanging fruit as far as auto-tagging goes. The decomposition of the IMAP hierarchy into a set of non-hierarchical tags is right on IMO.

Note that having tags does not mean you have to do away with displaying a hierarchy altogether. When displaying a graph (a tree being a special case of a general graph), you can imagine pulling on one vertex and, using appropriate heuristics based on the edge weights, identifying a subtree in its vicinity. Of course, that assumes we can identify a graph structure in the data, and that's where segmentation algorithms would help.
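Pulling on one vertex to get a nearby subtree can be sketched as a greedy, Prim-style expansion: starting from the chosen root, repeatedly attach the heaviest edge leaving the tree built so far, stopping at a node budget and a depth limit. Without the limits this grows a maximum-weight spanning tree of the root's component; with them it is only a local approximation. The adjacency-dict format, the meaning of the weights (for example, how many messages two tags share) and the default limits are assumptions for illustration, not a method described in this thread.

```python
import heapq

def local_tree(graph, root, max_nodes=10, max_depth=3):
    """Grow a small tree around `root` by always attaching the heaviest edge
    that leaves the current tree, subject to node and depth limits.

    graph: {vertex: {neighbor: weight}}, symmetric, higher weight = stronger link.
    Returns {vertex: parent}, with the root mapped to None.
    """
    parent = {root: None}
    depth = {root: 0}
    # heapq is a min-heap, so weights are negated to pop the heaviest edge first.
    # Ties fall back to comparing vertex names, so vertices should be comparable.
    frontier = [(-w, root, v) for v, w in graph.get(root, {}).items()]
    heapq.heapify(frontier)
    while frontier and len(parent) < max_nodes:
        _, u, v = heapq.heappop(frontier)
        if v in parent:
            continue  # already attached through a heavier edge
        parent[v] = u
        depth[v] = depth[u] + 1
        if depth[v] < max_depth:
            for nxt, w in graph.get(v, {}).items():
                if nxt not in parent:
                    heapq.heappush(frontier, (-w, v, nxt))
    return parent
```

The resulting `{vertex: parent}` map can be rendered as an indented tree or as a flat list, which matches the idea of showing a hierarchy only where the data supports one.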

Now, on auto-tagging: there are statistical techniques to segment and map large data sets that do not require any hocus-pocus semantic understanding of the text. For tags created by users, one can imagine that once the user has tagged a significant seed set, the system can detect patterns automatically and tag new items accordingly.
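One simple way to let a user-tagged seed set drive automatic tagging is k-nearest neighbours: compare a new message with the hand-tagged ones and propose the tags that most of its closest neighbours carry. The sketch below uses whitespace-split word sets and Jaccard similarity; that feature choice and the `k` and `min_votes` values are assumptions, and a real system would want better features than raw words.

```python
from collections import Counter

def words(text):
    """Crude feature set: lower-cased, whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_tags(seed, text, k=3, min_votes=2):
    """seed: list of (message_text, set_of_tags) tagged by hand.
    Return tags carried by at least `min_votes` of the k most similar seed messages."""
    target = words(text)
    neighbors = sorted(seed, key=lambda item: jaccard(target, words(item[0])), reverse=True)[:k]
    votes = Counter(tag for _, tags in neighbors for tag in tags)
    return {tag for tag, n in votes.items() if n >= min_votes}
```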

Without going into the details, things would look like:
- segment the data (a big algorithm run over the whole data set from time to time) and identify groups and the structure of those groups
- map tags to groups
- choose a root vertex (automatically or based on some context given by the user)
- prune the local graph to identify a small, shallow tree
- display the local tree (hierarchically or not, at the user's choice)
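The first two steps can be illustrated with a deliberately crude stand-in for the "big algo": link messages whose feature sets (words, tags, addresses) have a Jaccard similarity at or above a threshold, take the connected components as the groups, then map each tag to the group in which it occurs most often. The O(n²) pairwise loop, the 0.5 threshold and the feature choice are assumptions for illustration; a real segmentation pass over a whole mailbox would need a more scalable clustering method.

```python
from collections import Counter, defaultdict

def segment(features, threshold=0.5):
    """features: one set per message. Returns a group id per message, where
    groups are connected components of the 'similar enough' graph (union-find)."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            a, b = features[i], features[j]
            union = a | b
            if union and len(a & b) / len(union) >= threshold:
                parent[find(i)] = find(j)  # merge the two components
    return [find(i) for i in range(len(features))]

def map_tags_to_groups(tags_per_message, groups):
    """Assign each tag to the group where it occurs most often."""
    counts = defaultdict(Counter)
    for tags, group in zip(tags_per_message, groups):
        for tag in tags:
            counts[tag][group] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in counts.items()}
```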

For new data trickling in:
- project and map the new data onto the existing group structure (i.e. don't rerun the analysis for each incoming e-mail) and tag the new data according to where it falls (one can attach a Bayesian probability to each assignment and use it to reduce the number of false positives)
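Projecting new mail onto an existing group structure, with a Bayesian value attached, can be sketched as a multinomial naive Bayes classifier trained once on the groups produced by the periodic segmentation pass. Each incoming message is scored against the stored word counts only, and it is tagged only when the posterior for the best group clears a confidence threshold; below that it is left untagged, trading some missed tags for fewer false positives. The class name, the Laplace smoothing constant and the 0.9 default threshold are hypothetical choices.

```python
import math
from collections import Counter

class GroupClassifier:
    """Multinomial naive Bayes over words, one class per group from the batch segmentation."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace (add-alpha) smoothing
        self.word_counts = {}         # group -> Counter of word occurrences
        self.doc_counts = Counter()   # group -> number of training messages
        self.vocab = set()

    def fit(self, docs, groups):
        """docs: list of word lists; groups: the group id of each doc."""
        for doc, group in zip(docs, groups):
            self.word_counts.setdefault(group, Counter()).update(doc)
            self.doc_counts[group] += 1
            self.vocab.update(doc)
        return self

    def posterior(self, doc):
        """P(group | words), skipping words never seen during training."""
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for group, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[group] / total_docs)  # log prior
            for w in doc:
                if w in self.vocab:
                    score += math.log((counts[w] + self.alpha) / (total_words + self.alpha * v))
            log_scores[group] = score
        # Normalise with log-sum-exp so the scores become probabilities.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {g: math.exp(s - top) / z for g, s in log_scores.items()}

    def classify(self, doc, min_confidence=0.9):
        """Best group, or None when the posterior is below the threshold."""
        post = self.posterior(doc)
        best = max(post, key=post.get)
        return best if post[best] >= min_confidence else None
```

Raising `min_confidence` leaves more messages untagged but makes the tags that are applied more trustworthy.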

OK, I went super fast (each point above would deserve a substantial explanation) because it's really a Chandler v2.0 idea, but the point is that auto-tagging is indeed important and that there are techniques out there that can do just that without requiring deep semantic analysis of free-form text.

Cheers,
- Philippe

_______________________________________________

Open Source Applications Foundation "Design" mailing list
http://lists.osafoundation.org/mailman/listinfo/design
