Hi,
Alec Flett wrote:
I personally wish that I could just have a delicious-like interface
for all my mail - I mean ultimately, I don't need to see all my mail
folders in my sidebar - I need some subset of collections that give me
quick access to important messages.
Part of that dream is that someone has gone and already tagged my
thousands of e-mail messages with relevant tags, and so forth. But I
always wondered - say we fast forward some number of months/years to
when chandler is usable as a mail client and we have this world of
tags and collections, how does my mail get from hierarchical mail
folders to this neat folksonomy?
One thought I had is that there is actually a lot of meta-information
stored in mail that we could sort of auto-create collections from...
essentially, what if chandler sort of auto-tagged your e-mail and then
also made some special collections based on data collected during the
auto-tagging?
For instance:
1) Folder name. I have some folders in hierarchies (I'm using the
delimiter '.' as we do on the OSAF server), "MailingLists.dev", "bugs",
"Sent", "Sent.2005" and so forth. I think we could more or less tag
each message with all the components of the path. For instance a
message in "MailingLists.dev" would get tagged with "MailingLists" and
"dev" automatically. Messages in "bugs" would get tagged as "bugs". We
may be able to develop heuristics for making a good guess at which
tags to create collections for in the sidebar. For instance, if all
messages in the "bugs" folder are also from
"[EMAIL PROTECTED]" then maybe that tells us something
else about what the "bugs" tag means.
2) Sender/Recipients. We wouldn't need to tag anything based on this,
but we could auto-create collections for the "top people" - i.e. I
have a few specific friends who I probably exchange at least 30% of my
personal e-mail with. We could create collections for these "top
people" - I think we could probably come up with some pretty decent
heuristics for determining who those people are.
3) Mailing lists - there are multiple ways to identify mailing lists,
from well-known RFC822 headers, to looking for [xxxxx] in subject
lines, to noticing that MY e-mail address doesn't appear in the To or
CC headers. We could again develop some heuristics to filter out the
noise so that if I have 1000 messages with [....] in the subject line,
but only 2 of them have [ACLFW] then I'm probably not on a mailing
list called ACLFW and might just have some messages from them, or a
friend used [ACLFW] in a subject line when sending a message.
4) Date - we could easily auto-create some collections for "more than
1 year old" or "last month" - because honestly, I don't care where my
really old messages are, as long as I can find them by searching.
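
To make points 1) and 2) above a bit more concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration:
the function names, the '.' delimiter default, and the per-person
min_share threshold are hypothetical and not part of Chandler.

```python
from collections import Counter


def tags_from_folder(folder_path, delimiter="."):
    """Split a hierarchical folder path into flat tags, one per component."""
    return [part for part in folder_path.split(delimiter) if part]


def top_correspondents(addresses, min_share=0.3):
    """Return addresses that account for at least min_share of all messages.

    min_share is a hypothetical cutoff; a real heuristic would need tuning.
    """
    counts = Counter(addresses)
    total = sum(counts.values())
    # most_common() yields addresses from most to least frequent
    return [addr for addr, n in counts.most_common() if n / total >= min_share]


folder_tags = tags_from_folder("MailingLists.dev")
people = top_correspondents(["a@x"] * 5 + ["b@x"] * 4 + ["c@x"])
```

A message in "MailingLists.dev" gets the tags "MailingLists" and "dev",
and only the addresses above the share threshold would get their own
collection.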
Anyway, you get the idea. I really like the idea of at least giving
the user a "best guess" when transitioning from another mail system
into chandler. The same principle could apply for importing calendars
from iCal, RSS feeds from BlogLines, or whatever.
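
The noise filter described in point 3) could look roughly like the
sketch below. It only covers the "[xxxxx] in the subject line" signal,
not the headers or the To/CC check, and the regular expression, the
function name, and both thresholds are assumptions picked for
illustration.

```python
import re
from collections import Counter

# Leading "[tag]" in a subject, optionally preceded by one or more "Re:".
BRACKET_TAG = re.compile(r"^\s*(?:re:\s*)*\[([^\]]+)\]", re.IGNORECASE)


def likely_list_tags(subjects, min_count=5, min_share=0.01):
    """Guess which bracketed subject prefixes belong to real mailing lists.

    A prefix is kept only if it appears at least min_count times and makes
    up at least min_share of all bracketed subjects (hypothetical cutoffs).
    """
    counts = Counter()
    for subject in subjects:
        match = BRACKET_TAG.match(subject)
        if match:
            counts[match.group(1)] += 1
    bracketed = sum(counts.values())
    return {
        tag
        for tag, n in counts.items()
        if n >= min_count and n / bracketed >= min_share
    }


subjects = ["[design] sidebar idea"] * 998 + ["Re: [ACLFW] hi"] * 2
lists = likely_list_tags(subjects)
```

With 1000 bracketed subjects of which only 2 carry [ACLFW], that prefix
falls below the cutoffs and is not treated as a list.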
Those are all great ideas. There is indeed plenty of low-hanging
fruit as far as auto-tagging goes. Decomposing the IMAP
hierarchy into a set of non-hierarchical tags is right on, IMO.
Note that having tags does not mean that you have to do away with
displaying a hierarchy altogether. When displaying a graph (a tree being
a special case of a general graph), you can imagine pulling on one vertex
and, using appropriate heuristics based on weighting the edges of the
graph, identifying a subtree in its vicinity. That assumes, of course,
that we can identify a graph structure in the data, and that's where
segmentation algorithms would help.
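
As a rough illustration of "pulling on one vertex", here is a small
Python sketch. It assumes the tag graph is a dict of dicts holding edge
weights, and it uses a plain breadth-first walk with a weight cutoff and
a depth limit; those choices, the function name, and the sample weights
are mine, just one possible heuristic.

```python
from collections import deque


def local_tree(graph, root, min_weight=0.5, max_depth=2):
    """Extract a small, shallow tree around root from a weighted graph.

    graph maps vertex -> {neighbor: weight}. Edges lighter than min_weight
    are ignored, and each vertex is attached to the first parent reaching it.
    """
    tree = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        # Visit the strongest edges first so they claim the children.
        neighbors = sorted(graph.get(node, {}).items(), key=lambda kv: -kv[1])
        for neighbor, weight in neighbors:
            if weight >= min_weight and neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                tree[node].append(neighbor)
                tree[neighbor] = []
                queue.append(neighbor)
    return tree


graph = {
    "bugs": {"dev": 0.9, "Sent": 0.1},
    "dev": {"bugs": 0.9, "MailingLists": 0.8},
    "MailingLists": {"dev": 0.8, "2005": 0.7},
    "Sent": {"bugs": 0.1},
    "2005": {"MailingLists": 0.7},
}
subtree = local_tree(graph, "bugs")
```

Pulling on "bugs" here yields bugs -> dev -> MailingLists; the weak edge
to "Sent" is pruned and "2005" lies beyond the depth limit.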
On auto tagging now, there are statistical techniques to segment and map
big data sets that do not require some hocus pocus semantic
understanding of text. For tags created by users, one can imagine that
once a significant seed set has been tagged by the user, the system can
detect patterns automatically and tag as appropriate.
Without going into the details, things would look like:
- segment data (big algo running on the whole data set from time to
time) and identify groups and structure of groups
- map tags to groups
- choose root vertex (automatically or based on some context given by
the user)
- prune local graph to identify small shallow tree
- display local tree (hierarchical or not at user's choice)
For new data trickling in:
- project and map data on the group structure (i.e. don't run the
analysis for each email coming in...)
- tag the new data according to where it falls (one can attach some
Bayesian value to it that can be used to reduce the number of false
positives)
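
To give a flavor of the "new data trickling in" steps, here is a minimal
Python sketch. It assumes the big segmentation run has already produced
one word-count profile per group and a group-to-tags mapping, and it
swaps in plain cosine similarity with a cutoff as a stand-in for the
Bayesian value; all names and the threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def tag_new_message(text, group_profiles, group_tags, min_score=0.3):
    """Project one message onto existing groups without re-running analysis.

    Returns (tags, score). Tags are empty when the best match scores below
    min_score, which is one simple way to limit false positives.
    """
    words = Counter(text.lower().split())
    best_group, best_score = None, 0.0
    for group, profile in group_profiles.items():
        score = cosine(words, profile)
        if score > best_score:
            best_group, best_score = group, score
    if best_group is None or best_score < min_score:
        return [], best_score
    return group_tags.get(best_group, []), best_score


profiles = {
    "g1": Counter({"bug": 3, "crash": 2}),
    "g2": Counter({"lunch": 2, "friday": 1}),
}
tags_by_group = {"g1": ["bugs"], "g2": ["personal"]}
tags, score = tag_new_message("crash bug in sidebar", profiles, tags_by_group)
```

The expensive part (building the profiles) stays in the periodic batch
run; each incoming message only costs one comparison per group.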
OK, I went super fast (each point above would deserve a substantial
explanation) because it's really a Chandler v2.0 idea, but the point is
that auto-tagging is indeed important and that there are techniques out
there that can do just that without requiring deep
semantic analysis of free-form text.
Cheers,
- Philippe
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Design" mailing list
http://lists.osafoundation.org/mailman/listinfo/design