Re: [Chandler-dev] Topology in tags

Xun Luo Sun, 06 Aug 2006 12:47:20 -0700

At present, most research on automatically generating a hierarchy of tags is still supervised, i.e. it trains and uses text classifiers. The "automation" comes from leveraging established linguistic corpora, of which the most commonly used and acknowledged is WordNet. WordNet organizes English vocabulary in a hierarchical manner, with the concepts of synonyms, hypernyms, antonyms, subsumption, etc. Normally the sense of a term becomes clearer if it is put in a context constructed from WordNet. I read the CIKM paper and that's basically its key idea.

Putting aside the computation cost and management overhead (a fully static hierarchy model will not stay valid over time, and a hierarchy generated on the fly involves a lot of computation and references to external corpora/collections), a key requirement for creating a tag hierarchy is data. This requirement is hard to satisfy in a PIM like Chandler, which contains just thousands of content items and is highly personalized. An alternative is, as mentioned, to use external data, such as WordNet, or a model trained by querying established internet categories. (The best result for a similar task, as seen at KDD Cup 2005, came from using the Yahoo online categories, the ODP online categories and statistical methods together, and the external data set involved tens of thousands of training terms and web pages.)

Tag clustering is much easier by comparison, and in my opinion that's why it is commonly used for folksonomies. Flickr is definitely using it, although the underlying mechanism is not quite clear. As far as I know, the similar functionality in del.icio.us is provided through a combination of statistical methods (which require a large sample, something Cosmo might be able to provide) and sense similarities (as reported in an HP Labs paper on del.icio.us tagging dynamics).
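
To make the statistical side concrete, here is a minimal sketch of "related tags" computed from co-occurrence alone. It is not what Flickr or del.icio.us actually do (their mechanisms are not known to me); the function name related_tags, the Jaccard measure and the 0.5 threshold are all my own assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def related_tags(items, min_jaccard=0.5):
    """Group tags that tend to appear on the same items.

    items: iterable of tag sets, one set per content item.
    Returns a dict mapping each tag to a list of (other_tag, score)
    pairs, best match first. The score is the Jaccard similarity of
    the two tags' item sets (an assumed measure, not a known one).
    """
    # Index: tag -> set of item positions carrying that tag.
    tag_items = defaultdict(set)
    for idx, tags in enumerate(items):
        for tag in tags:
            tag_items[tag].add(idx)

    related = defaultdict(list)
    for a, b in combinations(sorted(tag_items), 2):
        shared = len(tag_items[a] & tag_items[b])
        if shared == 0:
            continue
        score = shared / len(tag_items[a] | tag_items[b])
        if score >= min_jaccard:
            related[a].append((b, score))
            related[b].append((a, score))

    # Highest score first, ties broken alphabetically.
    for tag in related:
        related[tag].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(related)


if __name__ == "__main__":
    sample = [
        {"work", "meeting"},
        {"work", "meeting"},
        {"work", "budget"},
        {"home", "garden"},
    ]
    for tag, neighbours in sorted(related_tags(sample).items()):
        print(tag, neighbours)
```

With only a few thousand items the pairwise loop is cheap, but the scores are noisy on small samples, which is the reason a larger pool of data would help.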

As for unsupervised tagging in Chandler, what I am currently planning is keyword extraction with simple NLP. This is very similar to the time field extraction project. In my humble opinion, a tagging mechanism similar to Flickr's would already be very satisfying to Chandler users.
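
As a rough illustration of what I mean by keyword extraction with simple NLP, here is a sketch that suggests tags from the most frequent non-stopword terms in an item's text. It is not the planned Chandler implementation; the function name suggest_tags, the tiny stopword list and the default limits are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be longer.
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this "
    "to was will with".split()
)


def suggest_tags(text, max_tags=3, min_length=3):
    """Suggest tags as the most frequent non-stopword terms in text."""
    # Crude tokenizer: lowercase runs of ASCII letters only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        w for w in words if len(w) >= min_length and w not in STOPWORDS
    )
    # Most frequent first, ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:max_tags]]


if __name__ == "__main__":
    note = (
        "Budget review meeting: the budget for the Chandler release "
        "is due. Review the budget."
    )
    print(suggest_tags(note))
```

Raw term frequency is the simplest possible ranking; stemming or weighting terms against the user's other items would be natural refinements.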

Xun

On 8/5/06, Davor Cubranic <[EMAIL PROTECTED]> wrote:

Philippe Bossut wrote:
> *But*, saying that there is no spelled out hierarchies between the
> tags does not mean that there is no structure between them. Such a
> structure will need to be deduced through how the tagged items relate
> to each other. Segmentation techniques should be able to infer a local
> hierarchy of tags even in the most tangled set. Once the local
> hierarchy is deduced (and appropriately displayed), one can imagine to
> turn "off" a whole node ("work" in the example given by Bobby).
There was a paper at CIKM in 2005 about automatically generating
hierarchies of tags based on their frequency of co-occurrence:
http://pages.stern.nyu.edu/~panos/publications/cikm2005.pdf.

Davor

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
