Sorry, resending from the correct email address.

Ted,
Thanks for pitching in. Ordering is indeed extremely important.

On Thu, Nov 19, 2009 at 12:56 AM, Ted Dunning <[email protected]> wrote:

> If you want to preserve some ordering information, then you have a bit more
> of a problem. The same basic idea can work where you model your data as a
> mixture density over sequence models. Once you do that, then the mixture
> parameters make a reasonable space to cluster in. If you have some kind of
> sequence model then the Dirichlet process code currently in Mahout can be
> used to do your clustering.

Don't they (hidden-variable mixture models) contradict De Finetti's basic exchangeability theorem? Unless you are treating each sequence itself as a term (which I think is probably what you are referring to) and sampling over them. In that case, how am I creating documents?

> There is probably one too many if's in the previous paragraph for you to be
> happy with it.
>
> Can you say something more about your sequences? Can you say something
> about your resources? Do you have a good sequence model?

Basically, I want to cluster users' browsing behavior and see what the dominant browsing paths are for a particular user. For example: portal->sports->ad-click->movies->ad-click->ad-click, etc.

I would also appreciate your thoughts on suffix-tree-clustering-based approaches, which I have been contemplating. Meanwhile, there seems to be a lot more work done on sequence clustering in bioinformatics than in text/web mining.

-Prasen
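
To make the "dominant browsing paths" example concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: it is neither suffix-tree clustering nor the Dirichlet process code in Mahout, just a count of contiguous length-n subpaths across one user's sessions. The function name `dominant_paths` and the sample sessions are hypothetical.

```python
from collections import Counter

def dominant_paths(sessions, n=3, top=3):
    """Return the `top` most frequent contiguous length-n subpaths.

    `sessions` is a list of page-category sequences for a single user.
    Order within a session is preserved, so 'sports -> ad-click' and
    'ad-click -> sports' are counted as different paths.
    """
    counts = Counter()
    for session in sessions:
        # Slide a window of length n over the session.
        for i in range(len(session) - n + 1):
            counts[tuple(session[i:i + n])] += 1
    return counts.most_common(top)

# Hypothetical sessions for one user (made up for illustration).
sessions = [
    ["portal", "sports", "ad-click", "movies", "ad-click", "ad-click"],
    ["portal", "sports", "ad-click", "news"],
    ["portal", "movies", "ad-click", "ad-click"],
]

for path, count in dominant_paths(sessions, n=3, top=2):
    print("->".join(path), count)
```

Counting ordered windows keeps the ordering information that a bag-of-pages representation would discard. The resulting per-user count vectors could then serve as input features for a clustering step, though whether that is adequate for real click-stream data is an open question.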
