> But as I said, you don't cluster a document, you might want to recheck your
> terminology :)
The terminology is fine. The same word applies to two different things here, hence the confusion: clustering in terms of infrastructure arrangement, and clustering as in statistical data analysis (or text analysis).

> Clustering means I wanted to know like I submitted one docs to ES so Indexing
> will happen at that time. So is it like that clustering of documents will
> also happens at the same time.

No, nothing is clustered at indexing time. The Carrot2 plugin to ES does post-retrieval document clustering, so you get clusters for each individual query (and its set of hits). For this reason the query is also important -- it provides a hint to the algorithm as to which trivial clusters it should avoid.

Off-line document clustering would have to run over all documents in a collection (index), assign cluster labels, and then just filter on these at query time (much like faceting does). Carrot2 does *not* provide such functionality (and very likely won't scale to large indexes). You may want to check out Apache Mahout for this.

Dawid
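
To make the difference between the two approaches concrete, here is a rough sketch in plain Python. It does not use Carrot2, Elasticsearch or Mahout; the corpus, `search`, `toy_cluster` and the other names are all made up for illustration, and the "clustering" is just grouping by shared terms, not a real algorithm.

```python
from collections import defaultdict

# Toy corpus; contents are invented for illustration only.
DOCS = {
    1: "jaguar car speed",
    2: "jaguar cat habitat",
    3: "jaguar car price",
    4: "python snake habitat",
}

def search(query):
    """Return ids of documents that contain the query term."""
    return [doc_id for doc_id, text in DOCS.items() if query in text.split()]

def toy_cluster(doc_ids, skip_terms=()):
    """Stand-in for a real clustering algorithm (an assumption, not Carrot2).

    Any term shared by at least two documents becomes a cluster label;
    terms listed in skip_terms are never used as labels.
    """
    groups = defaultdict(list)
    for doc_id in doc_ids:
        for term in set(DOCS[doc_id].split()):
            if term not in skip_terms:
                groups[term].append(doc_id)
    return {label: sorted(ids) for label, ids in groups.items() if len(ids) > 1}

def post_retrieval_clusters(query):
    """Cluster only the hits of one query, at query time.

    The query term is passed as a hint so the trivial cluster
    containing every hit is not produced.
    """
    hits = search(query)
    return toy_cluster(hits, skip_terms={query})

# Off-line variant: cluster the whole collection once and store the labels
# per document, so that query time only filters on stored labels.
OFFLINE_LABELS = defaultdict(set)
for label, ids in toy_cluster(DOCS.keys()).items():
    for doc_id in ids:
        OFFLINE_LABELS[doc_id].add(label)

def offline_filter(query, label):
    """Filter the hits of a query by a precomputed cluster label."""
    return [d for d in search(query) if label in OFFLINE_LABELS[d]]
```

In the first variant the clusters depend on the query and its hits; in the second they are fixed for the whole collection and the query merely selects among them, which is why it behaves like faceting.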
