You need to extend RandomSeedGenerator to take a directory as input instead of a single file. You shouldn't have to make significant changes to KMeansDriver. I have already made these changes (plus quite a few other improvements I would like to contribute), but I am currently stuck waiting for clearance from my company's Open Source Working Group =(
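Here is a rough, stdlib-only Java sketch of what "take a directory instead of a file" involves. It is not Mahout's code. The class `DirectorySeedSampler` and its methods are hypothetical. It reads plain files through `java.nio` rather than Hadoop's `FileSystem` and `SequenceFile.Reader`. It also uses reservoir sampling, which may differ from how RandomSeedGenerator actually picks its initial clusters.

The approach has two steps. First, list every part file in the input directory, skipping Hadoop-style hidden files whose names start with `_` or `.`. Second, stream all records from all parts through one sampler. This draws the k seeds uniformly from the whole dataset without merging the parts first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Mahout): picks k seeds from records spread over many part files. */
final class DirectorySeedSampler<T> {
  private final int k;
  private final Random rng;
  private final List<T> reservoir;
  private long seen = 0; // long, since a multi-TB input can exceed Integer.MAX_VALUE records

  DirectorySeedSampler(int k, Random rng) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    this.k = k;
    this.rng = rng;
    this.reservoir = new ArrayList<>(k);
  }

  /** Reservoir sampling: after n offers, each record is kept with probability k / n. */
  void offer(T record) {
    seen++;
    if (reservoir.size() < k) {
      reservoir.add(record);
    } else {
      long j = (long) (rng.nextDouble() * seen); // uniform index in [0, seen)
      if (j < k) {
        reservoir.set((int) j, record);
      }
    }
  }

  /** The sampled seeds: all records if fewer than k were offered. */
  List<T> seeds() {
    return Collections.unmodifiableList(reservoir);
  }

  /** Lists regular files in dir, skipping hidden files such as _SUCCESS or .crc checksums. */
  static List<Path> listPartFiles(Path dir) throws IOException {
    List<Path> parts = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        String name = p.getFileName().toString();
        if (Files.isRegularFile(p) && !name.startsWith("_") && !name.startsWith(".")) {
          parts.add(p);
        }
      }
    }
    parts.sort(null); // natural Path order, so part-00000 comes before part-00001
    return parts;
  }
}
```

In a real Mahout job, each part would presumably be opened with Hadoop's `SequenceFile.Reader` and every vector passed to `offer`. Only the k sampled vectors stay in memory, so an input of 8000 parts never has to be concatenated.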

Adil

Wei Dong wrote:
Hi All,

I've successfully clustered sequence files with KMeansDriver, but I haven't been able to pass a directory of sequence files as input. I have a huge dataset (~4 TB) stored in about 8000 parts, and merging them into a single file would take a lot of extra space. Do I need to implement my own KMeansDriver?

Thanks a lot,

- Wei Dong

