Hi All,

I've successfully clustered sequence files with KMeansDriver, but I haven't been able to pass a directory of sequence files as input. My dataset is huge (~4 TB) and is stored in about 8,000 parts, so merging them into a single file would take a lot of extra space. Do I need to implement my own KMeansDriver?

Thanks a lot,

- Wei Dong
