On Nov 3, 2009, at 5:43am, Grant Ingersoll wrote:

Might be of interest to all you Mahouts out there...  
http://bixolabs.com/datasets/public-terabyte-dataset-project/

Would be cool to get this converted over to our vector format so that we can cluster, etc.


How much additional space would be required for the vectors in some optimal compressed format, say as a percentage of raw text size?

I'm asking because I have some flexibility in the processing and associated metadata I can store as part of the dataset.
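
To make the question concrete, here is a rough sketch of how that percentage could be estimated for a single document. It assumes a sparse term-frequency vector stored as delta-encoded term ids plus counts, each written as a base-128 varint. That encoding is only an illustrative assumption, not Mahout's actual vector format, and the names varint_len and vector_overhead are made up for this example.

```python
import re
from collections import Counter


def varint_len(n):
    """Bytes needed to store a non-negative int as a base-128 varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def vector_overhead(doc, vocab):
    """Estimate sparse-vector size as a fraction of the raw text size.

    `vocab` maps term -> integer id and is extended in place, so it can be
    shared across documents. The encoding is hypothetical: sorted term ids
    are delta-encoded, and each (delta, count) pair is stored as two varints.
    """
    raw_bytes = len(doc.encode("utf-8"))
    if raw_bytes == 0:
        return 0.0

    # Crude tokenizer (assumption): lowercase runs of ASCII letters and digits.
    counts = Counter(re.findall(r"[a-z0-9]+", doc.lower()))
    for term in counts:
        vocab.setdefault(term, len(vocab))

    vector_bytes = 0
    previous_id = 0
    for term_id, count in sorted((vocab[t], c) for t, c in counts.items()):
        vector_bytes += varint_len(term_id - previous_id) + varint_len(count)
        previous_id = term_id

    return vector_bytes / raw_bytes
```

Calling vector_overhead(text, vocab) over a sample of pages with one shared vocab dict would give a per-document ratio to average. The result depends heavily on document length, vocabulary size, and the weighting used (floating-point weights would cost more than integer counts), so it is only a ballpark figure.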

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



