Total count of RandomSampleLoader is unpredictable

Prasanth J Thu, 26 Jul 2012 18:05:03 -0700

Hello everyone

I am using RandomSampleLoader to load a sample of 1000 tuples per mapper. The 
small dataset runs 11 map tasks and the large dataset runs 109 map tasks.


I expect 11000 tuples from the small dataset and 109000 tuples from the large 
dataset, but the number of tuples I actually get is always higher than that. 
For the small dataset I get 15000 tuples, whereas for the large dataset I get 
145000 (sometimes 150000) tuples.
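
To make the gap concrete, here is a small sketch of the arithmetic using only 
the numbers above. It assumes every sampling task returns exactly 1000 tuples; 
under that assumption (not a diagnosis), the observed totals would correspond 
to 15 and 145 (or 150) tasks rather than 11 and 109. The helper name 
`summarize` is just for illustration.

```python
SAMPLES_PER_MAPPER = 1000  # per-mapper sample size from the post

# (label, number of map tasks, observed total samples) as reported in the post
runs = [
    ("small", 11, 15000),
    ("large", 109, 145000),
    ("large (other run)", 109, 150000),
]


def summarize(mappers, observed, k=SAMPLES_PER_MAPPER):
    """Return (expected total, observed/expected ratio, implied task count)."""
    expected = mappers * k
    # Number of tasks of k samples each that would add up to the observed
    # total, ASSUMING every task emits exactly k tuples.
    implied_tasks = observed / k
    return expected, observed / expected, implied_tasks


for label, mappers, observed in runs:
    expected, ratio, implied = summarize(mappers, observed)
    print(f"{label}: expected {expected}, observed {observed}, "
          f"ratio {ratio:.2f}, implied tasks {implied:.0f}")
```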

Is this a bug, or is it expected behavior? If all mappers use reservoir 
sampling, why is the total number of samples larger than expected?
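
For reference, below is a minimal sketch of textbook reservoir sampling 
(Algorithm R), applied independently to each input split. It is not Pig's 
actual RandomSampleLoader code; the functions `reservoir_sample` and 
`total_samples` and the split sizes are hypothetical. It shows why per-split 
reservoir sampling should give at most k tuples per split: a split with at 
least k records yields exactly k, and a smaller split yields all of its 
records, so the total can never exceed k times the number of splits.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform sample of at most k items from a stream of unknown length."""
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec  # keep record i with probability k/(i+1)
    return reservoir


def total_samples(split_sizes: List[int], k: int, seed: int = 0) -> int:
    """Sum of sample sizes when each (hypothetical) split is sampled on its own."""
    rng = random.Random(seed)
    return sum(len(reservoir_sample(range(n), k, rng)) for n in split_sizes)


# Hypothetical: 11 splits, each with more than k records.
print(total_samples([5_000] * 11, 1000))
```

With 11 splits of 5000 records each, this prints 11000. Under this model, a 
total above 1000 times the number of map tasks would suggest that more 
sampling tasks ran than expected, or that the loader does something other than 
plain per-split reservoir sampling.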

Thanks
-- Prasanth
