On Fri, Mar 22, 2013 at 12:12 AM, Nicolas Seyvet
<nicolas.sey...@gmail.com> wrote:
> @J-D: Thanks, this sounds very likely.
>
> One more thing, from the logs of one slave, I can see the following:
>
> 2013-03-21 22:27:15,041 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 9 file(s) in f of
> rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into
> 5418126f3d154ef3aca8027e04512279, size=8.3g; total size for store is 8.3g
> [...]
> 2013-03-21 23:34:31,836 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 5 file(s) in f of
> rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into
> 3bdeb58c57af4ee1a92d22865e707416, size=8.3g; total size for store is 8.3g
>
> Aren't those a sign that a major compaction also occurred?
> And if so, what could have triggered it?
If the compaction algorithm selects all of a store's files for compaction, it gets upgraded into a major compaction, because rewriting every file is essentially the same thing.

> On Thu, Mar 21, 2013 at 8:06 PM, Nicolas Seyvet
> <nicolas.sey...@gmail.com> wrote:
>
>> @Ram: You are entirely correct, I made the exact same mistake of mixing
>> up major and minor compaction. Looking closely, what I see is that at
>> around 200 HFiles per region it starts minor compacting files in groups
>> of 10 HFiles. The "problem" is that this minor compacting never stops,
>> even when there are only about 20 HFiles left. It just keeps going,
>> taking more and more time (I guess because the files being compacted are
>> getting bigger).
>>
>> Of course, in parallel we keep adding more and more data.
>>
>> @J-D: "It seems to me that it would be better if you were able to do a
>> single load for all your files." Yes, I agree, but that is not what we
>> are testing; our use case is to use 1-minute batch files.
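To make the promotion rule concrete, here is a toy sketch in Java. This is not actual HBase code; the size-ratio selection below (the `ratio` parameter, the newest-to-oldest scan) is only loosely modeled on HBase's default minor-compaction heuristic, and the class and method names are invented for illustration. The one faithful part is the rule from this thread: when the selection happens to cover every file in the store, the "minor" compaction is effectively a major one.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionPromotionSketch {

    // Toy selection heuristic (illustrative only): scan files newest-to-oldest
    // and include a file while it is no bigger than `ratio` times the total
    // size already selected. An oversized old file stops the selection.
    static List<Long> select(List<Long> sizesOldestFirst, double ratio) {
        List<Long> selected = new ArrayList<>();
        long selectedTotal = 0;
        for (int i = sizesOldestFirst.size() - 1; i >= 0; i--) {
            long size = sizesOldestFirst.get(i);
            if (selected.isEmpty() || size <= ratio * selectedTotal) {
                selected.add(size);
                selectedTotal += size;
            } else {
                break;
            }
        }
        return selected;
    }

    // The promotion rule from the thread: if the minor selection covers every
    // file in the store, the compaction is reported as major.
    static boolean promotedToMajor(List<Long> selected, int totalStoreFiles) {
        return selected.size() == totalStoreFiles;
    }

    public static void main(String[] args) {
        // Five similarly sized flush files: the ratio check admits them all,
        // so this "minor" compaction is upgraded to major.
        List<Long> small = List.of(10L, 10L, 10L, 10L, 10L);
        System.out.println(promotedToMajor(select(small, 1.2), small.size()));

        // One huge old file (e.g. a previous major-compaction output) fails
        // the ratio check; only the newer files compact, so it stays minor.
        List<Long> mixed = List.of(8300L, 10L, 10L, 10L, 10L);
        System.out.println(promotedToMajor(select(mixed, 1.2), mixed.size()));
    }
}
```

This also suggests why the cycle described above never quite ends: each compaction produces one bigger file, and as soon as the remaining files are similar enough in size, the whole set qualifies again.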