Hi,
Scanning the KV pairs with the HFile tool as you suggested, the biggest
value I came across was a string about 2400 characters long, and that
particular row has about 25000 cells from what I can tell. Is this big
enough to cause the problem? There were a dozen or so values over 1000
chars long, but mostly small values under 100 chars, as I mentioned earlier.
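For anyone following along, an equivalent client-side check (instead of
the HFile tool) would look roughly like this - just a sketch against the
0.20 client API, with the table name made up:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FindBigValues {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "mytable"); // hypothetical table name
    Scan scan = new Scan();
    scan.setCaching(100); // keep the per-RPC row count modest
    ResultScanner scanner = table.getScanner(scan);
    int maxLen = 0;
    String maxRow = null;
    for (Result r : scanner) {
      // note: each Result holds a whole row's cells, so this can itself
      // be heavy client-side on very wide rows
      for (KeyValue kv : r.raw()) {
        if (kv.getValueLength() > maxLen) {
          maxLen = kv.getValueLength();
          maxRow = Bytes.toString(kv.getRow());
        }
      }
    }
    scanner.close();
    System.out.println("biggest value: " + maxLen + " bytes in row " + maxRow);
  }
}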
If I have to set a hard limit on the length of cell values, that is not
a problem for the moment - I can chop these strings down (rough sketch
of what I mean below the quoted thread).
Thanks

On Tue, Feb 23, 2010 at 3:40 PM, Bluemetrix Development
<bmdevelopm...@gmail.com> wrote:
> Well, the cells themselves should not be too big. Just a few Strings
> (url length) or ints at the most per cell.
> It's just that there could be 10M (or maybe even 100M) cells per row.
> I'm on the latest 0.20.3.
> I'll try to find the big record as you suggested earlier and see what
> it looks like.
> Thanks
>
> On Tue, Feb 23, 2010 at 3:18 PM, Stack <st...@duboce.net> wrote:
>> On Tue, Feb 23, 2010 at 10:40 AM, Bluemetrix Development
>> <bmdevelopm...@gmail.com> wrote:
>>>
>>> If this is the case tho, how big is too big?
>>
>> Each cell and its coordinates is read into memory. If there is not
>> enough memory, then OOME.
>>
>>> Or does it depend on my disk/memory resources?
>>> I'm currently using dynamic column qualifiers, so I could have been
>>> reaching rows with 10s of millions of unique column qualifiers each.
>>
>> This should be fine as long as you are on a recent HBase.
>>
>> I'd say it was a big cell, or many big cells concurrently, that
>> caused the OOME.
>>
>>> Or, with other tables using timestamps as another dimension to the
>>> data, and therefore reaching 10s of millions of versions.
>>> (I was trying to get HBase back up so I could count these numbers.)
>>>
>>> What limits should I use for the time being for the number of
>>> qualifiers and the number of timestamps/versions?
>>
>> Shouldn't be an issue.
>>
>> St.Ack
>>
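P.S. The hard limit I have in mind is just a truncate-before-Put on the
write path, something like the following (0.20 client API; the cap and
the row/family/qualifier names are placeholders, not my real schema):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CappedWriter {
  // made-up cap; anything comfortably under the ~2400-char outlier
  private static final int MAX_VALUE_CHARS = 1000;

  static void putCapped(HTable table, byte[] row, byte[] family,
                        byte[] qualifier, String value) throws IOException {
    // chop oversized strings down before they ever become a cell
    if (value.length() > MAX_VALUE_CHARS) {
      value = value.substring(0, MAX_VALUE_CHARS);
    }
    Put put = new Put(row);
    put.add(family, qualifier, Bytes.toBytes(value));
    table.put(put);
  }
}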