Answers inline:

On Tue, Dec 1, 2009 at 11:06 PM, Sujee Maniyam <[email protected]> wrote:
> Hi all,
>
> I have the following table:
>
>   user_id => { "ip_address", "ref_url" }
>
> Column qualifiers are timestamps. The table was created with the
> default options (BLOCKSIZE => '65536', ...etc).
>
> A typical row looks like:
>
>   'user1' => {
>     ip_address:t1 => value1
>     ip_address:t2 => value2
>     ref_url:t2    => value3
>   }
>
> I have a few million rows in the table and am trying to write a
> simple Java client.
>
> When I query for a user_id that has around 2 million values (unique
> timestamps), it causes a region server to die with an out-of-memory
> error.
>
> Code snippet for the client:
>
> // ----------
> // ---- http://pastebin.com/m75fc75d1
>
> Get get = new Get(key);
> Result r = table.get(get);
>
> String[] families = {"ip_address", "ref_url"};
> for (String family : families) {
>   NavigableMap<byte[], byte[]> familyMap =
>       r.getFamilyMap(Bytes.toBytes(family));
>   System.out.println(String.format("  %s #cells : %d", family,
>       familyMap.size()));
> }
> // ----------
>
> I am curious to know...
>
> 1) Is the above code doing something wrong?

No, it looks OK.

> 2) Does a row's data have to fit completely into memory?

Yes, into both the server's and the client's memory.

> 3) I will want to iterate through all the cell values; what is the
> best way to do that?

0.21 will have an API that allows partial row scans. In the meantime,
you could try several things:

- use more rows instead of columns
- use more families, and query one family at a time
- use filters, which can choose what to return based on the column
  name (a rough sketch follows at the end of this mail)

> 4) If this is the limitation for 'wide tables', then I will redesign
> the table to use composite keys (row = userid + timestamp).

It's a limitation of the current API, which forces us to materialize
the entire row in memory at one time. A sketch of the composite-key
scan is also at the end of this mail.

> Thanks so much for your help.
>
> Sujee Maniyam
>
> --
> http://sujee.net
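
To make the filter idea concrete: below is a rough, untested sketch of
reading one huge row in slices, using a qualifier-range filter so that
each pass only returns the cells whose timestamp qualifiers fall inside
a window. It is written against the 0.20-era client classes; the table
name ("user_events") and the window bounds are my own placeholders, so
double-check the filter classes against the release you are running.

// ----------
import java.io.IOException;
import java.util.NavigableMap;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class SlicedRowFetch {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "user_events");

    byte[] row = Bytes.toBytes("user1");
    // The stop row of a scan is exclusive, so appending a zero byte
    // restricts the scan to exactly this one row.
    Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 }));

    // Touch only one family per pass instead of materializing them all.
    scan.addFamily(Bytes.toBytes("ip_address"));

    // Keep only qualifiers in [t1000, t2000); move the window forward
    // on each pass to walk the whole row without loading it at once.
    FilterList window = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    window.addFilter(new QualifierFilter(CompareOp.GREATER_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes("t1000"))));
    window.addFilter(new QualifierFilter(CompareOp.LESS,
        new BinaryComparator(Bytes.toBytes("t2000"))));
    scan.setFilter(window);

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        NavigableMap<byte[], byte[]> cells =
            r.getFamilyMap(Bytes.toBytes("ip_address"));
        System.out.println("ip_address #cells in window: " + cells.size());
      }
    } finally {
      scanner.close();
    }
  }
}
// ----------

The filter runs server side, so only the matching cells get assembled
into the Result. That is why the window has both a lower and an upper
bound: each Result still has to fit in memory on both ends.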

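And for question four, a minimal sketch of what the composite-key
redesign could look like on the read side, assuming rows become
"userid/timestamp" with one small row per event. The table name
("user_events_v2") and the single qualifier ("v") are assumptions
for illustration only.

// ----------
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CompositeKeyScan {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "user_events_v2");

    // All rows for user1 share the prefix "user1/". Incrementing the
    // last byte of the prefix ('/' + 1 == '0') gives an exclusive stop
    // row, so the scan covers exactly that prefix.
    byte[] start = Bytes.toBytes("user1/");
    byte[] stop = Arrays.copyOf(start, start.length);
    stop[stop.length - 1]++;

    Scan scan = new Scan(start, stop);
    scan.addFamily(Bytes.toBytes("ip_address"));

    ResultScanner scanner = table.getScanner(scan);
    try {
      // Each Result is now one small row (one event), so the client
      // streams through millions of events without ever holding the
      // whole set in memory.
      for (Result r : scanner) {
        String rowKey = Bytes.toString(r.getRow()); // "user1/<timestamp>"
        byte[] ip = r.getValue(Bytes.toBytes("ip_address"),
            Bytes.toBytes("v"));
        System.out.println(rowKey + " => " + Bytes.toString(ip));
      }
    } finally {
      scanner.close();
    }
  }
}
// ----------

The scanner fetches rows from the server in chunks, so memory use on
both the region server and the client stays bounded no matter how many
events a single user has.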