Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Ted Yu
bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on the whole length of the key)
In this case the Gets are very selective. The number of rows FuzzyRowFilter was evaluated against would be much higher. It would be helpful if you remembered how long each took.
bq. Also, I am
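To illustrate why a fuzzy filter is evaluated against many more rows than a handful of Gets, here is a minimal sketch of the per-row test in plain Java. It is not the HBase FuzzyRowFilter code; the class and method names are hypothetical, and it only assumes the documented mask convention (0 = position must match, 1 = any byte accepted).

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMatch {
    // mask[i] == 0: row[i] must equal pattern[i]; mask[i] == 1: any byte is accepted.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) {
            return false; // simplification: shorter rows never match
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience wrapper for ASCII row keys (hypothetical helper).
    static boolean matches(String row, String pattern, byte[] mask) {
        return matches(row.getBytes(StandardCharsets.US_ASCII),
                       pattern.getBytes(StandardCharsets.US_ASCII), mask);
    }
}
```

A scan with such a filter runs this test on every row it visits (unless the filter can hint the scanner forward), whereas each Get seeks straight to one key.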

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
Would be interesting to compare against Phoenix's Skip Scan (http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html) which does a scan through a coprocessor and is more than 2x faster than multi Get (plus handles multi-range scans in addition to point gets). James On

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Kiru Pakkirisamy
Ted, Re: Multiple Gets vs FuzzyRowFilter - looks like my row/column processing is mixed in and not giving a definitive view of the performance of either interface. I will do more testing on this, by writing a simpler test program. A Get on an HRegion throws an exception if the key is not

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Kiru Pakkirisamy
James, I am using the FuzzyRowFilter or the Gets within a Coprocessor. Looks like I cannot use your SkipScanFilter by itself as it has lots of Phoenix imports. I thought of writing my own custom filter and saw that the FuzzyRowFilter in the 0.94 branch also had an implementation for

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
Kiru, If you're able to post the key values, row key structure, and data types you're using, I can post the Phoenix code to query against it. You're doing some kind of aggregation too, right? If you could explain that part too, that would be helpful. It's likely that you can just query the

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread Kiru Pakkirisamy
James,
Rowkey = String, length 7
Col = String, variable length, but looks like C_10345
Col value = Double
If I can create a Phoenix schema mapping to this existing table that would be great. I actually do a group by the column values and return another value which is a function of the value and an

Re: Client Get vs Coprocessor scan performance

2013-08-18 Thread James Taylor
Kiru, What's your column family name? Just to confirm, the column qualifier of your key value is C_10345 and this stores a value as a Double using Bytes.toBytes(double)? Are any of the Double values negative? Any other key values? Can you give me an idea of the kind of fuzzy filtering you're
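The message is cut off before it says why the sign matters. One likely reason (an assumption, not stated in the thread) is that the raw big-endian IEEE 754 bytes of negative doubles do not sort in numeric order under the unsigned byte comparison HBase uses. The sketch below uses java.nio.ByteBuffer as a stand-in, assuming it yields the same bytes as Bytes.toBytes(double); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class DoubleSortOrder {
    // Big-endian IEEE 754 bytes (assumed equivalent to Bytes.toBytes(double)).
    static byte[] encode(double d) {
        return ByteBuffer.allocate(8).putDouble(d).array();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this encoding, non-negative values compare correctly (1.0 sorts before 2.0), but any negative value sorts after every positive one because its sign bit is set, and -2.0 sorts after -1.0.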

Re: issue about rowkey design

2013-08-18 Thread ch huang
What do you mean by secondary index? Does HBase have secondary indexes?
On Sat, Aug 17, 2013 at 12:48 AM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: We did design with something equivalent to userid as the key and all the user sessions in there. But when we tried to look for particular user

Re: issue about rowkey design

2013-08-18 Thread fgaule
You can use a secondary table as a 'secondary index', setting your row key as the value (or column) in it.
-Original Message- From: ch huang justlo...@gmail.com Date: Mon, 19 Aug 2013 09:05:19 To: user@hbase.apache.org; Kiru

RE: issue about rowkey design

2013-08-18 Thread Vladimir Rodionov
Secondary index requires multiple random seeks and is not efficient in many cases. What you need is different row_keys (one for each request type):
user_id, session_id, visit_time =
rowkey1 = q1, visit_time, user_id
rowkey2 = q2, visit_time, session_id
rowkey3 = q3, user_id, session_id : ts =
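A rough sketch of the "one row key per request type" idea, in plain Java with a TreeMap standing in for a sorted HBase table. The "|" delimiter, the class name, and the helper methods are all assumptions for illustration; the q1/q2/q3 prefixes and field orders follow the message above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowKeyPerQuery {
    // Stand-in for an HBase table: row keys are kept in sorted order.
    static final TreeMap<String, String> table = new TreeMap<>();

    // Hypothetical layout: query prefix, then fields in the order that query filters on.
    static String key(String query, String... parts) {
        return query + "|" + String.join("|", parts);
    }

    // Write the same visit under three row keys, one per request type.
    static void put(String userId, String sessionId, String visitTime) {
        String value = userId + "," + sessionId + "," + visitTime;
        table.put(key("q1", visitTime, userId), value);
        table.put(key("q2", visitTime, sessionId), value);
        table.put(key("q3", userId, sessionId), value);
    }

    // Prefix scan: one seek to the start of the range, then a contiguous read.
    static List<String> scan(String query, String... leadingParts) {
        String prefix = key(query, leadingParts) + "|";
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the key range for this prefix
            }
            out.add(e.getValue());
        }
        return out;
    }
}
```

The trade-off is write amplification: every visit is stored three times so that each request type becomes a single range scan instead of an index lookup followed by random seeks.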