HBase Client.

2013-03-20 Thread Pradeep Kumar Mantha
Hi, I would like to benchmark HBase with some of our distributed applications, using custom-developed benchmarking scripts/programs. I found that the following clients are available. Could you please let me know which of them provides the best performance? 1. Java direct interface to

Re: HBase Client.

2013-03-20 Thread Viral Bajaria
Most of the clients listed below are language specific, so if your benchmarking scripts are written in Java, you are better off running the Java client. HBase Shell is more for running something interactive; not sure how you plan to benchmark that. REST is something that you could use, but I can't

Re: Truncate hbase table based on column family

2013-03-20 Thread Ted Yu
Can you clarify your question? Did you mean that you only want to drop certain column families? Thanks On Wed, Mar 20, 2013 at 7:15 AM, varaprasad.bh...@polarisft.com wrote: Hi All, Can we truncate a table in HBase based on the column family? Please give your comments. Thanks

Re: Welcome our newest Committer Anoop

2013-03-20 Thread Jimmy Xiang
Congratulations! On Wed, Mar 20, 2013 at 6:11 AM, Jonathan Hsieh j...@cloudera.com wrote: welcome welcome! On Wed, Mar 13, 2013 at 10:23 AM, Sergey Shelukhin ser...@hortonworks.com wrote: Congrats! On Tue, Mar 12, 2013 at 10:38 PM, xkwang bruce bruce.xkwa...@gmail.com wrote:

Re: How to catch java.net.ConnectException and when

2013-03-20 Thread Jean-Marc Spaggiari
Hi Gaurhari, Can you please tell us a bit more about what you want to achieve? When do you want to catch this exception? On which operation? JM 2013/3/20 gaurhari dass gaurharid...@gmail.com: I want to catch connect exception in hbase

Re: HBase Client.

2013-03-20 Thread James Taylor
Another one to add to your list: 6. Phoenix (https://github.com/forcedotcom/phoenix) Thanks, James On Mar 20, 2013, at 2:50 AM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have used Kundera, persistence overhead on HBase API is minimal considering feature set available for use within

Does HBase RegionServer benefit from OS Page Cache

2013-03-20 Thread Pankaj Gupta
Given that HBase has its own cache (block cache and bloom filters) and that all the table data is stored in HDFS, I'm wondering if HBase benefits from the OS page cache at all. In the setup I'm using, HBase Region Servers run on the same boxes as the HDFS data nodes. In such a scenario if the

Re: HBase Client.

2013-03-20 Thread Ian Varley
Pradeep - One more to add to your list of clients is Phoenix: https://github.com/forcedotcom/phoenix It's a SQL skin, built on top of the standard Java client with various optimizations; it exposes HBase via a standard JDBC interface, and thus might let you easily plug into other tools for

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Dan Crosta
I'm confused -- I only see one setting in CDH manager. What is the name of the other setting? Our load is moderately frequent small writes (in batches of 1000 cells at a time, typically split over a few hundred rows -- these complete very fast, we haven't seen any timeouts there), and

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Ted Yu
In 0.94, there is only one setting. See the release notes of HBASE-6170, which is in 0.95. Looks like this should help (in 0.95): https://issues.apache.org/jira/browse/HBASE-2214 "Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly". From your description, you should be

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Bryan Beaudreault
Typically it is better to use caching and batch size to limit the number of rows returned and thus the amount of processing required between calls to next() during a scan, but it would be nice if HBase provided a way to manually refresh a lease similar to Hadoop's context.progress(). In a cluster
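
The caching advice above comes down to simple arithmetic: the rows fetched per call to next() multiplied by the per-row processing time should stay well under the scanner lease timeout. The following is a minimal sketch of that calculation in plain Java, with no HBase dependency. The class name, the method name, the safety factor, and the 60,000 ms and 50 ms figures are all assumptions chosen for illustration; they are not taken from the thread or from HBase itself.

```java
public class ScanCachingEstimate {

    /**
     * Estimates the largest caching value (rows fetched per RPC) that keeps
     * client-side processing of one batch within a fraction of the lease timeout.
     *
     * leaseTimeoutMs: scanner lease timeout in milliseconds (assumed known)
     * perRowMs:       expected processing time per row in milliseconds (assumed measured)
     * safetyFactor:   fraction of the timeout we are willing to use, in (0, 1]
     */
    public static int maxCaching(long leaseTimeoutMs, long perRowMs, double safetyFactor) {
        if (leaseTimeoutMs <= 0 || perRowMs <= 0 || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("timeouts must be positive and safetyFactor in (0, 1]");
        }
        // Time budget for one batch, then how many rows fit into that budget.
        long budgetMs = (long) (leaseTimeoutMs * safetyFactor);
        long rows = budgetMs / perRowMs;
        // Always fetch at least one row; clamp to the int range.
        return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // Hypothetical figures: 60 s lease, 50 ms per row, use half of the lease.
        System.out.println(maxCaching(60_000L, 50L, 0.5)); // prints 600
    }
}
```

The result could then be passed to scan.setCaching(...). A lower safety factor leaves more headroom for rows that occasionally take longer than expected.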

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Ted Yu
bq. if HBase provided a way to manually refresh a lease similar to Hadoop's context.progress() Can you outline how the above works for a long scan? bq. Even being able to override the timeout on a per-scan basis would be nice. Agreed. On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault

Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-20 Thread Jean-Daniel Cryans
First, MSLAB has been enabled by default since 0.92.0 as it was deemed stable enough. So, unless you are on 0.90, you are already using it. Also, I'm not sure why you are referencing the HLog in your first paragraph in the context of reading from disk, because the HLogs are rarely read (only on

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Bryan Beaudreault
I was thinking something like this: Scan scan = new Scan(startRow, endRow); scan.setCaching(someVal); // based on what we expect most rows to take for processing time ResultScanner scanner = table.getScanner(scan); for (Result r : scanner) { // usual processing, the time for which we

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Ted Yu
Bryan: Interesting idea. You can log a JIRA with the following two suggestions. On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: I was thinking something like this: Scan scan = new Scan(startRow, endRow); scan.setCaching(someVal); // based on what we

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Bryan Beaudreault
Thanks Ted, I've submitted https://issues.apache.org/jira/browse/HBASE-8157. On Wed, Mar 20, 2013 at 1:56 PM, Ted Yu yuzhih...@gmail.com wrote: Bryan: Interesting idea. You can log a JIRA with the following two suggestions. On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault

Evenly splitting the table

2013-03-20 Thread Cole
I was wondering how I can go about evenly splitting an entire table in HBase during table creation[1]. I tried providing the empty byte arrays HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW as parameters to the method I linked below, and got an error: Start key must be smaller than end

Re: Evenly splitting the table

2013-03-20 Thread Ted Yu
Take a look at TestAdmin#testCreateTableRPCTimeOut() where hbaseadmin.createTable() is called. bq. Is there a way to go about splitting the entire table without having specific start and end keys? I don't think so. On Wed, Mar 20, 2013 at 3:32 PM, Cole cole.skov...@cerner.com wrote: I was

Fwd: Questions about versions and timestamp

2013-03-20 Thread Benyi Wang
Hi, Please forgive me if my questions have already been asked and answered many times; I could not find any of them with Google. If I run the following commands in the HBase shell: hbase(main):048:0> create 'test_ts_ver', 'data' 0 row(s) in 1.0550 seconds hbase(main):049:0> describe 'test_ts_ver' DESCRIPTION

Re: Questions about versions and timestamp

2013-03-20 Thread Ted Yu
A few pointers so that you can find the answer yourself: http://hbase.apache.org/book.html Take a look at 2.5.2.8. Managed Compactions and http://hbase.apache.org/book.html#compaction You can also use search-hadoop.com e.g. 'Possible to delete a specific cell?' Cheers On Wed, Mar 20, 2013 at

Re: Evenly splitting the table

2013-03-20 Thread Aaron Kimball
Hi Cole, How are your keys structured? In Kiji, we default to using hashed row keys where each key starts with two bytes of salt. This makes it a lot easier to pre-split the table since you can make stronger guarantees about the key distribution. If your keys are raw text like, say, plaintext
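
When row keys start with two bytes of hash salt, as described above, the split points can be computed without knowing any real keys: the 16-bit prefix space is simply divided into equal slices. The following is a minimal sketch in plain Java of that idea. The class and method names are hypothetical, and this is not Kiji's actual implementation.

```java
public class SaltedSplitKeys {

    /**
     * Returns numRegions - 1 two-byte split keys spaced evenly over the
     * 16-bit prefix space 0x0000..0xFFFF. Assumes the first two bytes of
     * every row key are uniformly distributed (for example, a hash prefix).
     */
    public static byte[][] evenSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 65536) {
            throw new IllegalArgumentException("numRegions must be between 2 and 65536");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary i of numRegions equal slices of the 65536 possible prefixes.
            int boundary = (int) ((65536L * i) / numRegions);
            // Encode as two big-endian bytes so byte-wise ordering matches numeric ordering.
            splits[i - 1] = new byte[] { (byte) (boundary >>> 8), (byte) boundary };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions need three split keys.
        for (byte[] key : evenSplits(4)) {
            System.out.printf("%02x%02x%n", key[0] & 0xff, key[1] & 0xff);
        }
        // prints 4000, 8000 and c000 on separate lines
    }
}
```

The resulting array is in the shape expected by the HBaseAdmin.createTable(HTableDescriptor, byte[][]) overload, which takes explicit split keys rather than a start key, an end key, and a region count. For unsalted keys such as raw text, evenly spaced byte prefixes would not give evenly sized regions, so this sketch would not apply as is.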