Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Doug Meil
Hi there- On top of what everybody else said, for more info on rowkey design and pre-splitting see http://hbase.apache.org/book.html#schema (as well as other threads in this dist-list on that topic). On 1/19/13 4:12 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Austin, I am

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Vikas Jadhav
In my understanding, HBase needs to store more metadata than Hive (for each value it separately stores the row key, column family, column name, and value), so the stored data may be larger than the original HDFS file. I have wondered about this too; if anyone has gotten better results with HBase than with Hive, let us know. Thank you. On
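The per-value overhead described above can be estimated with a short sketch. It assumes the classic KeyValue layout (key length, value length, row length, row, family length, family, qualifier, timestamp, type) and ignores compression, block encoding, indexes, and HDFS replication. The class name and the sample row key, family, and qualifiers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Rough per-cell size estimate; a sketch, not HBase's own accounting. */
final class CellSizeSketch {
    // Fixed-width fields assumed per cell:
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + type (1) = 20 bytes.
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    static long cellSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD
                + row.getBytes(StandardCharsets.UTF_8).length
                + family.getBytes(StandardCharsets.UTF_8).length
                + qualifier.getBytes(StandardCharsets.UTF_8).length
                + valueLength;
    }

    public static void main(String[] args) {
        // Hypothetical row holding three 8-byte values.
        String row = "user00000042";
        String family = "d";
        String[] qualifiers = {"name", "city", "score"};
        long stored = 0;
        long raw = 0;
        for (String q : qualifiers) {
            stored += cellSize(row, family, q, 8); // row key and family repeat per cell
            raw += 8;
        }
        System.out.println("raw value bytes = " + raw
                + ", estimated cell bytes = " + stored);
    }
}
```

Because the row key and family name are repeated in every cell, short keys and short family names reduce this overhead under the assumed layout.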

Re: Storing images in Hbase

2013-01-20 Thread Jack Levin
I forgot to mention that I also have this setup: <property> <name>hbase.hregion.memstore.flush.size</name> <value>33554432</value> <description>Flush more often. Default: 67108864</description> </property> This parameter applies per region, so this means if any of my 400 (currently) regions on a
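For readers converting the byte values above, here is a small arithmetic sketch. The class and method names are hypothetical. The worst-case figure simply multiplies the threshold by a region count and ignores any global memstore limit, so it is an illustration rather than a statement about this cluster.

```java
/** Arithmetic on the flush-size values quoted above; names are illustrative. */
final class FlushSizeSketch {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Upper bound on unflushed data if every region filled its memstore
    // to the threshold at the same time (assumption: no global limit applies).
    static long worstCaseBytes(int regions, long flushSizeBytes) {
        return regions * flushSizeBytes;
    }

    public static void main(String[] args) {
        long configured = 33554432L;   // value from the message
        long defaultSize = 67108864L;  // default quoted in the message
        System.out.println("configured = " + toMiB(configured) + " MiB");
        System.out.println("default    = " + toMiB(defaultSize) + " MiB");
        System.out.println("400 regions at threshold = "
                + toMiB(worstCaseBytes(400, configured)) + " MiB");
    }
}
```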

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-20 Thread Eugeny Morozov
Ted, thanks for the question. Here are the results of my investigation. It seems I was mistaken. I thought that a scanner is assigned to each region (and that the regions are scanned in parallel), which would mean each scanner starts from the beginning of its region and then falls down to the required record.

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-20 Thread Michael Segel
If it's the same class and it's not a patch, then the first class loaded wins. So if you have a class Foo and HBase has a class Foo, your code will never see the light of day. Perhaps I'm stating the obvious, but it's something to think about when working with Hadoop. On Jan 19, 2013, at 3:36 AM,

HBase 0.94 shell throwing a NoSuchMethodError: hbase.util.Threads.sleep(I)V from ZK code

2013-01-20 Thread tsuna
I just updated my local tree (branch 0.94, SVN r1435317) and I see these spurious exceptions in the HBase shell: $ COMPRESSION=LZO HBASE_HOME=~/src/hbase ./src/create_table.sh HBase Shell; enter 'help<RETURN>' for list of supported commands. Type "exit<RETURN>" to leave the HBase Shell Version 0.94.4,

Re: HBase 0.94 shell throwing a NoSuchMethodError: hbase.util.Threads.sleep(I)V from ZK code

2013-01-20 Thread Ted Yu
Thanks for reporting this, Benoit. Here is the call: Threads.sleep(1); Here is the method to be called: public static void sleep(long millis) { Notice the mismatch in argument types: 1 being integer and millis being long. Cheers On Sun, Jan 20, 2013 at 9:01 PM, tsuna

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Austin Chungath
Thank you Tariq. I will let you know how things went after I implement these suggestions. Regards, Austin On Sun, Jan 20, 2013 at 2:42 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Austin, I am sorry for the late response. Asaf has made a very valid point. Rowkey design is

RE: Loading data, hbase slower than Hive?

2013-01-20 Thread Anoop Sam John
Austin, are you using HFileOutputFormat or TableOutputFormat? -Anoop- From: Austin Chungath [austi...@gmail.com] Sent: Monday, January 21, 2013 11:15 AM To: user@hbase.apache.org Subject: Re: Loading data, hbase slower than Hive? Thank you Tariq.

Re: HBase 0.94 shell throwing a NoSuchMethodError: hbase.util.Threads.sleep(I)V from ZK code

2013-01-20 Thread lars hofhansl
I suspect this is a different problem. Java will happily cast an int to a long where needed. Does mvn clean install fix this? If not, let's file a jira. -- Lars From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Sent: Sunday, January 20, 2013 9:30
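The point about int-to-long conversion can be checked with a stdlib-only sketch. The local sleep method below is a stand-in, not the HBase Threads class. The sketch shows that a call with an int literal compiles against a long parameter, and prints the JVM descriptors for both signatures. A NoSuchMethodError naming (I)V usually indicates that the caller was compiled against a class that declared an int parameter while the class loaded at runtime does not, which is consistent with stale or mismatched build output; that reading is an assumption here, not a diagnosis of this report.

```java
import java.lang.invoke.MethodType;

/** Shows int-to-long widening and the matching JVM method descriptors. */
final class DescriptorSketch {
    static long lastSleep = -1;

    // Stand-in for a utility method declared with a long parameter.
    static void sleep(long millis) {
        lastSleep = millis;
    }

    static String descriptor(Class<?> returnType, Class<?> paramType) {
        return MethodType.methodType(returnType, paramType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The int literal is widened to long at compile time, so the
        // compiled call site refers to the long version of the method.
        sleep(1);
        System.out.println("sleep(int)  descriptor: " + descriptor(void.class, int.class));
        System.out.println("sleep(long) descriptor: " + descriptor(void.class, long.class));
    }
}
```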

Re: Hbase Mapreduce- Problem in using arrayList of pust in MapFunction

2013-01-20 Thread Mohammad Tariq
Give put(List<Put> puts) a shot and see if it works for you. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 21, 2013 at 11:41 AM, Farrokh Shahriari mohandes.zebeleh...@gmail.com wrote: Hi there Is there any way to use arrayList of Puts in map function to

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Mohammad Tariq
Apart from this, you can apply some additional tweaks to improve put performance, such as creating pre-split tables and using put(List<Put> puts) instead of a normal put. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 21, 2013 at 11:46 AM, Austin Chungath
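As a sketch of the pre-splitting suggestion, the code below computes evenly spaced split points. It assumes, hypothetically, that row keys start with a uniformly distributed two-hex-digit prefix (for example, from a hash); the class and method names are illustrative. In a real client the resulting keys would be converted to byte arrays and passed as split keys when the table is created.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes split points for keys with a two-hex-digit prefix (assumed). */
final class SplitPointSketch {
    static List<String> hexSplitPoints(int regions) {
        if (regions < 2 || regions > 256) {
            throw new IllegalArgumentException("regions must be between 2 and 256");
        }
        List<String> splits = new ArrayList<>();
        // N regions need N - 1 boundaries across the 00..ff prefix space.
        for (int i = 1; i < regions; i++) {
            int boundary = (int) ((long) i * 256 / regions);
            splits.add(String.format("%02x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(hexSplitPoints(4));
    }
}
```

Even spacing only helps if the keys really are uniformly distributed over the prefix; with sequential keys the load would still land on one region at a time.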

RE: Hbase Mapreduce- Problem in using arrayList of pust in MapFunction

2013-01-20 Thread Anoop Sam John
"And also how can I use autoflush bufferclientside in Map function for inserting data to Hbase Table ?" You are using TableOutputFormat, right? There, autoFlush is turned OFF. You can use the config param hbase.client.write.buffer to set the client-side buffer size. -Anoop-
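The idea behind a client-side write buffer can be illustrated without the HBase client. The class below is a hypothetical stdlib-only stand-in: it accumulates records and hands them to a sink in one batch once the buffered bytes exceed a threshold, which is the role the buffer size setting plays when autoFlush is off. It is a sketch of the concept, not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical size-bounded write buffer; not the HBase client. */
final class WriteBufferSketch {
    private final long maxBytes;
    private final Consumer<List<byte[]>> sink;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;

    WriteBufferSketch(long maxBytes, Consumer<List<byte[]>> sink) {
        this.maxBytes = maxBytes;
        this.sink = sink;
    }

    void add(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (pendingBytes > maxBytes) {
            flush(); // threshold exceeded: send everything buffered so far
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand over a copy of the batch
        pending.clear();
        pendingBytes = 0;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        WriteBufferSketch buffer =
                new WriteBufferSketch(10, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            buffer.add(new byte[4]);
        }
        buffer.flush(); // final flush, as a task would do on cleanup
        System.out.println("batch sizes: " + batchSizes);
    }
}
```

The final explicit flush matters: anything still buffered when the task ends would otherwise never reach the sink.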

RE: Loading data, hbase slower than Hive?

2013-01-20 Thread Anoop Sam John
@Mohammad As he is using HFileOutputFormat, there is no put call happening on HTable. In this case the MR job will create the HFiles directly, without using the normal HBase write path. Then later, using the HRS API, the HFiles are loaded into the table regions. In this case the number of reducers will be

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Mohammad Tariq
Thank you so much for pointing out the mistake sir. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 21, 2013 at 12:06 PM, Anoop Sam John anoo...@huawei.com wrote: @Mohammad As he is using HFileOutputFormat, there is no put call happening on HTable. In this

confused about Data/Disk ratio

2013-01-20 Thread tgh
Hi, I use HBase to store data, and I have an observation: when HBase stores 1 GB of data, HDFS uses 10 GB of disk space; when the data is 60 GB, HDFS uses 180 GB of disk; and when the data is about 2 TB, HDFS uses 3 TB of disk. That is, the data/disk ratio is not linear, and why,
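Computing the disk-to-data ratio for the three figures above makes the trend explicit. This sketch only restates the reported numbers (treating 2 TB and 3 TB as 2048 GB and 3072 GB, an assumption that does not affect the ratio); it does not explain the cause, and the class name is hypothetical.

```java
/** Disk-to-data ratios for the three observations quoted above. */
final class DiskRatioSketch {
    static double ratio(double dataGb, double diskGb) {
        return diskGb / dataGb;
    }

    public static void main(String[] args) {
        double[][] observations = {
            {1, 10},       // 1 GB of data, 10 GB on disk
            {60, 180},     // 60 GB of data, 180 GB on disk
            {2048, 3072},  // about 2 TB of data, 3 TB on disk
        };
        for (double[] o : observations) {
            System.out.println(o[0] + " GB data -> disk/data ratio " + ratio(o[0], o[1]));
        }
    }
}
```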