Re: RowLocks

2013-08-29 Thread Kristoffer Sjögren
Maybe I should illustrate with a specific use case.

I want to create a unique row with a rowkey that looks like this:
id+timestamp

The timestamp (in millis) is provided by the user. It is ONLY the id that
dictates uniqueness, not the timestamp. So there is a race condition here.

My reasoning is as follows.

1) Lock the id (without the timestamp).

2.1) If the lock was acquired. Scan for the row.
2.2) If the row was not found, create it and set a counter on the
id+timestamp row to 0.
2.3) If the row is found, increase a counter on the id+timestamp row.
2.4) Release the lock.

If the lock was NOT acquired.

3.1) Not really sure how to proceed here. The easiest way is probably to
wait until the lock is released (like a SELECT FOR UPDATE). Retries would
also work.

It is very important that we do not lose the increments or create multiple
rows with the same id but different timestamps when there are race conditions.
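
One lock-free way to get the same guarantees with the current client API would be
checkAndPut on an index row keyed by the id alone, plus atomic increments on the
id+timestamp row. A minimal sketch against the 0.94 API; the table name, column
family and qualifiers ("mytable", "f", "ts", "count") are placeholders, not code
from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UniqueRowSketch {
  public static void claimAndCount(String id, long timestamp) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    byte[] cf  = Bytes.toBytes("f");
    byte[] ts  = Bytes.toBytes("ts");
    byte[] cnt = Bytes.toBytes("count");

    // Atomically claim the id: the put succeeds only if no timestamp is recorded yet.
    byte[] indexRow = Bytes.toBytes(id);
    Put claim = new Put(indexRow);
    claim.add(cf, ts, Bytes.toBytes(timestamp));
    boolean winner = table.checkAndPut(indexRow, cf, ts, null, claim);

    long winningTs = timestamp;
    if (!winner) {
      // Someone else won the race; read back the timestamp they recorded.
      Result r = table.get(new Get(indexRow));
      winningTs = Bytes.toLong(r.getValue(cf, ts));
    }

    // Increments are atomic on the region server, so no client-side lock is needed:
    // the winner writes 0, everyone else adds 1 to the single id+timestamp row.
    byte[] dataRow = Bytes.toBytes(id + "+" + winningTs);
    table.incrementColumnValue(dataRow, cf, cnt, winner ? 0L : 1L);
    table.close();
  }
}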



On Thu, Aug 29, 2013 at 6:22 AM, lars hofhansl la...@apache.org wrote:

 Specifically the API has been removed because it had never actually worked
 correctly.


 Rowlocks are used by RegionServers for intra-region operations.
 As such they are ephemeral, in-memory constructs, that cannot reliably
 outlive a single RPC request.
 The HTable rowlock API allowed you to create a rowlock and hold it over
 multiple RPCs, which would break if, e.g., a region is moved or split.

 -- Lars
 
 From: Ted Yu yuzhih...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 8:01 PM
 Subject: Re: RowLocks


 The API is no longer a public API

 Thanks


 On Wed, Aug 28, 2013 at 7:58 PM, Michael Segel michael_se...@hotmail.com
 wrote:

  Ted,
  Can you clarify...
  Do you mean the API is no longer a public API, or do you mean no more RLL
  for atomic writes?
 
  On Aug 28, 2013, at 5:18 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   RowLock API has been removed in 0.96.
  
   Can you tell us your use case ?
  
  
   On Wed, Aug 28, 2013 at 3:14 PM, Kristoffer Sjögren sto...@gmail.com
  wrote:
  
   Hi
  
   About the internals of locking a row in hbase.
  
   Does hbase row locks map one-to-one with a locks in zookeeper or are
  there
   any optimizations based on the fact that a row only exist on a single
   machine?
  
   Cheers,
   -Kristoffer
  
 
  The opinions expressed here are mine, while they may reflect a cognitive
  thought, that is purely accidental.
  Use at your own risk.
  Michael Segel
  michael_segel (AT) hotmail.com
 
 
 
 
 
 



issue debug hbase

2013-08-29 Thread kun yan
Hi all

I use Maven to compile the HBase source; the HBase version is 0.94.
I can debug HBase, for example create table, as a Java Application (I set a
breakpoint); that is a nice way to learn how HBase creates a table.
But I found I cannot debug HBase as a Remote Java Application: when I set a
breakpoint in the source (in my local client), the program executes but the
breakpoint does nothing.

The problem:
I don't understand why I can debug create table as a Java Application (with a
breakpoint set in my local create-table code), but when I use the HBase shell
create 'demo','s' my Eclipse does nothing (with a breakpoint set in my local
hbase/src/).
PS: I have configured the Remote Java Application host:port etc.

Thank you for your help.

-- 

In the Hadoop world I am just a novice exploring the entire Hadoop
ecosystem. I hope one day I can contribute my own code.

YanBit
yankunhad...@gmail.com
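
One thing that usually matters here: the create-table path runs inside the HMaster
and RegionServer JVMs, not in the client, so a breakpoint in that code only fires if
those daemon JVMs are started with the JDWP debug agent and Eclipse's Remote Java
Application is attached to them. A sketch of the hbase-env.sh lines (the ports 8070
and 8071 are arbitrary choices):

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8070"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8071"

When only the local client JVM is under the debugger, a shell command like
create 'demo','s' executes entirely in the remote server process, which would explain
why the local breakpoints never trigger.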



Hbase RowKey design schema

2013-08-29 Thread Wasim Karani
I am using HBase to store webtable content, like how Google is using Bigtable.
For reference, see the Google Bigtable paper (PDF).
My question is on the RowKey and how we should be forming it.
What Google is doing is saving the URL in reverse order, as you can see in
the PDF document (com.cnn.www), so that all the links associated with cnn.com
will be managed in the same block of GFS, which makes it a lot easier to scan.
I could use the same thing Google is using, but wouldn't it be cool if I used
some algorithm to compress the URL?

For example:

RowKey                               |  Google Bigtable                       |  Algorithm output
www.cnn.com/index.php                |  com.cnn.www/index.php                 |  12as/435
www.cnn.com/news/business/index.html |  com.cnn.www/news/business/index.html  |  12as/2as/dcx/asd
www.cnn.com/news/sports/index.html   |  com.cnn.www/news/sports/index.html    |  12as/2as/eds/scf

The reason for doing this is that the rowkey will be shorter, as per the HBase design
schema (mentioned in topic 6.3.2.3, Rowkey Length).

So what I need from you guys is to know whether I am correct here.
Also, if I am correct, which algorithm should I be using? I am using Python over
Thrift as the programming language, so code would be overwhelming for me...
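
For illustration only, one possible shape of such an algorithm (in Java rather than
Python, and hashing each path segment down to a fixed 4-hex-character prefix is purely
an assumption): hashed segments keep the grouping by reversed domain, but the key is
no longer readable or reversible and prefixes can collide, so this is a sketch, not a
recommendation:

import java.security.MessageDigest;

public class RowKeySketch {
  // e.g. "com.cnn.www/news/sports/index.html" -> "ab12/cd34/ef56/0789"
  static String compressedKey(String reversedUrl) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    StringBuilder sb = new StringBuilder();
    for (String segment : reversedUrl.split("/")) {
      byte[] digest = md5.digest(segment.getBytes("UTF-8"));
      if (sb.length() > 0) sb.append('/');
      // keep only the first two digest bytes (4 hex characters) per segment
      sb.append(String.format("%02x%02x", digest[0], digest[1]));
    }
    return sb.toString();
  }
}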



java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable

2013-08-29 Thread praveenesh kumar
Hi all,

I am trying to write a MR code to load a HBase table.

I have a mapper that emits (null,put object) and I am using
TableMapReduceUtil.initTableReducerJob() to write it into a HBase table.

Following is my code snippet

public class MYHBaseLoader extends
Mapper<NullWritable,BytesWritable,NullWritable,Put> {

 protected void map (LongWritable key, BytesWritable value, Context
context) throws IOException, InterruptedException {

  // ... some processing here: create the Put object and push the data
  // into the Put object.
  context.write(null, put); // Pushing the put object.

}

public static void main (String args[]) throws IOException,
ClassNotFoundException, InterruptedException{
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(MYHBaseLoader.class);
job.setMapperClass(MYHBaseLoader.class);

TableMapReduceUtil.initTableReducerJob(MY_IMPORT_TABLE_NAME, IdentityTableReducer.class, job);
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(Put.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
//job.setNumReduceTasks(0);

FileInputFormat.setInputPaths(job, new Path("test"));
Path outputPath = new Path("test_output");
FileOutputFormat.setOutputPath(job, outputPath);

//outputPath.getFileSystem(conf).delete(outputPath, true);

job.waitForCompletion(true);
System.out.println("Done");
}


I am getting the following error while running. Any help/guidance:


java.io.IOException: Type mismatch in value from map: expected
org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1023)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:689)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)


Regards
Praveenesh


Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Saurabh Yahoo
Hi Vlad,

We do have strict latency requirement as it is financial data requiring direct 
access from clients. 

Are you saying that it is not possible to achieve sub-second latency using
HBase (because it is based on Java)?







On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your heap size 
 is much less than 1GB). 
 I have never heard about strict maximum latency limit. Usually , its 99% , 
 99.9 or 99.99% query percentiles.
 
 You can greatly reduce your 99.xxx% percentile latency by storing you data in 
 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and get the 
 first response. Probability theory states that  probability 
 of two independent events (slow requests) is  the product of event's 
 probabilities themselves. 
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Kiru,
 
 Scan is not an option for our use cases.  Our read is pretty random.
 
 Any other suggestion to bring down the latency.
 
 Thanks,
 Saurabh.
 
 
 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was a 
 40million row table as 400K rows and columns. We Get about 100 of the rows 
 from this 400K , do quite a bit of calculations in the coprocessor (almost a 
 group-order by) and return in this time.
 Maybe should consider replacing the MultiGets with Scan with Filter. I like 
 the FuzzyRowFilter even though you might need to match with exact key. It 
 works only with fixed length key.
 (I do have an issue right now, it is not scaling to multiple clients.)
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 3:20 PM
 Subject: Re: experiencing high latency for few reads in HBase
 
 
 Thanks Kitu. We need less than 1 sec latency.
 
 We are using both muliGet and get.
 
 We have three concurrent clients running 10 threads each. ( that makes total 
 30 concurrent clients).
 
 Thanks,
 Saurabh.
 
 On Aug 28, 2013, at 4:30 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Right 4 sec is good.
 @Saurabh - so your read is - getting 20 out of 25 millions rows ?. Is this 
 a Get or a Scan ?
 BTW, in this stress test how many concurrent clients do you have ?
 
 Regards,
 - kiru
 
 
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 12:15 PM
 Subject: RE: experiencing high latency for few reads in HBase
 
 
 1. 4 sec max latency is not that bad taking into account 12GB heap.  It can 
 be much larger. What is your SLA?
 2. Block evictions is the result of a poor cache hit rate and the root 
 cause of a periodic stop-the-world GC pauses (max latencies
 latencies you have been observing in the test)
 3. Block cache consists of 3 parts (25% young generation, 50% - tenured, 
 25% - permanent). Permanent part is for CF with
 IN_MEMORY = true (you can specify this when you create CF).  Block first 
 stored in 'young gen' space, then gets promoted to 'tenured gen' space
 (or gets evicted). May be your 'perm gen' space is underutilized? This is 
 exact 25% of 4GB (1GB). Although HBase LruBlockCache should use all the 
 space allocated for block cache -
 there is no guarantee (as usual). If you don have in_memory column families 
 you may decrease
 
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 5:10 AM
 To: user@hbase.apache.org
 Subject: experiencing high latency for few reads in HBase
 
 Hi,
 
 We are running a stress test in our 5 node cluster and we are getting the 
 expected mean latency of 10ms. But we are seeing around 20 reads out of 25 
 million reads having latency more than 4 seconds. Can anyone provide the 
 insight what we can do to meet below second SLA for each and every read?
 
 We observe the following things -
 
 1. Reads are evenly distributed among 5 nodes.  CPUs remain under 5% 
 utilized.
 
 2. We have 4gb block cache (30% block cache out of 12gb) setup. 3gb block 
 cache got filled up but around 1gb remained free. There are a large number 
 of 

Re: how to export data from hbase to mysql?

2013-08-29 Thread Mohammad Tariq
My 2 cents :

1- Map your table to a Hive table and do the export using Sqoop.
2- Export (http://hbase.apache.org/book/ops_mgt.html#export) the table to a
file first, and then export it using Sqoop.

Warm Regards,
Tariq
cloudfront.blogspot.com
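
For the second option, the Sqoop half typically looks like the sketch below, assuming
the HBase data has already been landed in HDFS as tab-delimited text (the built-in
HBase Export tool writes SequenceFiles, so a conversion step would be needed first);
the host, database, table and path are placeholders:

sqoop export \
  --connect jdbc:mysql://mysqlhost/mydb \
  --username myuser -P \
  --table my_mysql_table \
  --export-dir /user/me/hbase_dump \
  --input-fields-terminated-by '\t'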


On Wed, Aug 28, 2013 at 7:12 PM, Shahab Yunus shahab.yu...@gmail.comwrote:

 Taking what Ravi Kiran mentioned a level higher, you can also use Pig. It
 has DBStorage. Very easy to read from HBase and dump to MySQL if your data
 porting does not require complex transformation (even which can be handled
 in Pig too.)


 http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/DBStorage.html

 Regards,
 Shahab


 On Wed, Aug 28, 2013 at 1:26 AM, Ravi Kiran maghamraviki...@gmail.com
 wrote:

  If you would like to have greater control on what data / which columns
  from HBase should be going into MySQL tables , you can write a simple MR
  job and use the DBOutputFormat .   It is a simple one and works great for
  us.
 
  Regards
  Ravi
 
 
 
  On Wed, Aug 28, 2013 at 10:42 AM, ch huang justlo...@gmail.com wrote:
 
   i use hive ,maybe it's a way ,let me try  it
  
   On Wed, Aug 28, 2013 at 11:21 AM, James Taylor jtay...@salesforce.com
   wrote:
  
Or if you'd like to be able to use SQL directly on it, take a look at
Phoenix (https://github.com/forcedotcom/phoenix).
   
James
   
On Aug 27, 2013, at 8:14 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org wrote:
   
 Take a look at sqoop?
 Le 2013-08-27 23:08, ch huang justlo...@gmail.com a écrit :

 hi,all:
 any good idea? thanks

   
  
 



Re: RowLocks

2013-08-29 Thread Michael Segel
Thanks for the update. 

Actually they worked ok for what they were.  IMHO they should never have been 
made public because they aren't the RLL that people think of as part of 
transactions and isolation levels found in RDBMSs.

Had me worried there for a sec... 

Thx
On Aug 28, 2013, at 11:22 PM, lars hofhansl la...@apache.org wrote:

 Specifically the API has been removed because it had never actually worked 
 correctly.
 
 
 Rowlocks are used by RegionServers for intra-region operations.
 As such they are ephemeral, in-memory constructs, that cannot reliably 
 outlive a single RPC request.
 The HTable rowlock API allowed you to create a rowlock and hold it over 
 multiple RPCs, which would break if, e.g., a region is moved or split.
 
 -- Lars
 
 From: Ted Yu yuzhih...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Wednesday, August 28, 2013 8:01 PM
 Subject: Re: RowLocks
 
 
 The API is no longer a public API
 
 Thanks
 
 
 On Wed, Aug 28, 2013 at 7:58 PM, Michael Segel 
 michael_se...@hotmail.comwrote:
 
 Ted,
 Can you clarify...
 Do you mean the API is no longer a public API, or do you mean no more RLL
 for atomic writes?
 
 On Aug 28, 2013, at 5:18 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 RowLock API has been removed in 0.96.
 
 Can you tell us your use case ?
 
 
 On Wed, Aug 28, 2013 at 3:14 PM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 Hi
 
 About the internals of locking a row in hbase.
 
 Does hbase row locks map one-to-one with a locks in zookeeper or are
 there
 any optimizations based on the fact that a row only exist on a single
 machine?
 
 Cheers,
 -Kristoffer
 
 
 The opinions expressed here are mine, while they may reflect a cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Federico Gaule
The 0.94.11 release includes an optimization for MultiGets: 
https://issues.apache.org/jira/browse/HBASE-9087


What version have you deployed?


On 08/29/2013 01:29 AM, lars hofhansl wrote:

A 1s SLA is tough in HBase (or any large memory JVM application).


Maybe, if you presplit your table, play with JDK7 and the G1 collector, but 
nobody here will vouch for such an SLA in the 99th percentile.
I heard some folks have experimented with 30GB heaps and G1 and have reported 
max GC times of 200ms, but I have not verified that.

-- Lars
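
For reference, the JDK7/G1 experiment mentioned above would be configured along these
lines in hbase-env.sh (the pause target and 30GB heap are only the figures quoted
above, not a recommendation):

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms30g -Xmx30g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"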



- Original Message -
From: Saurabh Yahoo saurabh...@yahoo.com
To: user@hbase.apache.org user@hbase.apache.org
Cc: user@hbase.apache.org user@hbase.apache.org
Sent: Wednesday, August 28, 2013 3:17 PM
Subject: Re: experiencing high latency for few reads in HBase

Hi Vlad,

Thanks for your response.

1. Our SLA is less than one sec. we cannot afford latency more than 1 sec.

We can increase heap size if that help, we have enough memory on server. What 
would be the optimal heap size?

2. Cache hit ratio is 95%.  One thing I don't understand is that we have allocated 
only 4gb for block cache out of 12gb. That leaves 8gb for the rest of the JVM. There are 
no writes. Memcache is empty. Is 8gb not enough for hbase to process the 
requests? What are the most memory-consuming objects in a region server?

3. We will change the cf to IN_memory and report back performance difference.

Thanks,
Saurabh.

On Aug 28, 2013, at 3:15 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:


1. 4 sec max latency is not that bad taking into account 12GB heap.  It can be 
much larger. What is your SLA?
2. Block evictions is the result of a poor cache hit rate and the root cause of 
a periodic stop-the-world GC pauses (max latencies
 latencies you have been observing in the test)
3. Block cache consists of 3 parts (25% young generation, 50% - tenured, 25% - 
permanent). Permanent part is for CF with
IN_MEMORY = true (you can specify this when you create CF).  Block first stored 
in 'young gen' space, then gets promoted to 'tenured gen' space
(or gets evicted). May be your 'perm gen' space is underutilized? This is exact 
25% of 4GB (1GB). Although HBase LruBlockCache should use all the space 
allocated for block cache -
there is no guarantee (as usual). If you don have in_memory column families you 
may decrease



Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Saurabh Yahoo [saurabh...@yahoo.com]
Sent: Wednesday, August 28, 2013 5:10 AM
To: user@hbase.apache.org
Subject: experiencing high latency for few reads in HBase

Hi,

We are running a stress test in our 5 node cluster and we are getting the 
expected mean latency of 10ms. But we are seeing around 20 reads out of 25 
million reads having latency more than 4 seconds. Can anyone provide the 
insight what we can do to meet below second SLA for each and every read?

We observe the following things -

1. Reads are evenly distributed among 5 nodes.  CPUs remain under 5% utilized.

2. We have 4gb block cache (30% block cache out of 12gb) setup. 3gb block cache 
got filled up but around 1gb remained free. There are a large number of cache 
eviction.

Questions to experts -

1. If there are still 1gb of free block cache available, why is hbase evicting 
the block from cache?

4. We are seeing memory went up to 10gb three times before dropping sharply to 
5gb.

Any help is highly appreciable,

Thanks,
Saurabh.

Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or notificati...@carrieriq.com and delete or destroy any 
copy of this message and its attachments.




HBase client with security

2013-08-29 Thread Lanati, Matteo
Hi all,

I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with 
security.
HBase works if I launch the shell from the node running the master, but I'd 
like to use it from an external machine.
I prepared one, copying the Hadoop and HBase installation folders and adapting 
the path (indeed I can use the same client to run MR jobs and interact with 
HDFS).
Regarding HBase client configuration:

- hbase-site.xml specifies

 <property>
   <name>hbase.security.authentication</name>
   <value>kerberos</value>
 </property>
 <property>
   <name>hbase.rpc.engine</name>
   <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
 </property>
 <property>
   <name>hbase.zookeeper.quorum</name>
   <value>master.hadoop.local,host49.hadoop.local</value>
 </property>

where the ZooKeeper hosts are reachable and can be resolved via DNS. I had to 
specify them, otherwise the shell complains about 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
= ConnectionLoss for /hbase/hbaseid

- I have a keytab for the principal I want to use (user running hbase/my 
client hostname@MYREALM), correctly addressed by the file 
hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to 
zk-jaas.conf.
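
For comparison, a client-side zk-jaas.conf for a secure 0.94 setup usually looks
roughly like the following (the keytab path and principal are placeholders), with
HBASE_OPTS pointing the JVM at it:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="/path/to/client.keytab"
  principal="myuser/client.host.name@MYREALM";
};

export HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/path/to/zk-jaas.conf"

Note that this JAAS section only covers the ZooKeeper connection; the secure HBase RPC
itself authenticates with the Kerberos credentials of the user running the shell, so
that user typically also needs a valid ticket (kinit) for MYREALM.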

Nonetheless, when I issue a command from a HBase shell on the client machine, I 
got an error in the HBase master log

2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
listener on 6: readAndProcess threw exception 
org.apache.hadoop.security.AccessControlException: Authentication is required. 
Count of bytes read: 0
org.apache.hadoop.security.AccessControlException: Authentication is required
at 
org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

It looks like there's a mismatch between the client and the master regarding 
the authentication mechanism. Note that from the same client machine I can 
launch and use a Zookeeper shell.
What am I missing in the client configuration? Does /etc/krb5.conf play any 
role into this?
Thanks,

Matteo


Matteo Lanati
Distributed Resources Group
Leibniz-Rechenzentrum (LRZ)
Boltzmannstrasse 1
85748   Garching b. München (Germany)
Phone: +49 89 35831 8724




Re: Hbase RowKey design schema

2013-08-29 Thread Shahab Yunus
What advantage you will be gaining by compressing? Less space? But then it
will add compression/decompression performance overhead. A trade-off but a
especially significant as space is cheap and redundancy is OK with such
data stores.

Having said that, more importantly, what are your read use-cases or access
patterns? That should drive your decision about row key design.

Regards,
Shahab


On Thu, Aug 29, 2013 at 5:21 AM, Wasim Karani wa...@userworkstech.comwrote:

 I am using HBase to store webtable content like how google is using
 bigtable.
 For reference of google bigtable
 My question is on RowKey, how we should be forming it.
 What google is doing is saving the URL in a reverse order as you can see in
 the PDF document com.cnn.www so that all the links associated with
 cnn.com
 will be manages in same block of GFS which will be lot easier to scan.
 I can use the same thing as google is using but wont it will be cool if I
 use
 some algorithm to compress the url

 For eg.

 RewKey   |  Google Bigtable
 |  Algorithm output
 www.cnn.com/index.php|  com.cnn.www/index.php
 |  12as/435
 www.cnn.com/news/business/index.html |
  com.cnn.www/news/business/index.html
 |  12as/2as/dcx/asd
 www.cnn.com/news/sports/index.html   |  com.cnn.www/news/sports/index.html
 |  12as/2as/eds/scf
 Reason behind doing this is rowkey will be shorter as per the Hbase design
 schema (Mentioned in topic 6.3.2.3. Rowkey Length).

 So what do I need from you guys is to know am I correct over here
 Also if I am correct what Algorithm I should using. I am using python over
 thrift as a programming language so code will be overwhelming for me...




Re: Writing map outputs to HBase

2013-08-29 Thread Ted Yu
See http://hbase.apache.org/book.html#mapreduce.example.readwrite


On Thu, Aug 29, 2013 at 7:26 AM, praveenesh kumar praveen...@gmail.comwrote:

 Hi,

 What is the easiest and efficient way to write a sequence file into HBase.
 I want to parse the sequence file. My sequence file has records in the form
 of null,bytes .

 I want to parse each value, generate keys and values in map() function and
 write the output into HBase.

 I am trying to use HBaseTableUtil class. But I am seeing
 TableMapReduceUtil.initTableReducerJob() method that is doing something
 that I need. But I guess, it requires some reducer to exist. Am I right
 here ?

 Other way is to create HBase Configuration object in mapper.setup ()
 method, do insertions in map () function and close the connections in
 mapper.close () method.

 I was wondering what is the industry standard or most efficient way to do
 map-side insertion on Hbase.

 Any suggestions would be really helpful.

 Thanks
 Praveenesh
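
Following the pattern in the book example linked above, a minimal map-only sketch
(the table name, column family and the row-key derivation are placeholders, not a
tested job):

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class SeqFileToHBase {

  static class LoadMapper
      extends Mapper<NullWritable, BytesWritable, ImmutableBytesWritable, Put> {
    @Override
    protected void map(NullWritable key, BytesWritable value, Context context)
        throws IOException, InterruptedException {
      byte[] raw = Arrays.copyOf(value.getBytes(), value.getLength());
      byte[] row = Bytes.toBytes(Bytes.hashCode(raw));   // placeholder row key derivation
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("raw"), raw);
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "seqfile-to-hbase");
    job.setJarByClass(SeqFileToHBase.class);
    job.setMapperClass(LoadMapper.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.setInputPaths(job, new Path("test"));
    // Map-only: TableOutputFormat handles the writes, so no reducer is needed.
    TableMapReduceUtil.initTableReducerJob("my_table", null, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}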



Never ending Doing distributed log split task.,

2013-08-29 Thread Jean-Marc Spaggiari
I have restarted my cluster and I'm now waiting for this task to end:

Doing distributed log split in
[hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting]

It has been running for 30 minutes now. There was nothing running on the cluster.
No reads, no writes, nothing, for days...

I got that on the logs:

2013-08-29 11:36:10,862 WARN
org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting/node1%2C60020%2C1377789460683.1377789462024
interrupted, resigning
java.io.InterruptedIOException
at
org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
at
org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at
org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
... 9 more
2013-08-29 11:36:10,950 WARN
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Interrupted while
trying to assert ownership of
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1377789460683-splitting%2Fnode1%252C60020%252C1377789460683.1377789462024
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
at
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:361)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.attemptToOwnTask(SplitLogWorker.java:346)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
at java.lang.Thread.run(Thread.java:722)


I'm not 100% sure what is causing that. I have restarted it and I'm still getting
the same result.

Any hint?

Thanks,

JM


Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Kiru Pakkirisamy
Saurabh,
I have a suspicion that the few high-latency responses are happening because of 
hot region(s).
I vaguely remember you mentioning that the data is evenly distributed across 
all regions.
I hope your test also goes across them evenly. You may want to check the read 
requests to the regions.
 
Regards,
- kiru



 From: Saurabh Yahoo saurabh...@yahoo.com
To: user@hbase.apache.org user@hbase.apache.org 
Cc: user@hbase.apache.org user@hbase.apache.org 
Sent: Thursday, August 29, 2013 2:49 AM
Subject: Re: experiencing high latency for few reads in HBase 
 

Hi Vlad,

We do have strict latency requirement as it is financial data requiring direct 
access from clients. 

Are you saying that it is not possible to achieve sub second latency using 
hbase (because it is based on java.) ?







On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your heap size 
 is much less than 1GB). 
 I have never heard about strict maximum latency limit. Usually , its 99% , 
 99.9 or 99.99% query percentiles.
 
 You can greatly reduce your 99.xxx% percentile latency by storing you data in 
 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and get the 
 first response. Probability theory states that  probability 
 of two independent events (slow requests) is  the product of event's 
 probabilities themselves. 
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Kiru,
 
 Scan is not an option for our use cases.  Our read is pretty random.
 
 Any other suggestion to bring down the latency.
 
 Thanks,
 Saurabh.
 
 
 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was a 
 40million row table as 400K rows and columns. We Get about 100 of the rows 
 from this 400K , do quite a bit of calculations in the coprocessor (almost a 
 group-order by) and return in this time.
 Maybe should consider replacing the MultiGets with Scan with Filter. I like 
 the FuzzyRowFilter even though you might need to match with exact key. It 
 works only with fixed length key.
 (I do have an issue right now, it is not scaling to multiple clients.)
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 3:20 PM
 Subject: Re: experiencing high latency for few reads in HBase
 
 
 Thanks Kitu. We need less than 1 sec latency.
 
 We are using both muliGet and get.
 
 We have three concurrent clients running 10 threads each. ( that makes total 
 30 concurrent clients).
 
 Thanks,
 Saurabh.
 
 On Aug 28, 2013, at 4:30 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Right 4 sec is good.
 @Saurabh - so your read is - getting 20 out of 25 millions rows ?. Is this 
 a Get or a Scan ?
 BTW, in this stress test how many concurrent clients do you have ?
 
 Regards,
 - kiru
 
 
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 12:15 PM
 Subject: RE: experiencing high latency for few reads in HBase
 
 
 1. 4 sec max latency is not that bad taking into account 12GB heap.  It can 
 be much larger. What is your SLA?
 2. Block evictions is the result of a poor cache hit rate and the root 
 cause of a periodic stop-the-world GC pauses (max latencies
     latencies you have been observing in the test)
 3. Block cache consists of 3 parts (25% young generation, 50% - tenured, 
 25% - permanent). Permanent part is for CF with
 IN_MEMORY = true (you can specify this when you create CF).  Block first 
 stored in 'young gen' space, then gets promoted to 'tenured gen' space
 (or gets evicted). May be your 'perm gen' space is underutilized? This is 
 exact 25% of 4GB (1GB). Although HBase LruBlockCache should use all the 
 space allocated for block cache -
 there is no guarantee (as usual). If you don have in_memory column families 
 you may decrease
 
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 5:10 AM
 To: user@hbase.apache.org
 Subject: experiencing high latency for few reads in 

Re: Hbase thrift client's privilege control

2013-08-29 Thread Kangle Yu
Sorry, I clicked the send button early.


Since the Thrift gateway will authenticate with HBase using the supplied
credential, no authentication will be performed by the Thrift gateway
itself. All client access via the Thrift gateway will use the Thrift
gateway's credential and have its privilege
(http://hbase.apache.org/book/security.html).

So is there any method or patch to control the privileges of different
clients of the HBase Thrift gateway?

thanks!

atupal


2013/8/29 Kangle Yu kangl...@hustunique.com

 Hi all,




Re: counter Increment gives DonotRetryException

2013-08-29 Thread Ted Yu
The exception came from HRegion#increment():

if (kv.getValueLength() == Bytes.SIZEOF_LONG) {
  amount += Bytes.toLong(kv.getBuffer(),
      kv.getValueOffset(), Bytes.SIZEOF_LONG);
} else {
  // throw DoNotRetryIOException instead of IllegalArgumentException
  throw new org.apache.hadoop.hbase.DoNotRetryIOException(
      "Attempted to increment field that isn't 64 bits wide");
}

Can you check the values in 'columnar:column1' ?


On Thu, Aug 29, 2013 at 4:42 AM, yeshwanth kumar yeshwant...@gmail.comwrote:

 i am newbie to Hbase,
 going through Counters topic,
 whenever i perform increment like

 incr 't1','9row27','columnar:column1',1

 it gives an

 ERROR: org.apache.hadoop.hbase.DoNotRetryIOException:
 org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field
 that isn't 64 bits wide

 looking for some help



RE: experiencing high latency for few reads in HBase

2013-08-29 Thread Vladimir Rodionov
Yes. HBase won't guarantee strict sub-second latency. 

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
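
For illustration, a rough sketch of the duplicate-read idea quoted below: keep the
same data in two tables placed on different region servers, issue the same Get to
both, and take whichever answer arrives first. The table names and the two-thread
pool are assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class DualReadSketch {
  public static Result firstAnswer(final byte[] row) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    final HTable replicaA = new HTable(conf, "t_replica_a");   // placeholder table names
    final HTable replicaB = new HTable(conf, "t_replica_b");
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Callable<Result>> reads = new ArrayList<Callable<Result>>();
      reads.add(new Callable<Result>() {
        public Result call() throws Exception { return replicaA.get(new Get(row)); }
      });
      reads.add(new Callable<Result>() {
        public Result call() throws Exception { return replicaB.get(new Get(row)); }
      });
      // invokeAny returns the first read that completes successfully and cancels the other.
      return pool.invokeAny(reads);
    } finally {
      pool.shutdownNow();
      replicaA.close();
      replicaB.close();
    }
  }
}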


From: Saurabh Yahoo [saurabh...@yahoo.com]
Sent: Thursday, August 29, 2013 2:49 AM
To: user@hbase.apache.org
Cc: user@hbase.apache.org
Subject: Re: experiencing high latency for few reads in HBase

Hi Vlad,

We do have strict latency requirement as it is financial data requiring direct 
access from clients.

Are you saying that it is not possible to achieve sub second latency using 
hbase (because it is based on java.) ?







On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your heap size 
 is much less than 1GB).
 I have never heard about strict maximum latency limit. Usually , its 99% , 
 99.9 or 99.99% query percentiles.

 You can greatly reduce your 99.xxx% percentile latency by storing you data in 
 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and get the 
 first response. Probability theory states that  probability
 of two independent events (slow requests) is  the product of event's 
 probabilities themselves.


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase

 Thanks Kiru,

 Scan is not an option for our use cases.  Our read is pretty random.

 Any other suggestion to bring down the latency.

 Thanks,
 Saurabh.


 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:

 Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was a 
 40million row table as 400K rows and columns. We Get about 100 of the rows 
 from this 400K , do quite a bit of calculations in the coprocessor (almost a 
 group-order by) and return in this time.
 Maybe should consider replacing the MultiGets with Scan with Filter. I like 
 the FuzzyRowFilter even though you might need to match with exact key. It 
 works only with fixed length key.
 (I do have an issue right now, it is not scaling to multiple clients.)

 Regards,
 - kiru


 Kiru Pakkirisamy | webcloudtech.wordpress.com


 
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 3:20 PM
 Subject: Re: experiencing high latency for few reads in HBase


 Thanks Kitu. We need less than 1 sec latency.

 We are using both muliGet and get.

 We have three concurrent clients running 10 threads each. ( that makes total 
 30 concurrent clients).

 Thanks,
 Saurabh.

 On Aug 28, 2013, at 4:30 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:

 Right 4 sec is good.
 @Saurabh - so your read is - getting 20 out of 25 millions rows ?. Is this 
 a Get or a Scan ?
 BTW, in this stress test how many concurrent clients do you have ?

 Regards,
 - kiru


 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 12:15 PM
 Subject: RE: experiencing high latency for few reads in HBase


 1. 4 sec max latency is not that bad taking into account 12GB heap.  It can 
 be much larger. What is your SLA?
 2. Block evictions is the result of a poor cache hit rate and the root 
 cause of a periodic stop-the-world GC pauses (max latencies
 latencies you have been observing in the test)
 3. Block cache consists of 3 parts (25% young generation, 50% - tenured, 
 25% - permanent). Permanent part is for CF with
 IN_MEMORY = true (you can specify this when you create CF).  Block first 
 stored in 'young gen' space, then gets promoted to 'tenured gen' space
 (or gets evicted). May be your 'perm gen' space is underutilized? This is 
 exact 25% of 4GB (1GB). Although HBase LruBlockCache should use all the 
 space allocated for block cache -
 there is no guarantee (as usual). If you don have in_memory column families 
 you may decrease



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 5:10 AM
 To: user@hbase.apache.org
 Subject: experiencing high latency for few reads in HBase

 Hi,

 We are running a stress test in our 5 node cluster and we are getting the 
 expected mean latency of 10ms. But we are seeing around 20 reads out of 25 
 million reads having latency more than 4 seconds. 

Re: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable

2013-08-29 Thread Shahab Yunus
You are also using the @Override annotation to make sure that your
overridden method is being called?

Regards,
Shahab


On Thu, Aug 29, 2013 at 12:03 PM, praveenesh kumar praveen...@gmail.comwrote:

 Thanks Shahab for replying. Sorry, that was  typo, while writing the code
 snippet. Even keeping the keys as NullWritable or LongWritable i.e. by
 keeping the same types of keys, I am getting the same error.
 I don't think the error is at Map Input side. Its saying value from map.
 Can't understand where I am going wrong.

 Regards
 Praveenesh


 On Thu, Aug 29, 2013 at 4:58 PM, Shahab Yunus shahab.yu...@gmail.com
 wrote:

  
  public class MYHBaseLoader extends
  Mapper*NullWritable*,BytesWritable,NullWritable,Put {
 
   protected void map (*LongWritable* key, BytesWritable value, Context
  context) throws IOException, InterruptedException {
  ...
 
  Why is the difference in types of the keys?
 
  Regards,
  Shahab
 
 
  On Thu, Aug 29, 2013 at 5:46 AM, praveenesh kumar praveen...@gmail.com
  wrote:
 
   Hi all,
  
   I am trying to write a MR code to load a HBase table.
  
   I have a mapper that emits (null,put object) and I am using
   TableMapReduceUtil.initTableReducerJob() to write it into a HBase
 table.
  
   Following is my code snippet
  
   public class MYHBaseLoader extends
   MapperNullWritable,BytesWritable,NullWritable,Put {
  
protected void map (LongWritable key, BytesWritable value, Context
   context) throws IOException, InterruptedException {
  
 /- Some processing here.. Create put object and pushing
 it
   into Put object).
   context.write(null, put);// Pushing the put object.
  
   }
  
   public static void main (String args[]) throws IOException,
   ClassNotFoundException, InterruptedException{
   Configuration conf = new Configuration();
   Job job = new Job(conf);
   job.setJarByClass(MYHBaseLoader.class);
   job.setMapperClass(MYHBaseLoader.class);
  
  
  
 
 TableMapReduceUtil.initTableReducerJob(MY_IMPORT_TABLE_NAME,IdentityTableReducer.class,job);
   job.setMapOutputKeyClass(NullWritable.class);
   job.setMapOutputValueClass(Put.class);
   job.setInputFormatClass(SequenceFileInputFormat.class);
   //job.setNumReduceTasks(0);
  
   FileInputFormat.setInputPaths(job, new Path(test));
   Path outputPath = new Path(test_output);
   FileOutputFormat.setOutputPath(job,outputPath);
  
   //outputPath.getFileSystem(conf).delete(outputPath, true);
  
   job.waitForCompletion(true);
   System.out.println(Done);
   }
  
  
   I am getting the following error while running. Any help/guidance:
  
  
   java.io.IOException: Type mismatch in value from map: expected
   org.apache.hadoop.hbase.client.Put, recieved
   org.apache.hadoop.io.BytesWritable
   at
  
 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1023)
   at
  
 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:689)
   at
  
  
 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at
  
  
 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
  
  
   Regards
   Praveenesh
  
 



Re: counter Increment gives DonotRetryException

2013-08-29 Thread Jean-Daniel Cryans
You probably put a string in there that was a number, and increment expects
an 8-byte long. For example, if you did:

put 't1', '9row27', 'columnar:column1', '1'

Then did an increment on that, it would fail.

J-D
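
A quick way to see the difference in the shell, using a fresh (hypothetical) column so
the existing string value does not get in the way:

  incr 't1', '9row27', 'columnar:counter', 1
  get_counter 't1', '9row27', 'columnar:counter'

If the original cell already holds something that is not an 8-byte long (such as the
string '1'), the simplest options are to delete that cell first or to keep the counter
in a different column, as above.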


On Thu, Aug 29, 2013 at 4:42 AM, yeshwanth kumar yeshwant...@gmail.comwrote:

 i am newbie to Hbase,
 going through Counters topic,
 whenever i perform increment like

 incr 't1','9row27','columnar:column1',1

 it gives an

 ERROR: org.apache.hadoop.hbase.DoNotRetryIOException:
 org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field
 that isn't 64 bits wide

 looking for some help



Re: Never ending Doing distributed log split task.,

2013-08-29 Thread Ted Yu
So you have HBASE-8670 in your deployment.

Suggest upgrading hadoop to newer release, e.g. 1.2.1 so that the new HDFS
improvements can be utilized.

Cheers


On Thu, Aug 29, 2013 at 9:50 AM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hadoop 1.0.4 with HBase 0.94.12-SNAPSHOT

 The file name changed since I have restarted HBase but here is what I have:
 hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls
 hdfs://node3:9000/hbase/.logs/node1,60020,1377793020654/
 Found 1 items
 -rw-r--r--   3 hbase supergroup  0 2013-08-29 12:17

 /hbase/.logs/node1,60020,1377793020654/node1%2C60020%2C1377793020654.1377793021892

 And I'm able to access it:
 hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -get

 /hbase/.logs/node1,60020,1377793020654/node1%2C60020%2C1377793020654.1377793021892
 .
 hadoop@node3:~/hadoop-1.0.3$

 Oh. I just checked the UI again, and it's done. Wow! Took almost 1h. HBCK
 report 0 inconsistencies detected. Status: OK

 So seems that I'm all fine.

 I don't know why it was so long. I will try to take a look at my Ganglia's
 metrics to see if I can figure anything...

 JM



 2013/8/29 Ted Yu yuzhih...@gmail.com

  What is your HBase / Hadoop version ?
 
  Can you check namenode log looking for lines related to
  hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-
  splitting/node1%2C60020%2C1377789460683.1377789462024 ?
 
  Thanks
 
 
  On Thu, Aug 29, 2013 at 9:03 AM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
   I have restart my cluster and I'm now waiting for this task to end:
  
   Doing distributed log split in
   [hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting]
  
   It's running fir now 30 minutes. There was nothing running on the
  cluster.
   No reads, no writes, nothing, for days...
  
   I got that on the logs:
  
   2013-08-29 11:36:10,862 WARN
   org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
  
  
 
 hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting/node1%2C60020%2C1377789460683.1377789462024
   interrupted, resigning
   java.io.InterruptedIOException
   at
  
  
 
 org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
   at
  
  
 
 org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
   at java.lang.Thread.run(Thread.java:722)
   Caused by: java.lang.InterruptedException: sleep interrupted
   at java.lang.Thread.sleep(Native Method)
   at
  
  
 
 org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
   ... 9 more
   2013-08-29 11:36:10,950 WARN
   org.apache.hadoop.hbase.regionserver.SplitLogWorker: Interrupted while
   trying to assert ownership of
  
  
 
 /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1377789460683-splitting%2Fnode1%252C60020%252C1377789460683.1377789462024
   java.lang.InterruptedException
   at java.lang.Object.wait(Native Method)
   at java.lang.Object.wait(Object.java:503)
   at
  org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
   at
  
  
 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:361)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.attemptToOwnTask(SplitLogWorker.java:346)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
   at
  
  
 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
   at java.lang.Thread.run(Thread.java:722)
  
  
   I'm not 100% what is causing that. I have restarted it and still
 getting
   the same result.
  
   Any hint?
  
   Thanks,
  
   JM
  
 



Re: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable

2013-08-29 Thread Shahab Yunus
Exactly, I had the same thought as Ashwanth; that is why I asked whether the
@Override annotation is being used or not.

Regards,
Shahab


On Thu, Aug 29, 2013 at 1:09 PM, Ashwanth Kumar 
ashwanthku...@googlemail.com wrote:

 Hey Praveenesh, I am not sure if this would help.

 But can you try moving your mapper to an inner class / separate class and
 try the code? I somehow get a feeling that default Mapper (IdentityMapper)
 is being used (may be you can check the mapreduce.map.class value?), that
 would be the only reason why your value (BytesWritable) gets emitted out in
 context.write().



 On Thu, Aug 29, 2013 at 3:16 PM, praveenesh kumar praveen...@gmail.com
 wrote:

  Hi all,
 
  I am trying to write a MR code to load a HBase table.
 
  I have a mapper that emits (null,put object) and I am using
  TableMapReduceUtil.initTableReducerJob() to write it into a HBase table.
 
  Following is my code snippet
 
  public class MYHBaseLoader extends
  MapperNullWritable,BytesWritable,NullWritable,Put {
 
   protected void map (LongWritable key, BytesWritable value, Context
  context) throws IOException, InterruptedException {
 
/- Some processing here.. Create put object and pushing it
  into Put object).
  context.write(null, put);// Pushing the put object.
 
  }
 
  public static void main (String args[]) throws IOException,
  ClassNotFoundException, InterruptedException{
  Configuration conf = new Configuration();
  Job job = new Job(conf);
  job.setJarByClass(MYHBaseLoader.class);
  job.setMapperClass(MYHBaseLoader.class);
 
 
 
 TableMapReduceUtil.initTableReducerJob(MY_IMPORT_TABLE_NAME,IdentityTableReducer.class,job);
  job.setMapOutputKeyClass(NullWritable.class);
  job.setMapOutputValueClass(Put.class);
  job.setInputFormatClass(SequenceFileInputFormat.class);
  //job.setNumReduceTasks(0);
 
  FileInputFormat.setInputPaths(job, new Path(test));
  Path outputPath = new Path(test_output);
  FileOutputFormat.setOutputPath(job,outputPath);
 
  //outputPath.getFileSystem(conf).delete(outputPath, true);
 
  job.waitForCompletion(true);
  System.out.println(Done);
  }
 
 
  I am getting the following error while running. Any help/guidance:
 
 
  java.io.IOException: Type mismatch in value from map: expected
  org.apache.hadoop.hbase.client.Put, recieved
  org.apache.hadoop.io.BytesWritable
  at
 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1023)
  at
 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:689)
  at
 
 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
  at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at
 
 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
 
 
  Regards
  Praveenesh
 



 --

 Ashwanth Kumar / ashwanthkumar.in
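
For the archive: the diagnosis above is that the declared key type (NullWritable) and
the map() parameter type (LongWritable) do not match, so the method never overrides
Mapper.map() and the inherited identity implementation writes the BytesWritable value
straight through, which is exactly the reported type mismatch. A hedged sketch of a
consistent signature (row key and column names are placeholders; imports as in the
original post plus NullWritable, Bytes and Arrays):

public class MYHBaseLoader extends
    Mapper<NullWritable, BytesWritable, NullWritable, Put> {

  @Override   // with @Override, a mismatched signature becomes a compile error
  protected void map(NullWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    Put put = new Put(Bytes.toBytes("some-row-key"));   // placeholder row key
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
        Arrays.copyOf(value.getBytes(), value.getLength()));
    context.write(NullWritable.get(), put);
  }
}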



Error running hbase

2013-08-29 Thread jamal sasha
Hi,
  I am trying to write directly to HBase from MapReduce code.
But I am getting this issue similar to what is reported here:
http://stackoverflow.com/questions/12607349/cant-connect-to-zookeeper-and-then-hbase-master-shuts-down

How do I solve this?
I think I am running an HBase instance already set up on my cluster,
since the hbase shell works just fine.

Not sure what I am missing.
Any suggestions?
Thanks
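
One thing worth double-checking, sketched below: if the MapReduce driver builds a
plain Configuration, or the job does not have hbase-site.xml on its classpath, the
client defaults to looking for ZooKeeper on localhost and fails in exactly this way,
even though the hbase shell (which does read hbase-site.xml) works fine. The hostname
below is a placeholder:

Configuration conf = HBaseConfiguration.create();       // picks up hbase-site.xml if it is on the classpath
conf.set("hbase.zookeeper.quorum", "zk1.example.com");   // or set the quorum explicitly
conf.set("hbase.zookeeper.property.clientPort", "2181");
Job job = new Job(conf, "write-to-hbase");
TableMapReduceUtil.addDependencyJars(job);               // ships the HBase jars with the job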


Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Saurabh Yahoo
Thanks Vlad.  

Quick question. I notice hdfsBlocksLocalityIndex is around 50 in all region 
servers.  

Could that be a problem? If it is, how do we solve it? We already ran the 
major compaction after ingesting the data.  

Thanks,
Saurabh. 

On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

 Yes. HBase won't guarantee strict sub-second latency. 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 2:49 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Hi Vlad,
 
 We do have strict latency requirement as it is financial data requiring 
 direct access from clients.
 
 Are you saying that it is not possible to achieve sub second latency using 
 hbase (because it is based on java.) ?
 
 
 
 
 
 
 
 On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com 
 wrote:
 
 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your heap size 
 is much less than 1GB).
 I have never heard about strict maximum latency limit. Usually , its 99% , 
 99.9 or 99.99% query percentiles.
 
 You can greatly reduce your 99.xxx% percentile latency by storing you data 
 in 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and get 
 the first response. Probability theory states that  probability
 of two independent events (slow requests) is  the product of event's 
 probabilities themselves.
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Kiru,
 
 Scan is not an option for our use cases.  Our read is pretty random.
 
 Any other suggestion to bring down the latency.
 
 Thanks,
 Saurabh.
 
 
 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was 
 a 40million row table as 400K rows and columns. We Get about 100 of the 
 rows from this 400K , do quite a bit of calculations in the coprocessor 
 (almost a group-order by) and return in this time.
 Maybe should consider replacing the MultiGets with Scan with Filter. I like 
 the FuzzyRowFilter even though you might need to match with exact key. It 
 works only with fixed length key.
 (I do have an issue right now, it is not scaling to multiple clients.)
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 3:20 PM
 Subject: Re: experiencing high latency for few reads in HBase
 
 
  Thanks Kiru. We need less than 1 sec latency.
 
 We are using both muliGet and get.
 
 We have three concurrent clients running 10 threads each. ( that makes 
 total 30 concurrent clients).
 
 Thanks,
 Saurabh.
 
 On Aug 28, 2013, at 4:30 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Right 4 sec is good.
 @Saurabh - so your read is - getting 20 out of 25 millions rows ?. Is this 
 a Get or a Scan ?
 BTW, in this stress test how many concurrent clients do you have ?
 
 Regards,
 - kiru
 
 
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 12:15 PM
 Subject: RE: experiencing high latency for few reads in HBase
 
 
 1. 4 sec max latency is not that bad taking into account 12GB heap.  It 
 can be much larger. What is your SLA?
 2. Block evictions are the result of a poor cache hit rate and the root
 cause of periodic stop-the-world GC pauses (the max latencies you have been
 observing in the test).
 3. Block cache consists of 3 parts (25% young generation, 50% - tenured, 
 25% - permanent). Permanent part is for CF with
 IN_MEMORY = true (you can specify this when you create CF).  Block first 
 stored in 'young gen' space, then gets promoted to 'tenured gen' space
 (or gets evicted). May be your 'perm gen' space is underutilized? This is 
 exact 25% of 4GB (1GB). Although HBase LruBlockCache should use all the 
 space allocated for block cache -
  there is no guarantee (as usual). If you don't have in_memory column 
 families you may decrease
 
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 

Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Saurabh Yahoo
Thanks Kiru. 

We have 10TB of data on disk. It would not fit in memory. Also, the first 
time, HBase needs to read from the disk. And it has to go through the network to 
read the blocks that are stored on other data nodes.

So in my opinion, locality matters.

Thanks,
Saurabh. 

On Aug 29, 2013, at 2:33 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote:

 But locality index should not matter right if you are in IN_MEMORY most and 
 you are running the test after  a few runs to make sure they are already in 
 IN_MEMORY  (ie blockCacheHit is high or blockCacheMiss is low)  (?) 
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Thursday, August 29, 2013 11:11 AM
 Subject: RE: experiencing high latency for few reads in HBase 
 
 
 Usually, either cluster restart or major compaction helps improving locality 
 index.
 There is an issue in region assignment after table disable/enable in 0.94.x 
 (x < 11) which 
 breaks HDFS locality. Fixed in 0.94.11 
 
 You can write your own routine to manually localize particular table using 
 public HBase Client API.
 
  But this won't help you to stay within 1 sec anyway. 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 10:52 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Vlad.
 
 Quick question. I notice hdfsBlocksLocalityIndex is around 50 in all region 
 servers.
 
 Does that could be a problem? If it is, how to solve that? We already ran the 
 major compaction after ingesting the data.
 
 Thanks,
 Saurabh.
 
 On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov vrodio...@carrieriq.com 
 wrote:
 
 Yes. HBase won't guarantee strict sub-second latency.
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 2:49 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Hi Vlad,
 
 We do have strict latency requirement as it is financial data requiring 
 direct access from clients.
 
 Are you saying that it is not possible to achieve sub second latency using 
 hbase (because it is based on java.) ?
 
 
 
 
 
 
 
 On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com 
 wrote:
 
 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your heap 
 size is much less than 1GB).
 I have never heard about strict maximum latency limit. Usually , its 99% , 
 99.9 or 99.99% query percentiles.
 
 You can greatly reduce your 99.xxx% percentile latency by storing you data 
 in 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and get 
 the first response. Probability theory states that  probability
 of two independent events (slow requests) is  the product of event's 
 probabilities themselves.
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Kiru,
 
 Scan is not an option for our use cases.  Our read is pretty random.
 
 Any other suggestion to bring down the latency.
 
 Thanks,
 Saurabh.
 
 
 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was 
 a 40million row table as 400K rows and columns. We Get about 100 of the 
 rows from this 400K , do quite a bit of calculations in the coprocessor 
 (almost a group-order by) and return in this time.
 Maybe should consider replacing the MultiGets with Scan with Filter. I 
 like the FuzzyRowFilter even though you might need to match with exact 
 key. It works only with fixed length key.
 (I do have an issue right now, it is not scaling to multiple clients.)
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 3:20 PM
 Subject: Re: experiencing high latency for few reads in HBase
 
 
  Thanks Kiru. We need less than 1 sec latency.
 
 We are using both muliGet and get.
 
 

Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Kiru Pakkirisamy
Yes, in that case, it matters. I was talking about a case where you are mostly 
serving from cache.
 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com



 From: Saurabh Yahoo saurabh...@yahoo.com
To: user@hbase.apache.org user@hbase.apache.org 
Cc: user@hbase.apache.org user@hbase.apache.org 
Sent: Thursday, August 29, 2013 12:09 PM
Subject: Re: experiencing high latency for few reads in HBase 
 

Thanks Kiru. 

We have 10TB of data on disk. It would not fit in memory. Also for the first 
time, hbase need to read from the disk. And it has to go through the network to 
read the blocks which are stored at other data node.  

So in my opinion, locality matters.

Thanks,
Saurabh. 

On Aug 29, 2013, at 2:33 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote:

 But locality index should not matter right if you are in IN_MEMORY most and 
 you are running the test after  a few runs to make sure they are already in 
 IN_MEMORY  (ie blockCacheHit is high or blockCacheMiss is low)  (?) 
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Thursday, August 29, 2013 11:11 AM
 Subject: RE: experiencing high latency for few reads in HBase 
 
 
 Usually, either cluster restart or major compaction helps improving locality 
 index.
 There is an issue in region assignment after table disable/enable in 0.94.x 
  (x < 11) which 
 breaks HDFS locality. Fixed in 0.94.11 
 
 You can write your own routine to manually localize particular table using 
 public HBase Client API.
 
  But this won't help you to stay within 1 sec anyway. 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 10:52 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Vlad.
 
 Quick question. I notice hdfsBlocksLocalityIndex is around 50 in all region 
 servers.
 
 Does that could be a problem? If it is, how to solve that? We already ran the 
 major compaction after ingesting the data.
 
 Thanks,
 Saurabh.
 
 On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov vrodio...@carrieriq.com 
 wrote:
 
 Yes. HBase won't guarantee strict sub-second latency.
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 2:49 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Hi Vlad,
 
 We do have strict latency requirement as it is financial data requiring 
 direct access from clients.
 
 Are you saying that it is not possible to achieve sub second latency using 
 hbase (because it is based on java.) ?
 
 
 
 
 
 
 
 On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com 
 wrote:
 
 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your heap 
 size is much less than 1GB).
 I have never heard about strict maximum latency limit. Usually , its 99% , 
 99.9 or 99.99% query percentiles.
 
 You can greatly reduce your 99.xxx% percentile latency by storing you data 
 in 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and get 
 the first response. Probability theory states that  probability
 of two independent events (slow requests) is  the product of event's 
 probabilities themselves.
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Kiru,
 
 Scan is not an option for our use cases.  Our read is pretty random.
 
 Any other suggestion to bring down the latency.
 
 Thanks,
 Saurabh.
 
 
 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com 
 wrote:
 
 Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was 
 a 40million row table as 400K rows and columns. We Get about 100 of the 
 rows from this 400K , do quite a bit of calculations in the coprocessor 
 (almost a group-order by) and return in this time.
 Maybe should consider replacing the MultiGets with Scan with Filter. I 
 like the FuzzyRowFilter even though you might need to match with exact 
 key. It works only with fixed length key.
 (I do have an issue right now, it is not scaling to multiple 

Default balancer status

2013-08-29 Thread Jean-Marc Spaggiari
Hi,

Is there a way to have the balancer off by default? We can turn it off
using balancer_switch but when we restart the cluster, it's back to on. Any
way to turn it off by default?

Thanks,

JM


Re: Error running hbase

2013-08-29 Thread Ted Yu
There was an answer at the end of the Stack Overflow URL you posted.

If your problem isn't solved, please let us know some more details of your
deployment: HBase version, config parameters, etc.

Thanks


On Thu, Aug 29, 2013 at 10:49 AM, jamal sasha jamalsha...@gmail.com wrote:

 Hi,
    I am trying to write directly to HBase from a MapReduce job,
  but I am getting an issue similar to what is reported here:

 http://stackoverflow.com/questions/12607349/cant-connect-to-zookeeper-and-then-hbase-master-shuts-down

  How do I solve this?
  I already have an HBase instance set up and running on my cluster,
  and the hbase shell works just fine.

  Not sure what I am missing. Any suggestions?
  Thanks



Re: Default balancer status

2013-08-29 Thread Bryan Beaudreault
This was fixed in 0.95.2.  https://issues.apache.org/jira/browse/HBASE-6260

In the meantime you can set the hbase.balancer.period to a very large
number.
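
(For example, something like the following in hbase-site.xml pushes the balancer chore
interval out far enough that it effectively never runs; the value is just an arbitrarily
large interval in milliseconds, roughly 24 days.)

  <property>
    <name>hbase.balancer.period</name>
    <value>2147483647</value>
  </property>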


On Thu, Aug 29, 2013 at 3:32 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi,

 Is there a way to have the balancer off by default? We can turn it off
 using balancer_switch but when we restart the cluster, it's back to on. Any
 way to turn it off by default?

 Thanks,

 JM



Re: Default balancer status

2013-08-29 Thread Jean-Marc Spaggiari
Thanks Bryan. That's what I was looking for. If I have time I will see if I
can backport that into 0.94. For now I will go with the period option...

JM


2013/8/29 Bryan Beaudreault bbeaudrea...@hubspot.com

 This was fixed in 0.95.2.
 https://issues.apache.org/jira/browse/HBASE-6260

 In the meantime you can set the hbase.balancer.period to a very large
 number.


 On Thu, Aug 29, 2013 at 3:32 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hi,
 
  Is there a way to have the balancer off by default? We can turn it off
  using balancer_switch but when we restart the cluster, it's back to on.
 Any
  way to turn it off by default?
 
  Thanks,
 
  JM
 



Re: Region server exception

2013-08-29 Thread Ted Yu
This exception means some other thread was holding the lock for an extended
period of time.

Can you tell us more about your coprocessor ?

Thanks


On Thu, Aug 29, 2013 at 12:55 PM, Kiru Pakkirisamy 
kirupakkiris...@yahoo.com wrote:



 This exception stack happens from within my coprocessor code on concurrent
 reads. Any ideas ?

 java.io.InterruptedIOException
 at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5894)
 at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5875)
 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRe
 gion.java:5803)
 at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(H
 Region.java:3852)
 at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(H
 Region.java:3896)




 Regards,
 - kiru


 Kiru Pakkirisamy | webcloudtech.wordpress.com


Re: Region server exception

2013-08-29 Thread Kiru Pakkirisamy
Ted,
When there are more than 32 concurrent clients (in a 4-node x 8-core cluster), 
I keep getting responseTooSlow warnings for my coprocessors.
Our app is built mainly on coprocessors, plus a few multi-gets.

(responseTooSlow): 
{processingtimems:10682,call:execCoprocessor([B@511c627c, 
getFoo({T_5208=0.004815409309791332,

.(multiple values of T_id=double value)
20), rpc version=1, client version=0, methodsFingerPrint=0), rpc version=1, 
client version=29, methodsFingerPrint
=-1368823753,client:10.149.5.56:38292,starttimems:1377808493508,queuetimems:7,class:HRegionServer,responsesize:0,method:execCoprocessor}

We do an order-by on the T_ number and do calculations on the double. This 
finishes in 400 msec (total T_ values processed is around 600K) when there is 
only one client. But it takes 8000 or 1 when the number of concurrent connections 
is increased to 32 or above.
 
Regards,
- kiru



 From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org; Kiru Pakkirisamy 
kirupakkiris...@yahoo.com 
Sent: Thursday, August 29, 2013 1:17 PM
Subject: Re: Region server exception
 

This exception means some other thread was holding the lock for extended
period of time.

Can you tell us more about your coprocessor ?

Thanks


On Thu, Aug 29, 2013 at 12:55 PM, Kiru Pakkirisamy 
kirupakkiris...@yahoo.com wrote:



 This exception stack happens from within my coprocessor code on concurrent
 reads. Any ideas ?

 java.io.InterruptedIOException
 at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5894)
 at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5875)
 at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRe
 gion.java:5803)
 at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(H
 Region.java:3852)
 at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(H
 Region.java:3896)




 Regards,
 - kiru


 Kiru Pakkirisamy | webcloudtech.wordpress.com

Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Saurabh Yahoo
Thanks Adrien.

Based on the HBase book, it is listed as an experimental item
(http://hbase.apache.org/book/upgrade0.92.html), even though it was implemented 
back in 2011.

Is anyone running this in production? Any feedback?

Thanks,
Saurabh. 

On Aug 29, 2013, at 4:07 PM, Adrien Mogenet adrien.moge...@gmail.com wrote:

 Another point that could help to stay under the `1s SLA': enable direct
 byte buffers for LruBlockCache. Have a look at HBASE-4027.
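
(For reference, a hedged sketch of what enabling the HBASE-4027 off-heap slab cache looked
like in the 0.92/0.94 era; the property name and value below are from memory and should be
checked against hbase-default.xml for your version. The region server JVM also needs direct
memory made available, e.g. via -XX:MaxDirectMemorySize in its options in hbase-env.sh.)

  <property>
    <name>hbase.offheapcache.percentage</name>
    <!-- a value greater than 0 enables the off-heap slab cache -->
    <value>0.6</value>
  </property>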
 
 
 On Thu, Aug 29, 2013 at 9:27 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com
 wrote:
 
 Yes, in that case, it matters. I was talking about a case where you are
 mostly serving from cache.
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Thursday, August 29, 2013 12:09 PM
 Subject: Re: experiencing high latency for few reads in HBase
 
 
 Thanks Kiru.
 
 We have 10TB of data on disk. It would not fit in memory. Also for the
 first time, hbase need to read from the disk. And it has to go through the
 network to read the blocks which are stored at other data node.
 
 So in my opinion, locality matters.
 
 Thanks,
 Saurabh.
 
 On Aug 29, 2013, at 2:33 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com
 wrote:
 
 But locality index should not matter right if you are in IN_MEMORY most
 and you are running the test after  a few runs to make sure they are
 already in IN_MEMORY  (ie blockCacheHit is high or blockCacheMiss is low)
 (?)
 
 Regards,
 - kiru
 
 
 Kiru Pakkirisamy | webcloudtech.wordpress.com
 
 
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Thursday, August 29, 2013 11:11 AM
 Subject: RE: experiencing high latency for few reads in HBase
 
 
 Usually, either cluster restart or major compaction helps improving
 locality index.
 There is an issue in region assignment after table disable/enable in
  0.94.x (x < 11) which
 breaks HDFS locality. Fixed in 0.94.11
 
 You can write your own routine to manually localize particular table
 using public HBase Client API.
 
  But this won't help you to stay within 1 sec anyway.
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 10:52 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Vlad.
 
 Quick question. I notice hdfsBlocksLocalityIndex is around 50 in all
 region servers.
 
 Does that could be a problem? If it is, how to solve that? We already
 ran the major compaction after ingesting the data.
 
 Thanks,
 Saurabh.
 
 On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov vrodio...@carrieriq.com
 wrote:
 
 Yes. HBase won't guarantee strict sub-second latency.
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Thursday, August 29, 2013 2:49 AM
 To: user@hbase.apache.org
 Cc: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Hi Vlad,
 
 We do have strict latency requirement as it is financial data requiring
 direct access from clients.
 
 Are you saying that it is not possible to achieve sub second latency
 using hbase (because it is based on java.) ?
 
 
 
 
 
 
 
 On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com
 wrote:
 
 Increasing Java heap size will make latency worse, actually.
 You can't guarantee 1 sec max latency if run Java app (unless your
 heap size is much less than 1GB).
 I have never heard about strict maximum latency limit. Usually , its
 99% , 99.9 or 99.99% query percentiles.
 
 You can greatly reduce your 99.xxx% percentile latency by storing you
 data in 2 replicas to two different region servers.
 Issue two read operations to those two region servers in parallel and
 get the first response. Probability theory states that  probability
 of two independent events (slow requests) is  the product of event's
 probabilities themselves.
 
 
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com
 
 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 4:18 PM
 To: user@hbase.apache.org
 Subject: Re: experiencing high latency for few reads in HBase
 
 Thanks Kiru,
 
 Scan is not an option for our use cases.  Our read is pretty random.
 
 Any other suggestion to bring down the latency.
 
 Thanks,
 Saurabh.
 
 
 On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy 
 kirupakkiris...@yahoo.com 

Re: observation while running hbase under load

2013-08-29 Thread RK S
Does HBase give higher preference to writes than reads if one tries to
do both operations on the same rowkey at the same time?
My scenario:

I am new to HBase and I am testing it for our data warehouse solution. I am
trying the following 2 scenarios.


 10 Rows
 Each rowkey has 5000 column qualifiers spread across 3 column
 families.

 I generate the following 2 kinds of load.

 1.
   1.1  Generate the 10 rows with sequential INSERTs. By sequential
 INSERT I mean that each time I insert a rowkey I also insert all of its 5000
 column qualifiers. Each insert also does some READs, as some of the column
 families act like an index.
   1.2  After generating the table as above, I perform READ, SCAN and
 INSERT on random column qualifiers in random fashion.

 2.
   Do both the load generation (1.1) and the random read/scan/insert
 of column qualifiers (1.2) in parallel.

 Observed behavior:

 While 2 is happening I can see that read and scan take longer than they
 used to take in 1. This is fine, since while an insert is happening reads are
 blocked because the whole row is locked. But I do not see any significant difference
 in the performance of insert or update. I thought an insert should also have been
 blocked while a read or scan is happening on the same rowkey, since a lock is
 held for a given ROWKEY.
 Please remember that READ, SCAN and INSERT happen on the same rowkeys.

 Question: Does HBase give preference to writes over reads, or am I missing
 something?

 regards,
 rks



RE: HBase client with security

2013-08-29 Thread Lanati, Matteo
Hi Harsh,

thanks for the suggestion.
I added HADOOP_PREFIX so that the conf folder is in the path.
It still doesn't work, so I suppose Hadoop's core-site.xml is faulty (though I 
need a Kerberos ticket to use Hadoop, so security is working).
In fact, when I run 'list' from the HBase shell I get

13/08/29 23:47:43 ERROR security.UserGroupInformation: 
PriviledgedActionException as:lu95...@hadoop.lrz.de cause:java.io.IOException: 
Failed to specify server's Kerberos principal name
13/08/29 23:47:43 INFO security.UserGroupInformation: Initiating logout for 
lu95...@hadoop.lrz.de
13/08/29 23:47:43 INFO security.UserGroupInformation: Initiating re-login for 
lu95...@hadoop.lrz.de


The file core-site.xml contains the following

  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.156.120.41:9000</value>
  </property>

  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>

  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>

  <property>
    <name>hadoop.kerberos.kinit.command</name>
    <value>/usr/bin/kinit</value>
  </property>

What else do I need? Maybe a reference to the keytab contained in
hbase/conf/zk-jaas.conf?
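
(For what it's worth, a "Failed to specify server's Kerberos principal name" error on the
client usually means the client-side hbase-site.xml does not name the HBase server principals.
A hedged sketch of the two properties that typically need to be added on the client follows;
the realm and exact principal names are assumptions and must match what the master and region
servers were actually started with.)

  <property>
    <name>hbase.master.kerberos.principal</name>
    <value>hbase/_HOST@HADOOP.LRZ.DE</value>
  </property>
  <property>
    <name>hbase.regionserver.kerberos.principal</name>
    <value>hbase/_HOST@HADOOP.LRZ.DE</value>
  </property>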

Bye,

Matteo


Matteo Lanati
Distributed Resources Group
Leibniz-Rechenzentrum (LRZ)
Boltzmannstrasse 1
85748 Garching b. München (Germany)
Phone: +49 89 35831 8724


From: Harsh J [ha...@cloudera.com]
Sent: 29 August 2013 15:53
To: user@hbase.apache.org
Subject: Re: HBase client with security

Two things come to mind:

1. Is HADOOP_CONF_DIR also on HBase's classpath? If it or
HADOOP_PREFIX/HADOOP_HOME is defined, it usually is. But re-check via
hbase classpath
2. Assuming (1) is good, does your core-site.xml have kerberos
authentication settings for hadoop as well?

On Thu, Aug 29, 2013 at 6:58 PM, Lanati, Matteo matteo.lan...@lrz.de wrote:
 Hi all,

 I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with 
 security.
 HBase works if I launch the shell from the node running the master, but I'd 
 like to use it from an external machine.
 I prepared one, copying the Hadoop and HBase installation folders and 
 adapting the path (indeed I can use the same client to run MR jobs and 
 interact with HDFS).
 Regarding HBase client configuration:

 - hbase-site.xml specifies

   <property>
     <name>hbase.security.authentication</name>
     <value>kerberos</value>
   </property>
   <property>
     <name>hbase.rpc.engine</name>
     <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>master.hadoop.local,host49.hadoop.local</value>
   </property>

  where the zookeeper hosts are reachable and can be resolved via DNS. I had to 
 specify them otherwise the shell complains about 
 org.apache.zookeeper.KeeperException$ConnectionLossException: 
 KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

 - I have a keytab for the principal I want to use (user running hbase/my 
 client hostname@MYREALM), correctly addressed by the file 
 hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to 
 zk-jaas.conf.

 Nonetheless, when I issue a command from a HBase shell on the client machine, 
 I got an error in the HBase master log

 2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 listener on 6: readAndProcess threw exception 
 org.apache.hadoop.security.AccessControlException: Authentication is 
 required. Count of bytes read: 0
 org.apache.hadoop.security.AccessControlException: Authentication is required
 at 
 org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)

 It looks like there's a mismatch between the client and the master regarding 
 the authentication mechanism. Note that from the same client machine I can 
 launch and use a Zookeeper shell.
 What am I missing in the client configuration? Does /etc/krb5.conf play any 
 role into this?
 Thanks,

 Matteo


 Matteo Lanati
 Distributed Resources Group
 Leibniz-Rechenzentrum (LRZ)
 Boltzmannstrasse 1
 85748   Garching b. München (Germany)
 Phone: +49 89 35831 8724





--
Harsh J

Re: observation while running hbase under load

2013-08-29 Thread Ted Yu
This JIRA is related: HBASE-8836


On Thu, Aug 29, 2013 at 2:22 PM, RK S dhurandarg...@gmail.com wrote:

 Does Hbase gives higher Preference to Writes than Reads , if  one tries to
 do both operation for the same rowkey at the same time???
 My scenario

 Iam new to Hbase and Iam testing Hbase for our datawarehouse solution. Iam
 trying following 2 scenarios.

 
  10 Rows
  Each of the Rowkey has 5000 Columns Qualifiers spread across 3 Column
  families.
 
  I generate following 2 kinds of load.
 
  1.
1.1  Generate 10 of rows , with sequential INSERT. By sequential
  INSERT I mean
   Each time I do a insert of rowkey  also insert all  of the 5000
  Column qualifiers . each insert also does some READS as some of the
 column
  families act like Index.
1.2  After Generation of Table using above , I perform READ , SCAN and
  INSERT randomly column qualifier in random fashion.
 
  2.
 Doing both Generation of Load (1.1 ) and doing read ,scan and insert
  random column qualifier (1.2) in parallel
 
  Observed Behavior.
 
  While 2 is happening I can see that read ,scan take more than what they
  use to take in 1 . This is fine as when  insert is happening read is
  blocked as whole row is locked. But I do not see any significant
 difference
  in performance of insert or update . I thought even insert should have
 been
  blocked while read or scan is happening on the same rowkey , a lock will
 be
  held for a given ROWKEY.
  Please remember READ,SCAN and INSERT happen on the same Rowkeys.
 
  Question: Does hbase give preference to write than read or am I missing
  something ?
 
  regards,
  rks
 



Re: experiencing high latency for few reads in HBase

2013-08-29 Thread Kiru Pakkirisamy
I just moved from 0.94.10 to 0.94.11. Tremendous improvement in our app's query 
response: it went down to 1.3 sec from 1.7 sec. 
Concurrent tests are also good, but it still degrades sharply, to 10 
secs for 8 concurrent clients. There might be a bug lurking in there somewhere 
that is probably affecting us.

 
Regards,
- kiru


 From: Federico Gaule fga...@despegar.com
To: user@hbase.apache.org 
Sent: Thursday, August 29, 2013 5:37 AM
Subject: Re: experiencing high latency for few reads in HBase
 

In the 0.94.11 release, an optimization for MultiGets has been included: 
https://issues.apache.org/jira/browse/HBASE-9087

What version have you deployed?


On 08/29/2013 01:29 AM, lars hofhansl wrote:
 A 1s SLA is tough in HBase (or any large memory JVM application).


 Maybe, if you presplit your table, play with JDK7 and the G1 collector, but 
 nobody here will vouch for such an SLA in the 99th percentile.
 I heard some folks have experimented with 30GB heaps and G1 and have reported 
 max GC times of 200ms, but I have not verified that.

 -- Lars



 - Original Message -
 From: Saurabh Yahoo saurabh...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 28, 2013 3:17 PM
 Subject: Re: experiencing high latency for few reads in HBase

 Hi Vlad,

 Thanks for your response.

 1. Our SLA is less than one sec. we cannot afford latency more than 1 sec.

 We can increase heap size if that help, we have enough memory on server. What 
 would be the optimal heap size?

 2. Cache hit ratio is 95%.  One thing I don't understand is that we have 
 allocated only 4gb for block cache out of 12gb. That leaves 8gb for the rest of 
 the JVM. There is no write. The memstore is empty. Is 8gb not enough for HBase to 
 process the requests? What are the most memory consuming objects in region 
 server?

 3. We will change the cf to IN_memory and report back performance difference.

 Thanks,
 Saurabh.
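
(A minimal sketch of flipping an existing column family to IN_MEMORY with the 0.94 client API,
as discussed in this thread; the table and family names are hypothetical placeholders.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class MarkFamilyInMemory {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    byte[] table = Bytes.toBytes("mytable");  // hypothetical table name
    byte[] family = Bytes.toBytes("cf");      // hypothetical column family

    // Fetch the existing descriptor so other family settings (versions, TTL, ...) are kept.
    HTableDescriptor htd = admin.getTableDescriptor(table);
    HColumnDescriptor hcd = htd.getFamily(family);
    hcd.setInMemory(true);  // blocks of this family go to the in-memory slice of the LRU cache

    admin.disableTable(table);   // schema changes need the table disabled in 0.94
    admin.modifyColumn(table, hcd);
    admin.enableTable(table);
  }
}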

 On Aug 28, 2013, at 3:15 PM, Vladimir Rodionov vrodio...@carrieriq.com 
 wrote:

 1. 4 sec max latency is not that bad taking into account 12GB heap.  It can 
 be much larger. What is your SLA?
 2. Block evictions are the result of a poor cache hit rate and the root cause 
 of periodic stop-the-world GC pauses (the max latencies you have been
 observing in the test).
 3. Block cache consists of 3 parts (25% young generation, 50% - tenured, 25% 
 - permanent). Permanent part is for CF with
 IN_MEMORY = true (you can specify this when you create CF).  Block first 
 stored in 'young gen' space, then gets promoted to 'tenured gen' space
 (or gets evicted). May be your 'perm gen' space is underutilized? This is 
 exact 25% of 4GB (1GB). Although HBase LruBlockCache should use all the 
 space allocated for block cache -
 there is no guarantee (as usual). If you don't have in_memory column families 
 you may decrease



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Saurabh Yahoo [saurabh...@yahoo.com]
 Sent: Wednesday, August 28, 2013 5:10 AM
 To: user@hbase.apache.org
 Subject: experiencing high latency for few reads in HBase

 Hi,

 We are running a stress test in our 5 node cluster and we are getting the 
 expected mean latency of 10ms. But we are seeing around 20 reads out of 25 
 million reads having latency more than 4 seconds. Can anyone provide the 
 insight what we can do to meet below second SLA for each and every read?

 We observe the following things -

 1. Reads are evenly distributed among 5 nodes.  CPUs remain under 5% 
 utilized.

 2. We have 4gb block cache (30% block cache out of 12gb) setup. 3gb block 
 cache got filled up but around 1gb remained free. There are a large number 
 of cache eviction.

 Questions to experts -

 1. If there are still 1gb of free block cache available, why is hbase 
 evicting the block from cache?

 4. We are seeing memory went up to 10gb three times before dropping sharply 
 to 5gb.

 Any help is highly appreciable,

 Thanks,
 Saurabh.
