Re: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-11 Thread HARI KUMAR
Hi,

Are u trying to do parallel scans. If yes, check the time taken for GC and
the number of calls that can be served at your end point.

Best Regards
N.Hari Kumar

On Tue, Sep 11, 2012 at 8:22 AM, Dhirendra Singh dps...@gmail.com wrote:

 i tried with a smaller caching i.e 10, it failed again, no it's not really
 a big cell. this small cluster (4 nodes) is only used for HBase, i am
 currently using hbase-0.92.1-cdh4.0.1. could you just let me know how
 i could debug this issue?


 Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException:
 99560ms passed since the last invocation, timeout is currently set to
 6
 at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1302)
 at
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1399)
 ... 5 more
 Caused by: org.apache.hadoop.hbase.UnknownScannerException:
 org.apache.hadoop.hbase.UnknownScannerException: Name:
 -8889369042827960647
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2114)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
 at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)



 On Mon, Sep 10, 2012 at 10:53 PM, Stack st...@duboce.net wrote:

  On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh dps...@gmail.com
  wrote:
   I am facing this exception while iterating over a big table, by default
   i have specified caching as 100,

   i am getting the below exception, even though i checked there are several
   calls made to the scanner before it threw this exception, but somehow it's
   saying 86095ms were passed since last invocation.

   i also observed that if i set scan.setCaching(false), it succeeds, could
   some one please explain or point me to some document as to what's
   happening here and what's the best practices to avoid it.
  
  
 
  Try again caching < 100.  See if it works.  A big cell?  A GC pause?
  You should be able to tell roughly which server is being traversed
  when you get the timeout.  Anything else going on on that server at
  the time?  What version of HBase?
  St.Ack
 



 --
 Warm Regards,
 Dhirendra Pratap
 +91. 9717394713




-- 
FROM
HARI KUMAR.N
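
To make the advice in this thread concrete, here is a minimal client-side
sketch, assuming the 0.92-era Java API (the table name is made up). The lease
behind the "timeout is currently set to ..." message is
hbase.regionserver.lease.period; note it is enforced by the region server, so
raising it only takes effect in the server's hbase-site.xml, not the client
Configuration:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanWithSmallCaching {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table
        Scan scan = new Scan();
        // Fewer rows fetched per next() means less client-side work between
        // two RPCs, so slow per-row processing is less likely to outlive the
        // scanner lease (hbase.regionserver.lease.period, default 60000 ms).
        scan.setCaching(10);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r quickly; long pauses here expire the lease
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}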


HDFS footprint of a table

2012-09-11 Thread Lin Ma
Hi guys,

Supposing I have a table in HBase, how to estimate its storage footprint?
Thanks.

regards,
Lin


Regarding column family

2012-09-11 Thread Ramasubramanian
Hi,

Does column family play any role during loading a file into hbase from hdfs in 
terms of performance?

Regards,
Rams

Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-11 Thread John Hancock
Huaxiang,

You are looking for hbase-default.xml, right?

The output of the find command is telling you where it is.

/usr/share/doc/hbase-0.92.1+67/hbase-default.xml

You should be able to enter the commands:

less /usr/share/doc/hbase-0.92.1+67/hbase-default.xml

or

vi /usr/share/doc/hbase-0.92.1+67/hbase-default.xml

or

emacs /usr/share/doc/hbase-0.92.1+67/hbase-default.xml

or

nano  /usr/share/doc/hbase-0.92.1+67/hbase-default.xml

and see the contents of hbase-default.xml

Or just do

cd /usr/share/doc/hbase-0.92.1+67/

and then

ls -la

and you ought to see hbase-default there in the output of the ls command.

-John

On Mon, Sep 10, 2012 at 12:02 PM, huaxiang huaxi...@asiainfo-linkage.com wrote:

 Hi,
I don't find the hbase-default.xml file using following command, any
 other way?
To be clear, this hadoop was installed with CDH RPM package.

 Huaxiang

 [root@hadoop1 ~]# clear
 [root@hadoop1 ~]# rpm -qlp *rpm_file_name.rpm*
 [root@hadoop1 ~]# ^C
 [root@hadoop1 ~]# find / -name *hbase-default.xml*
 /usr/share/doc/hbase-0.92.1+67/hbase-default.xml
 [root@hadoop1 ~]#

 -----Original Message-----
 From: Monish r [mailto:monishs...@gmail.com]
 Sent: September 10, 2012 15:00
 To: user@hbase.apache.org
 Subject: Re: for CDH4.0, where can i find the hbase-default.xml file if using
 RPM install

 Hi,
 Try

 rpm -qlp *rpm_file_name.rpm*

 This will list all files in the rpm , from this u can know where
 hbase-default.xml is.


 On Sat, Sep 8, 2012 at 3:16 PM, John Hancock jhancock1...@gmail.com
 wrote:

  Huaxiang,
 
  This may not be the quickest way to find it, but if it's anywhere in
  your system, this command will find it:
 
  find / -name *hbase-default.xml*
 
  or
 
  cd / ; find / -name *hbase-default.xml* > temp.txt
 
  will save the output of the find command to a text file leaving out
  any error messages that might be distracting.
 
 
  -John
 
 
 
  On Sat, Sep 8, 2012 at 12:47 AM, huaxiang
  huaxi...@asiainfo-linkage.com
  wrote:
 
   Hi,
  
   I install CDH4.0 with RPM package, but I cannot find the
  hbase-default.xml
   file?
  
   Where can I find it?
  
  
  
   Best R.
  
  
  
   Huaxiang
  
  
 




Re: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-11 Thread HARI KUMAR
For GC monitoring, add export HBASE_OPTS="$HBASE_OPTS
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:$HBASE_HOME/logs/gc-hbase.log" to hbase-env.sh and try to view the
file using tools like GCViewer, or use tools like VisualVM to look at
your GC consumption.

./hari

Add

On Tue, Sep 11, 2012 at 2:11 PM, Dhirendra Singh dps...@gmail.com wrote:

 No i am not doing parallel scans,

 * If yes, check the time taken for GC and
 the number of calls that can be served at your end point*.

  could you please tell me how to do that, where can i see the GC logs?


 On Tue, Sep 11, 2012 at 12:54 PM, HARI KUMAR harikum2...@gmail.com wrote:

 Hi,

 Are u trying to do parallel scans. If yes, check the time taken for GC and
 the number of calls that can be served at your end point.

 Best Regards
 N.Hari Kumar

 On Tue, Sep 11, 2012 at 8:22 AM, Dhirendra Singh dps...@gmail.com
 wrote:

  i tried with a smaller caching i.e 10, it failed again, no it's not
 really
  a big cell. this small cluster(4 nodes) is only used for Hbase, i am
  currently using hbase-0.92.1-cdh4.0.1. ,  could you just let me know how
  could i debug this issue ?
 
 
  Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException:
  99560ms passed since the last invocation, timeout is currently set to
  6
  at
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1302)
  at
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1399)
  ... 5 more
  Caused by: org.apache.hadoop.hbase.UnknownScannerException:
  org.apache.hadoop.hbase.UnknownScannerException: Name:
  -8889369042827960647
  at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2114)
  at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
  at
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
 
 
 
  On Mon, Sep 10, 2012 at 10:53 PM, Stack st...@duboce.net wrote:
 
   On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh dps...@gmail.com
   wrote:
I am facing this exception while iterating over a big table,  by
  default
   i
have specified caching as 100,
   
i am getting the below exception, even though i checked there are
  several
calls made to the scanner before it threw this exception, but
 somehow
  its
saying 86095ms were passed since last invocation.
   
i also observed that if it set scan.setCaching(false),  it succeeds,
could
some one please explain or point me to some document as if what's
   happening
here and what's the best practices to avoid it.
   
   
  
   Try again caching < 100.  See if it works.  A big cell?  A GC pause?
   You should be able to tell roughly which server is being traversed
   when you get the timeout.  Anything else going on on that server at
   the time?  What version of HBase?
   St.Ack
  
 
 
 
  --
  Warm Regards,
  Dhirendra Pratap
  +91. 9717394713
 



 --
 FROM
 HARI KUMAR.N




 --
 Warm Regards,
 Dhirendra Pratap
 +91. 9717394713






-- 
FROM
HARI KUMAR.N


Re: More rows or less rows and more columns

2012-09-11 Thread Michel Segel
Option c, depending on the use case, add a structure to your columns to store 
the data.
You may want to update this section


Sent from a remote device. Please excuse any typos...

Mike Segel

On Sep 10, 2012, at 12:30 PM, Harsh J ha...@cloudera.com wrote:

 Hey Mohit,
 
 See http://hbase.apache.org/book.html#schema.smackdown.rowscols
 
 On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia mohitanch...@gmail.com 
 wrote:
 Is there any recommendation on how many columns one should have per row? My
 columns are < 200 bytes. This will help me to decide if I should shard my
 rows with id + some date/time value.
 
 
 
 -- 
 Harsh J
 


Any group like this for pentaho?

2012-09-11 Thread Ramasubramanian
Hi,

Like this hbase group, do we have such group for pentaho too? If so pls share 
the mail drop. 

Regards,
Rams

Re: Any group like this for pentaho?

2012-09-11 Thread Nitin Pawar
not sure about pentaho group but there is active pentaho irc channel
which runs 24x7 mostly

#pentaho is the irc room

On Tue, Sep 11, 2012 at 5:15 PM, Ramasubramanian
ramasubramanian.naraya...@gmail.com wrote:
 Hi,

 Like this hbase group, do we have such group for pentaho too? If so pls share 
 the mail drop.

 Regards,
 Rams



-- 
Nitin Pawar


Re: HDFS footprint of a table

2012-09-11 Thread Doug Meil

Hi there, see...

http://hbase.apache.org/book.html#regions.arch

… And in particular focus on…

9.7.5.4. KeyValue






On 9/11/12 3:35 AM, Lin Ma lin...@gmail.com wrote:

Hi guys,

Supposing I have a table in HBase, how to estimate its storage footprint?
Thanks.

regards,
Lin
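
For a rough estimate, the KeyValue layout that book section describes can be
turned into arithmetic. A back-of-the-envelope sketch (the row/qualifier/value
sizes below are hypothetical, and it ignores HFile block overhead, compression,
extra versions, and HDFS replication, which multiplies the result by
dfs.replication):

public class FootprintEstimate {
    public static void main(String[] args) {
        // Fixed per-cell overhead in a KeyValue: 4-byte key length +
        // 4-byte value length + 2-byte row length + 1-byte family length
        // + 8-byte timestamp + 1-byte key type = 20 bytes.
        long fixed = 4 + 4 + 2 + 1 + 8 + 1;
        long rowLen = 16, familyLen = 1, qualifierLen = 8, valueLen = 200;
        long perCell = fixed + rowLen + familyLen + qualifierLen + valueLen;
        long cellsPerRow = 10, rows = 1000000L;
        System.out.println("pre-compression bytes = "
                + perCell * cellsPerRow * rows);
    }
}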




Re: Regarding column family

2012-09-11 Thread Doug Meil

Hi there, additionally, see..

http://hbase.apache.org/book.html#regions.arch

… and focus on 9.7.5.4. KeyValue because the CF name is actually a part
of each KV.  




On 9/11/12 4:03 AM, n keywal nkey...@gmail.com wrote:

Yes, because there is one store (hence set of files) per column family.
See this: http://hbase.apache.org/book.html#number.of.cfs

On Tue, Sep 11, 2012 at 9:52 AM, Ramasubramanian 
ramasubramanian.naraya...@gmail.com wrote:

 Hi,

 Does column family play any role during loading a file into hbase from
 hdfs in terms of performance?

 Regards,
 Rams
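
Since the family name rides along in every KeyValue, one practical consequence
is to keep family names short. A minimal sketch with the 0.92-era admin API;
the table and family names are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateShortCfTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("events");
        // "d" instead of "details": the name is repeated in every cell on disk
        desc.addFamily(new HColumnDescriptor("d"));
        admin.createTable(desc);
        admin.close();
    }
}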




Re: More rows or less rows and more columns

2012-09-11 Thread Doug Meil

re:  You may want to update this section

Good point.  I will add.





On 9/11/12 6:59 AM, Michel Segel michael_se...@hotmail.com wrote:

Option c, depending on the use case, add a structure to your columns to
store the data.
You may want to update this section


Sent from a remote device. Please excuse any typos...

Mike Segel

On Sep 10, 2012, at 12:30 PM, Harsh J ha...@cloudera.com wrote:

 Hey Mohit,
 
 See http://hbase.apache.org/book.html#schema.smackdown.rowscols
 
 On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia
mohitanch...@gmail.com wrote:
 Is there any recommendation on how many columns one should have per
row. My
 columns are < 200 bytes. This will help me to decide if I should shard
my
 rows with id + some date/time value.
 
 
 
 -- 
 Harsh J
 





Performance: hive+hbase integration query against the row_key

2012-09-11 Thread Shengjie Min
Hi,

I am trying to get hive working on top of my hbase table following the
guide below:
https://cwiki.apache.org/Hive/hbaseintegration.html

CREATE EXTERNAL TABLE hive_hbase_test (key string, a string, b string, c
string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,cf:a,cf:b,cf:c") TBLPROPERTIES (
"hbase.table.name" = "test");

this hive table creation makes my mapping roughly look like this:

hive_hbase_test  VS   test
Hive key  -   hbase row_key
Hive column a -  hbase cf:a
Hive column b  -  hbase cf:b
Hive column c  -  hbase cf:c

From my understanding on how HBaseStorageHandler works, it's supposed to
take advantage of the hbase row_key index as much as possible. So I would
expect,

1. if you do a hive query against the row key like select * from
hive_hbase_test where key='blabla', this would utilize the hbase row_key
index, which gives you a very quick, nearly real-time response just like
hbase does.

2. of coz, if you do a hive query against a column like select * from
hive_hbase_test where a='blabla', in this case it queries against a
specific column, and it probably uses mapred because there is nothing from
the HBase side that can be utilized.

From my test, query 1 doesn't seem fast at all, still taking ages:

select * from hive_hbase_test where key='blabla'  ->  36 secs
vs
get 'test', 'blabla'  ->  less than 1 sec

which still shows a huge difference.

Has anybody tried this before? Is there any way I can do some sort of query
plan analysis against the hive query? Or am I not mapping the hive table
against the hbase table correctly?


Re: HBase UI missing region list for active/functioning table

2012-09-11 Thread Norbert Burger
On Mon, Sep 10, 2012 at 3:29 PM, Stack st...@duboce.net wrote:
 On Mon, Sep 10, 2012 at 12:05 PM, Norbert Burger
 norbert.bur...@gmail.com wrote:

 Mind putting up full listing in pastebin?

Here's a link: http://pastebin.com/raw.php?i=4YhS8CpE.  The table in
question is called 'sessions', I did delete other tables' info from
this dump, as the .META. was quite large otherwise.

 We could try a master restart too... so it refreshes its in-memory
 state.  That might do it.

We've actually done this already, it hasn't seemed to resolve the situation.

Thanks,
Norbert


Re: java.io.IOException: Pass a Delete or a Put

2012-09-11 Thread Jothikumar Ekanath
Hi,

I am kind of stuck on this one, I read all the other similar issues and
coded based on that. But still i get this error.

Any help or clue will help me moving forward.

Thanks




On Mon, Sep 10, 2012 at 7:06 PM, Jothikumar Ekanath kbmku...@gmail.com wrote:

 Hi,
Getting this error while using hbase as a sink.


 Error
 java.io.IOException: Pass a Delete or a Put
 at
 org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
 at
 org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
 at
 org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
 at
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
 at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:156)
 at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
 at
 org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)



  Below is my code
 Using the following version

 Hbase = 0.94
 Hadoop - 1.0.3

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.KeyValue;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
 import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
 import org.apache.hadoop.hbase.mapreduce.TableMapper;
 import org.apache.hadoop.hbase.mapreduce.TableReducer;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.*;

 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.List;

 public class DailyAggMapReduce {

 public static void main(String args[]) throws Exception {
 Configuration config = HBaseConfiguration.create();
 Job job = new Job(config, DailyAverageMR);
 job.setJarByClass(DailyAggMapReduce.class);
 Scan scan = new Scan();
 // 1 is the default in Scan, which will be bad for MapReduce jobs
 scan.setCaching(500);
 // don't set to true for MR jobs
 scan.setCacheBlocks(false);

 TableMapReduceUtil.initTableMapperJob(
 HTASDB,// input table
 scan,   // Scan instance to control CF and
 attribute selection
 DailySumMapper.class, // mapper class
 Text.class, // mapper output key
 Text.class,  // mapper output value
 job);

 TableMapReduceUtil.initTableReducerJob(
 DA,// output table
 DailySumReducer.class,// reducer class
 job);

 //job.setOutputValueClass(Put.class);
 job.setNumReduceTasks(1);   // at least one, adjust as required

 boolean b = job.waitForCompletion(true);
 if (!b) {
 throw new IOException(error with job!);
 }

 }


 public static class DailySumMapper extends TableMapperText, Text {

 public void map(ImmutableBytesWritable row, Result value,
 Mapper.Context context) throws IOException, InterruptedException {
 ListString key = getRowKey(row.get());
 Text rowKey = new Text(key.get(0));
 int time = Integer.parseInt(key.get(1));
 //limiting the time for one day (Aug 04 2012) -- Testing, Not
 a good way
 if (time = 1344146400) {
 ListKeyValue data = value.list();
 long inbound = 0l;
 long outbound = 0l;
 for (KeyValue kv : data) {
 ListLong values = getValues(kv.getValue());
 if (values.get(0) != -1) {
 inbound = inbound + values.get(0);
 }
 if (values.get(1) != -1) {
 outbound = outbound + values.get(1);
 }
 }
 context.write(rowKey, new Text(String.valueOf(inbound) +
 - + String.valueOf(outbound)));
 }
 }

 private static ListLong getValues(byte[] data) {
 ListLong values = new ArrayListLong();
 ByteBuffer buffer = ByteBuffer.wrap(data);
 values.add(buffer.getLong());
 

Re: HBase aggregate query

2012-09-11 Thread James Taylor
iwannaplay games funnlearnforkids@... writes:
 
 Hi ,
 
 I want to run query like
 
 select month(eventdate),scene,count(1),sum(timespent) from eventlog
 group by month(eventdate),scene
 
 in hbase. Through hive it's taking a lot of time for 40 million
 records. Do we have any syntax in hbase to find its result? In SQL
 Server it takes around 9 minutes. How long might it take in hbase?
 
 Regards
 Prabhjot
 
 

Hi,
In our internal testing using server-side coprocessors for aggregation, we've
found HBase can process these types of queries very quickly: ~10-12 seconds
using a four node cluster. You need to chunk up and parallelize the work on the
client side to get this kind of performance, though.
Regards,

James
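
James's setup isn't shown, but for reference, a minimal sketch of
endpoint-based aggregation using the AggregationClient that ships with HBase
0.92+ (it requires the AggregateImplementation coprocessor to be loaded for
the table; the table and column names are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class AggregateExample {
    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        AggregationClient aggClient = new AggregationClient(conf);
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("timespent"));
        // Each region computes its partial result server-side; the client
        // only merges the per-region partials.
        long rows = aggClient.rowCount(Bytes.toBytes("eventlog"),
                new LongColumnInterpreter(), scan);
        long total = aggClient.sum(Bytes.toBytes("eventlog"),
                new LongColumnInterpreter(), scan);
        System.out.println(rows + " rows, sum " + total);
    }
}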





Re: java.io.IOException: Pass a Delete or a Put

2012-09-11 Thread Stack
On Mon, Sep 10, 2012 at 7:06 PM, Jothikumar Ekanath kbmku...@gmail.com wrote:
 Hi,
Getting this error while using hbase as a sink.


 Error
 java.io.IOException: Pass a Delete or a Put

Would suggest you study the mapreduce jobs that ship with hbase both
in main and under test.

Looking at your program, you are all Text.  The above complaint is
about wanting a Put or Delete.  Can you change what you produce so
Put/Delete rather than Text?

St.Ack


Regarding rowkey

2012-09-11 Thread Ramasubramanian
Hi,

What can be used as rowkey to improve performance while loading into hbase? 
Currently I am using a sequence. It takes some 11 odd minutes to load 1 million 
records with 147 columns.

Regards,
Rams

HBase recovery from failed -ROOT- / .META. server

2012-09-11 Thread Willy Chang
It appears to take 30 minutes or so for HBase to recover from the failure
of the regionserver holding the ROOT role. Please let me know what options
are available to more quickly recover from such a situation, as when this
happens our applications/SLAs are impacted.

It would also be good to be able to quickly recover from a failure of the
regionserver which owns the .META. table. During HBase startup, a random
server is elected to manage the ROOT and .META. tables (different servers).
This creates a single point of failure. At the very least, perhaps we can
find a way to force which server is selected for this role, perhaps even
just via startup order. We could then assign a server which doesn't
participate in flow tasks (no tasktracker), and so would be more stable.
There may also be a config option for this. Wondering if there is a way to
force election of a new ROOT/META owner within a minute or so instead of
30+ minutes.
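
One knob that is relevant here (a hedged pointer, not a full fix): the master
only learns a regionserver is dead after its ZooKeeper session expires, so
detection time is bounded below by zookeeper.session.timeout (default 180000
ms in this era), set in hbase-site.xml:

<property>
  <name>zookeeper.session.timeout</name>
  <!-- illustrative: 1 minute; trades off against GC-pause false positives -->
  <value>60000</value>
</property>

Recovery after detection still requires log splitting and ROOT/META
reassignment, so this bounds only part of the 30 minutes.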


Strata/Hadoop World HBase Meetup on October 25th

2012-09-11 Thread Otis Gospodnetic
Hi,

I don't think this was mentioned on the ML yet, but for those coming to Strata 
in New York next month, there is a Strata/Hadoop World HBase Meetup on October 
25th, organized by Jon Hsieh & friends and hosted by AppNexus:

http://www.meetup.com/HBase-NYC/events/81728932/

See you in NYC!

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm



Re: HBase aggregate query

2012-09-11 Thread lars hofhansl
That's when you aggregate along a sorted dimension (prefix of the key), though. 
Right?
Not sure how smart Hive is here, but if it needs to sort the data it will 
probably be slower than SQL Server for such a small data set.



- Original Message -
From: James Taylor jtay...@salesforce.com
To: user@hbase.apache.org
Cc: 
Sent: Monday, September 10, 2012 5:49 PM
Subject: Re: HBase aggregate query

iwannaplay games funnlearnforkids@... writes:
 
 Hi ,
 
 I want to run query like
 
 select month(eventdate),scene,count(1),sum(timespent) from eventlog
 group by month(eventdate),scene
 
 in hbase.Through hive its taking a lot of time for 40 million
 records.Do we have any syntax in hbase to find its result?In sql
 server it takes around 9 minutes,How long it might take in hbase??
 
 Regards
 Prabhjot
 
 

Hi,
In our internal testing using server-side coprocessors for aggregation, we've
found HBase can process these types of queries very quickly: ~10-12 seconds
using a four node cluster. You need to chunk up and parallelize the work on the
client side to get this kind of performance, though.
Regards,

James


Re: java.io.IOException: Pass a Delete or a Put

2012-09-11 Thread Jothikumar Ekanath
Hi Stack,
Thanks for the reply. I looked at the code and i am having
a very basic confusion on how to use it correctly.  The code i wrote
earlier has the following input and output types and i want it that way

After looking at the sources and examples, i modified my reducer (given
below), the mapper and job configuration are still the same. Still i see
the same error. Am i doing something wrong?

 DailySumMapper extends TableMapper<Text, Text>
KEYOUT = Text
VALUEOUT = Text

 DailySumReducer extends TableReducer<Text, Text, ImmutableBytesWritable>

KEYIN = Text
VALUEIN = Text
KEYOUT = ImmutableBytesWritable
VALUEOUT = must always be Put or Delete when we extend TableReducer, so
we are not specifying that.

Code
 public static class DailySumReducer extends TableReducer<Text, Text,
         ImmutableBytesWritable> {
     private int count = 0;
     protected void reduce(Text key, Iterable<Text> values,
             Reducer.Context context) throws IOException, InterruptedException {
         long inbound = 0l;
         long outbound = 0l;
         for (Text val : values) {
             String text = val.toString();
             int index = text.indexOf("-");
             String in = text.substring(0, index);
             String out = text.substring(index + 1, text.length());
             inbound = inbound + Long.parseLong(in);
             outbound = outbound + Long.parseLong(out);
         }
         ByteBuffer data = ByteBuffer.wrap(new byte[16]);
         data.putLong(inbound);
         data.putLong(outbound);
         Put put = new Put(Bytes.toBytes(key.toString() + "20120804"));
         put.add(Bytes.toBytes("t"), Bytes.toBytes("s"), data.array());
         context.setStatus("Emitting Put " + count++);
         ImmutableBytesWritable ibw = new
                 ImmutableBytesWritable(Bytes.toBytes(key.toString()));
         context.write(ibw, put);
     }
 }

On Tue, Sep 11, 2012 at 10:38 AM, Stack st...@duboce.net wrote:

 On Mon, Sep 10, 2012 at 7:06 PM, Jothikumar Ekanath kbmku...@gmail.com
 wrote:
  Hi,
 Getting this error while using hbase as a sink.
 
 
  Error
  java.io.IOException: Pass a Delete or a Put

 Would suggest you study the mapreduce jobs that ship with hbase both
 in main and under test.

 Looking at your program, you are all Text.  The above complaint is
 about wanting a Put or Delete.  Can you change what you produce so
 Put/Delete rather than Text?

 St.Ack
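
One detail worth checking against the stack trace in the first message: the
Reducer.reduce(Reducer.java:156) frame is Hadoop's default identity reduce,
which forwards the mapper's Text values straight into TableOutputFormat, and
it only runs when a custom reduce(...) does not actually override the base
method. Marking the method with @Override makes the compiler catch such a
signature mismatch. A hedged sketch (class name and placeholder cell are
illustrative):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class DailySumReducerSketch
        extends TableReducer<Text, Text, ImmutableBytesWritable> {
    @Override  // compile error here if the signature doesn't match the base class
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("t"), Bytes.toBytes("s"), Bytes.toBytes(0L)); // placeholder cell
        context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
}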



Help - can't start master server for HBase (pseudo-distributed mode).

2012-09-11 Thread Jason Huang
Hello,

I am trying to set up HBase at pseudo-distributed mode on my Macbook.
I've installed Hadoop 1.0.3 in pseudo-distributed mode and was able to
successfully start the nodes:
$ bin/start-all.sh
$ jps
1002 NameNode
1246 JobTracker
1453 Jps
1181 SecondaryNameNode
1335 TaskTracker
1091 DataNode

Then I installed HBase 0.94 and configured it at pseudo-distributed mode.
$ ./start-hbase.sh
$ jps
1684 Jps
1002 NameNode
1647 HRegionServer
1246 JobTracker
1553 HQuorumPeer
1181 SecondaryNameNode
1335 TaskTracker
1091 DataNode

I couldn't find the MasterServer running so I looked at the log file:
2012-09-11 14:35:05,892 INFO
org.apache.hadoop.hbase.master.ActiveMasterManager:
Master=192.168.10.23,6,1347388500668
2012-09-11 14:35:06,996 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
2012-09-11 14:35:07,998 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
2012-09-11 14:35:08,999 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
2012-09-11 14:35:10,000 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
2012-09-11 14:35:11,001 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
2012-09-11 14:35:12,002 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
2012-09-11 14:35:13,004 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
2012-09-11 14:35:14,005 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
2012-09-11 14:35:15,006 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
2012-09-11 14:35:16,008 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-09-11 14:35:16,012 FATAL org.apache.hadoop.hbase.master.HMaster:
Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
connection exception: java.net.ConnectException: Connection refused

Could anyone help me to figure out what I am missing? I tried to do
some google search but none of the answers there helped me.
Here are my config files:

hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>localhost:6</value>
  </property>
</configuration>

hdfs-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:54311</value>
  </property>
</configuration>

I've also tried ssh localhost and here is what I found on my shell:
$ ssh localhost
Last login: Tue Sep 11 14:22:37 2012 from localhost
$ ssh localhost -p 22
Last login: Tue Sep 11 14:41:17 2012 from localhost
$ ssh localhost -p 8020
ssh: connect to host localhost port 8020: Connection refused

thanks!


Re: Regarding rowkey

2012-09-11 Thread Doug Meil

Hi there, have you read this?

http://hbase.apache.org/book.html#performance

And especially this?

http://hbase.apache.org/book.html#perf.writing


How many nodes is the cluster?  Is the target table pre-split?  And if it
is, are you sure that the rows aren't winding up on a single region?





On 9/11/12 1:39 PM, Ramasubramanian
ramasubramanian.naraya...@gmail.com wrote:

Hi,

What can be used as rowkey to improve performance while loading into
hbase? Currently I am using a sequence. It takes some 11 odd minutes to
load 1 million records with 147 columns.

Regards,
Rams




Re: Help - can't start master server for HBase (pseudo-distributed mode).

2012-09-11 Thread Shrijeet Paliwal
Your HDFS server is listening on a different port than the one you
configured in hbase-site (9000 != 8020).
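
For illustration, the matching fix would be to point hbase.rootdir at the port
HDFS actually listens on (9000, per fs.default.name in the configs quoted
below):

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
</property>

or, alternatively, to change fs.default.name to 8020 and restart HDFS.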



On Tue, Sep 11, 2012 at 11:44 AM, Jason Huang jason.hu...@icare.com wrote:
 Hello,

 I am trying to set up HBase at pseudo-distributed mode on my Macbook.
 I've installed Hadoop 1.0.3 in pseudo-distributed mode and was able to
 successfully start the nodes:
 $ bin/start-all.sh
 $ jps
 1002 NameNode
 1246 JobTracker
 1453 Jps
 1181 SecondaryNameNode
 1335 TaskTracker
 1091 DataNode

 Then I installed HBase 0.94 and configured it at pseudo-distributed mode.
 $ ./start-hbase.sh
 $ jps
 1684 Jps
 1002 NameNode
 1647 HRegionServer
 1246 JobTracker
 1553 HQuorumPeer
 1181 SecondaryNameNode
 1335 TaskTracker
 1091 DataNode

 I couldn't find the MasterServer running so I looked at the log file:
 2012-09-11 14:35:05,892 INFO
 org.apache.hadoop.hbase.master.ActiveMasterManager:
 Master=192.168.10.23,6,1347388500668
 2012-09-11 14:35:06,996 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
 2012-09-11 14:35:07,998 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
 2012-09-11 14:35:08,999 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
 2012-09-11 14:35:10,000 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
 2012-09-11 14:35:11,001 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
 2012-09-11 14:35:12,002 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
 2012-09-11 14:35:13,004 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
 2012-09-11 14:35:14,005 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
 2012-09-11 14:35:15,006 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
 2012-09-11 14:35:16,008 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
 2012-09-11 14:35:16,012 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
 connection exception: java.net.ConnectException: Connection refused

 Could anyone help me to figure out what I am missing? I tried to do
 some google search but none of the answers there helped me.
 Here are my config files:

 hbase-site.xml
 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://localhost:8020/hbase</value>
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
   </property>
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>hbase.master</name>
     <value>localhost:6</value>
   </property>
 </configuration>

 hdfs-site.xml
 <configuration>
   <property>
     <name>fs.default.name</name>
     <value>localhost:9000</value>
   </property>
   <name>mapred.job.tracker</name>
   <value>localhost:9001</value>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

 mapred-site.xml
 <configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
   <property>
     <name>mapred.child.java.opts</name>
     <value>-Xmx512m</value>
   </property>
   <property>
     <name>mapred.job.tracker</name>
     <value>hdfs://localhost:54311</value>
   </property>
 </configuration>

 I've also tried ssh localhost and here is what I found on my shell:
 $ ssh localhost
 Last login: Tue Sep 11 14:22:37 2012 from localhost
 $ ssh localhost -p 22
 Last login: Tue Sep 11 14:41:17 2012 from localhost
 $ ssh localhost -p 8020
 ssh: connect to host localhost port 8020: Connection refused

 thanks!


Re: HBase aggregate query

2012-09-11 Thread Jerry Lam
Hi Prabhjot:

Can you implement this using a counter?
That is whenever you insert a row with the month(eventdate) and scene
combination, increment the associated counter by one. Note that if you have
a batch insert of N, you can increment the counter by N.

Then you can simply query the counter whenever you want the aggregated
result.

HTH,

Jerry

On Tue, Sep 11, 2012 at 1:59 PM, lars hofhansl lhofha...@yahoo.com wrote:

 That's when you aggregate along a sorted dimension (prefix of the key),
 though. Right?
 Not sure how smart Hive is here, but if it needs to sort the data it will
 probably be slower than SQL Server for such a small data set.



 - Original Message -
 From: James Taylor jtay...@salesforce.com
 To: user@hbase.apache.org
 Cc:
 Sent: Monday, September 10, 2012 5:49 PM
 Subject: Re: HBase aggregate query

 iwannaplay games funnlearnforkids@... writes:
 
  Hi ,
 
  I want to run query like
 
  select month(eventdate),scene,count(1),sum(timespent) from eventlog
  group by month(eventdate),scene
 
  in hbase.Through hive its taking a lot of time for 40 million
  records.Do we have any syntax in hbase to find its result?In sql
  server it takes around 9 minutes,How long it might take in hbase??
 
  Regards
  Prabhjot
 
 

 Hi,
 In our internal testing using server-side coprocessors for aggregation,
 we've
 found HBase can process these types of queries very quickly: ~10-12 seconds
 using a four node cluster. You need to chunk up and parallelize the work
 on the
 client side to get this kind of performance, though.
 Regards,

 James
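
A minimal sketch of the counter approach Jerry describes, assuming the
0.92-era client API and a hypothetical "eventlog_agg" counters table keyed by
month|scene:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class EventCounters {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "eventlog_agg"); // hypothetical table
        byte[] row = Bytes.toBytes("2012-09|scene42");   // month + scene
        // Atomic server-side increments; no read-modify-write on the client.
        table.incrementColumnValue(row, Bytes.toBytes("c"), Bytes.toBytes("count"), 1);
        table.incrementColumnValue(row, Bytes.toBytes("c"), Bytes.toBytes("timespent"), 37L);
        table.close();
    }
}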



Re: Help - can't start master server for HBase (pseudo-distributed mode).

2012-09-11 Thread Jason Huang
Yes! This is it!

Thanks Shrijeet!

On Tue, Sep 11, 2012 at 2:47 PM, Shrijeet Paliwal
shrij...@rocketfuel.com wrote:
 Your HDFS server is listening on a different port than the one you
 configured in hbase-site (9000 != 8020).



 On Tue, Sep 11, 2012 at 11:44 AM, Jason Huang jason.hu...@icare.com wrote:
 Hello,

 I am trying to set up HBase at pseudo-distributed mode on my Macbook.
 I've installed Hadoop 1.0.3 in pseudo-distributed mode and was able to
 successfully start the nodes:
 $ bin/start-all.sh
 $ jps
 1002 NameNode
 1246 JobTracker
 1453 Jps
 1181 SecondaryNameNode
 1335 TaskTracker
 1091 DataNode

 Then I installed HBase 0.94 and configured it at pseudo-distributed mode.
 $ ./start-hbase.sh
 $ jps
 1684 Jps
 1002 NameNode
 1647 HRegionServer
 1246 JobTracker
 1553 HQuorumPeer
 1181 SecondaryNameNode
 1335 TaskTracker
 1091 DataNode

 I couldn't find the MasterServer running so I looked at the log file:
 2012-09-11 14:35:05,892 INFO
 org.apache.hadoop.hbase.master.ActiveMasterManager:
 Master=192.168.10.23,6,1347388500668
 2012-09-11 14:35:06,996 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
 2012-09-11 14:35:07,998 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
 2012-09-11 14:35:08,999 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
 2012-09-11 14:35:10,000 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
 2012-09-11 14:35:11,001 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
 2012-09-11 14:35:12,002 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
 2012-09-11 14:35:13,004 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
 2012-09-11 14:35:14,005 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
 2012-09-11 14:35:15,006 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
 2012-09-11 14:35:16,008 INFO org.apache.hadoop.ipc.Client: Retrying
 connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
 2012-09-11 14:35:16,012 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.
 java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
 connection exception: java.net.ConnectException: Connection refused

 Could anyone help me to figure out what I am missing? I tried to do
 some google search but none of the answers there helped me.
 Here are my config files:

 hbase-site.xml
 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://localhost:8020/hbase</value>
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
   </property>
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>hbase.master</name>
     <value>localhost:6</value>
   </property>
 </configuration>

 hdfs-site.xml
 <configuration>
   <property>
     <name>fs.default.name</name>
     <value>localhost:9000</value>
   </property>
   <name>mapred.job.tracker</name>
   <value>localhost:9001</value>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

 mapred-site.xml
 <configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
   <property>
     <name>mapred.child.java.opts</name>
     <value>-Xmx512m</value>
   </property>
   <property>
     <name>mapred.job.tracker</name>
     <value>hdfs://localhost:54311</value>
   </property>
 </configuration>

 I've also tried ssh localhost and here is what I found on my shell:
 $ ssh localhost
 Last login: Tue Sep 11 14:22:37 2012 from localhost
 $ ssh localhost -p 22
 Last login: Tue Sep 11 14:41:17 2012 from localhost
 $ ssh localhost -p 8020
 ssh: connect to host localhost port 8020: Connection refused

 thanks!


Re: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-11 Thread Otis Gospodnetic
For pretty graphs with JVM GC info + system + HBase metrics you could also
easily hook up SPM to your cluster.  See URL in signature.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 11, 2012 6:30 AM, HARI KUMAR harikum2...@gmail.com wrote:

 For GC Monitoring, Add Parameters export HBASE_OPTS=$HBASE_OPTS
 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
 -Xloggc:$HBASE_HOME/logs/gc-hbase.log to hbase-env.sh and try to view the
 file using tools like GCViewer.  or use tools like VisualVM to look at
 your GC Consumption.

 ./hari

 Add

 On Tue, Sep 11, 2012 at 2:11 PM, Dhirendra Singh dps...@gmail.com wrote:

  No i am not doing parallel scans,
 
  * If yes, check the time taken for GC and
  the number of calls that can be served at your end point*.
 
   could you please tell me how to do that, where can i see the GC logs?
 
 
  On Tue, Sep 11, 2012 at 12:54 PM, HARI KUMAR harikum2...@gmail.com
 wrote:
 
  Hi,
 
  Are u trying to do parallel scans. If yes, check the time taken for GC
 and
  the number of calls that can be served at your end point.
 
  Best Regards
  N.Hari Kumar
 
  On Tue, Sep 11, 2012 at 8:22 AM, Dhirendra Singh dps...@gmail.com
  wrote:
 
   i tried with a smaller caching i.e 10, it failed again, no it's not
  really
   a big cell. this small cluster(4 nodes) is only used for Hbase, i am
   currently using hbase-0.92.1-cdh4.0.1. ,  could you just let me know
 how
   could i debug this issue ?
  
  
   Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException:
   99560ms passed since the last invocation, timeout is currently set to
   6
   at
  
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1302)
   at
  
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1399)
   ... 5 more
   Caused by: org.apache.hadoop.hbase.UnknownScannerException:
   org.apache.hadoop.hbase.UnknownScannerException: Name:
   -8889369042827960647
   at
  
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2114)
   at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown
 Source)
   at
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at
  
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at
  
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
  
  
  
   On Mon, Sep 10, 2012 at 10:53 PM, Stack st...@duboce.net wrote:
  
On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh dps...@gmail.com
 
wrote:
 I am facing this exception while iterating over a big table,  by
   default
i
 have specified caching as 100,

 i am getting the below exception, even though i checked there are
   several
 calls made to the scanner before it threw this exception, but
  somehow
   its
 saying 86095ms were passed since last invocation.

 i also observed that if it set scan.setCaching(false),  it
 succeeds,
 could
 some one please explain or point me to some document as if what's
happening
 here and what's the best practices to avoid it.


   
    Try again caching < 100.  See if it works.  A big cell?  A GC pause?
You should be able to tell roughly which server is being traversed
when you get the timeout.  Anything else going on on that server at
the time?  What version of HBase?
St.Ack
   
  
  
  
   --
   Warm Regards,
   Dhirendra Pratap
   +91. 9717394713
  
 
 
 
  --
  FROM
  HARI KUMAR.N
 
 
 
 
  --
  Warm Regards,
  Dhirendra Pratap
  +91. 9717394713
 
 
 
 


 --
 FROM
 HARI KUMAR.N



Re: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-11 Thread Dhirendra Singh
could someone please clarify: when i say caching 100 or any number,
where does this actually happen, on the server (cluster) or the client? if i
assume it happens on the cluster, is this ScannerTimeOut then because of
caching, as the server might have run out of memory and hence not been able
to respond within the specified timeout?

any link related to the caching mechanism in HBase would be of great help

Thanks,

On Wed, Sep 12, 2012 at 7:41 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 For pretty graphs with JVM GC info + system + HBase metrics you could also
 easily hook up SPM to your cluster.  See URL in signature.

 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Sep 11, 2012 6:30 AM, HARI KUMAR harikum2...@gmail.com wrote:

  For GC Monitoring, Add Parameters export HBASE_OPTS=$HBASE_OPTS
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
  -Xloggc:$HBASE_HOME/logs/gc-hbase.log to hbase-env.sh and try to view
 the
  file using tools like GCViewer.  or use tools like VisualVM to look at
  your GC Consumption.
 
  ./hari
 
  Add
 
  On Tue, Sep 11, 2012 at 2:11 PM, Dhirendra Singh dps...@gmail.com
 wrote:
 
   No i am not doing parallel scans,
  
   * If yes, check the time taken for GC and
   the number of calls that can be served at your end point*.
  
could you please tell me how to do that, where can i see the GC logs?
  
  
   On Tue, Sep 11, 2012 at 12:54 PM, HARI KUMAR harikum2...@gmail.com
  wrote:
  
   Hi,
  
   Are u trying to do parallel scans. If yes, check the time taken for GC
  and
   the number of calls that can be served at your end point.
  
   Best Regards
   N.Hari Kumar
  
   On Tue, Sep 11, 2012 at 8:22 AM, Dhirendra Singh dps...@gmail.com
   wrote:
  
 i tried with a smaller caching i.e 10, it failed again, no it's not
   really
a big cell. this small cluster(4 nodes) is only used for Hbase, i am
currently using hbase-0.92.1-cdh4.0.1. ,  could you just let me know
  how
could i debug this issue ?
   
   
 Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException:
99560ms passed since the last invocation, timeout is currently set
 to
6
at
   
  
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1302)
at
   
  
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1399)
... 5 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException:
org.apache.hadoop.hbase.UnknownScannerException: Name:
-8889369042827960647
at
   
  
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2114)
at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown
  Source)
at
   
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
   
  
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at
   
  
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
   
   
   
On Mon, Sep 10, 2012 at 10:53 PM, Stack st...@duboce.net wrote:
   
 On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh 
 dps...@gmail.com
  
 wrote:
  I am facing this exception while iterating over a big table,  by
default
 i
  have specified caching as 100,
 
  i am getting the below exception, even though i checked there
 are
several
  calls made to the scanner before it threw this exception, but
   somehow
its
  saying 86095ms were passed since last invocation.
 
  i also observed that if it set scan.setCaching(false),  it
  succeeds,
  could
  some one please explain or point me to some document as if
 what's
 happening
  here and what's the best practices to avoid it.
 
 

  Try again caching < 100.  See if it works.  A big cell?  A GC
 pause?
 You should be able to tell roughly which server is being traversed
 when you get the timeout.  Anything else going on on that server
 at
 the time?  What version of HBase?
 St.Ack

   
   
   
--
Warm Regards,
Dhirendra Pratap
+91. 9717394713
   
  
  
  
   --
   FROM
   HARI KUMAR.N
  
  
  
  
   --
   Warm Regards,
   Dhirendra Pratap
   +91. 9717394713
  
  
  
  
 
 
  --
  FROM
  HARI KUMAR.N
 




-- 
Warm Regards,
Dhirendra Pratap
+91. 9717394713


Re: Regarding rowkey

2012-09-11 Thread lars hofhansl
It depends. If you do not need to perform rangescans along (prefixes of) your 
row keys, you can prefix the row key by a hash of the row key.
That will give you a more or less random distribution of the keys and hence not 
hit the same region server over and over.

You'll probably also want to presplit your table then.

-- Lars



- Original Message -
From: Ramasubramanian ramasubramanian.naraya...@gmail.com
To: user@hbase.apache.org
Cc: 
Sent: Tuesday, September 11, 2012 10:39 AM
Subject: Regarding rowkey

Hi,

What can be used as rowkey to improve performance while loading into hbase? 
Currently I am using a sequence. It takes some 11 odd minutes to load 1 million 
records with 147 columns.

Regards,
Rams 
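
A minimal sketch of the hash-prefix ("salting") lars describes, plus the
pre-split; the one-byte salt, 16 buckets, and table/family names are
assumptions for illustration:

import java.util.Arrays;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
    // Prefix the sequential key with one hash-derived byte so consecutive
    // keys land on different regions instead of hammering one server.
    static byte[] salt(byte[] key) {
        byte bucket = (byte) ((Arrays.hashCode(key) & 0x7fffffff) % 16);
        byte[] salted = new byte[key.length + 1];
        salted[0] = bucket;
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical
        desc.addFamily(new HColumnDescriptor("f"));
        byte[][] splits = new byte[15][]; // pre-split on the 16 salt buckets
        for (int i = 1; i <= 15; i++) {
            splits[i - 1] = new byte[] { (byte) i };
        }
        admin.createTable(desc, splits);
        admin.close();
        System.out.println(Bytes.toStringBinary(salt(Bytes.toBytes(12345L))));
    }
}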


RE: HBase recovery from failed -ROOT- / .META. server

2012-09-11 Thread Ramkrishna.S.Vasudevan
Hi Willy

Yes I agree that META/ROOT recovery should happen as fast as possible.
Which version of HBase are you using? 

Lot of fixes have gone into the latest versions regarding the recovery part.

You can take a look at HBASE-6713 also if you are using any of the latest
versions.

If you can post the logs it would be great so that we can identify the
scenario in which the recovery took time.  If it looks like a bug we can
file a JIRA and work on resolving it.  
In the current HBase trunk a lot of activity w.r.t. MTTR (Mean Time To
Recover) is happening.
Inputs towards MTTR will always be taken with highest priority.

Thanks & Regards
Ram
 -Original Message-
 From: Willy Chang [mailto:willy.chang...@gmail.com]
 Sent: Tuesday, September 11, 2012 11:11 PM
 To: user@hbase.apache.org
 Subject: HBase recovery from failed -ROOT- / .META. server
 
 It appears to take 30 minutes or so for HBase to recover from the
 failure
 of the regionserver holding the ROOT role. Please let me know what
 options
 are available to more quickly recover from such a situation, as when
 this
 happens our applications/SLAs are impacted.
 
 It would also be good to be able to quickly recover from a failure of
 the
 regionserver which owns the .META. table. During HBase startup, a
 random
 server is elected to manage the ROOT and .META. tables (different
 servers).
 This creates a single point of failure. At the very least, perhaps we
 can
 find a way to force which server is selected for this role, perhaps
 even
 just via startup order. We could then assign a server which doesn't
 participate in flow tasks (no tasktracker), and so would be more
 stable.
 There may also be a config option for this. Wondering if there is a way
 to
 force election of a new ROOT/META owner within a minute or so instead
 of
 30+ minutes.



RE: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-11 Thread Anoop Sam John
could someone please clarify,   when i say caching 100 or any number,
 where does this actually happen on server (cluster  ) or client

This happens at both places. When the scan is called with caching = N, the client 
will pass this number N to the 1st region which is under scan for this specific 
scan. The server side (RS) will try to find as many results (rows) from this 
region as it can, with max rows = N. If it is able to, the client gets the results 
for that next() call. If it gets fewer rows than N, the client will try to get the 
remaining number of rows from the next region and so on. Mostly this will happen 
on the server side alone [it might find N rows from one region itself], but when 
you have some Filter conditions it might not find N rows from one region...

Note: the client will try to find N rows with one next() call, as N is specified 
as caching, so it might be contacting many regions across different RSs. There is 
a max result size config param also available at the client side. If the total 
size of the results exceeds this value while there are fewer than N results, the 
client will stop scanning even though it has not got N results... If this size 
cap is never crossed, one call of next() might go through all the regions. [You 
may be getting ScannerTimeouts due to RPC time outs.]

Hope I have answered your question..  :)

-Anoop-
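
The two client-side knobs Anoop mentions, as a hedged sketch (the values are
illustrative, and the exact config key name here is the one used in later
HBase versions; check your version's hbase-default.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

public class ScannerKnobs {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Byte cap per next() round: the client stops gathering more rows
        // once accumulated results exceed this, even with fewer than N rows.
        conf.setLong("hbase.client.scanner.max.result.size", 2L * 1024 * 1024);
        Scan scan = new Scan();
        scan.setCaching(100); // row cap (N) per next() call
    }
}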

From: Dhirendra Singh [dps...@gmail.com]
Sent: Wednesday, September 12, 2012 7:55 AM
To: user@hbase.apache.org
Subject: Re: Getting ScannerTimeoutException even after several calls in the 
specified time limit

could someone please clarify: when i say caching 100 or any number,
where does this actually happen, on the server (cluster) or the client? if i
assume it happens on the cluster, is this ScannerTimeOut then because of
caching, as the server might have run out of memory and hence not been able
to respond within the specified timeout?

any link related to the caching mechanism in HBase would be of great help

Thanks,

On Wed, Sep 12, 2012 at 7:41 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 For pretty graphs with JVM GC info + system + HBase metrics you could also
 easily hook up SPM to your cluster.  See URL in signature.

 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Sep 11, 2012 6:30 AM, HARI KUMAR harikum2...@gmail.com wrote:

  For GC Monitoring, Add Parameters export HBASE_OPTS=$HBASE_OPTS
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
  -Xloggc:$HBASE_HOME/logs/gc-hbase.log to hbase-env.sh and try to view
 the
  file using tools like GCViewer.  or use tools like VisualVM to look at
  your GC Consumption.
 
  ./hari
 
  Add
 
  On Tue, Sep 11, 2012 at 2:11 PM, Dhirendra Singh dps...@gmail.com
 wrote:
 
   No i am not doing parallel scans,
  
   * If yes, check the time taken for GC and
   the number of calls that can be served at your end point*.
  
could you please tell me how to do that, where can i see the GC logs?
  
  
   On Tue, Sep 11, 2012 at 12:54 PM, HARI KUMAR harikum2...@gmail.com
  wrote:
  
   Hi,
  
   Are u trying to do parallel scans. If yes, check the time taken for GC
  and
   the number of calls that can be served at your end point.
  
   Best Regards
   N.Hari Kumar
  
   On Tue, Sep 11, 2012 at 8:22 AM, Dhirendra Singh dps...@gmail.com
   wrote:
  
 i tried with a smaller caching i.e 10, it failed again, no it's not
   really
a big cell. this small cluster(4 nodes) is only used for Hbase, i am
currently using hbase-0.92.1-cdh4.0.1. ,  could you just let me know
  how
could i debug this issue ?
   
   
 Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException:
99560ms passed since the last invocation, timeout is currently set
 to
6
at
   
  
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1302)
at
   
  
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1399)
... 5 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException:
org.apache.hadoop.hbase.UnknownScannerException: Name:
-8889369042827960647
at
   
  
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2114)
at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown
  Source)
at
   
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
   
  
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at
   
  
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
   
   
   
On Mon, Sep 10, 2012 at 10:53 PM, Stack st...@duboce.net wrote:
   
 On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh 
 dps...@gmail.com
  
 wrote:
  I am facing this exception while iterating over a big table,  by
default
 i
  have