Re: Modifying SingleColumnValueFilter to not include matched KV

2013-01-24 Thread David Koch
Ha,

I think I found it: I had multiple versions of the KV - so the last
statement should read ReturnCode.NEXT_COL.

/David
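
For anyone landing on this thread later, here is a rough, untested sketch of the exclude behaviour being discussed, assuming the 0.92.x filter API (the class name is made up). It mirrors what the stock SingleColumnValueExcludeFilter does - let the parent filter decide, then drop the matched column's KeyValues - which also sidesteps the multiple-version issue, since every version of the matched column takes the same branch:

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

// Hypothetical class name; sketch only.
public class ExcludeMatchedColumnFilter extends SingleColumnValueFilter {

  public ExcludeMatchedColumnFilter() {
    super(); // Writable constructor
  }

  public ExcludeMatchedColumnFilter(byte[] family, byte[] qualifier,
      CompareOp compareOp, byte[] value) {
    super(family, qualifier, compareOp, value);
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue keyValue) {
    ReturnCode superCode = super.filterKeyValue(keyValue);
    // The parent wants to include this KV, but it belongs to the column we
    // matched on: skip it. Every version of that column hits this same branch.
    if (superCode == ReturnCode.INCLUDE
        && keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
      return ReturnCode.SKIP;
    }
    return superCode;
  }
}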

On Thu, Jan 24, 2013 at 12:47 AM, David Koch ogd...@googlemail.com wrote:

 Hello,

 As part of some custom filter building I took the source
 of SingleColumnValueFilter (HBase 0.92.1) [1] and wanted to tweak it to NOT
 return the matched column - thus essentially make it
 equivalent SingleColumnValueExcludeFilter. I thought it must be trivial but
 for some reason I cannot get it to work. The filter always includes the
 matched KV pair.

 The only change I made is in the filterKeyValue(KeyValue) method by
 editing the last statement (see below):

 public ReturnCode filterKeyValue(KeyValue keyValue) {
   if (this.matchedColumn) {
     // We already found and matched the single column, all keys now pass
     return ReturnCode.INCLUDE;
   } else if (this.latestVersionOnly && this.foundColumn) {
     // We found but did not match the single column, skip to next row
     return ReturnCode.NEXT_ROW;
   }
   if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
     return ReturnCode.INCLUDE;
   }
   foundColumn = true;
   if (filterColumnValue(keyValue.getBuffer(),
       keyValue.getValueOffset(), keyValue.getValueLength())) {
     return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
   }
   this.matchedColumn = true;
   // Commented line below to NOT include matched column
   // return ReturnCode.INCLUDE;
   return ReturnCode.SKIP;
 }

 Is this expected behavior? What am I overlooking here? By the way - how
 can I sensibly debug filters? I tried using the Log instance, but the output
 does not show up in the region server's output.

 Thank you,

 /David

 [1]
 http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.1/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java/?v=source



Re: table in 'transition' state which cannot be dropped

2013-01-24 Thread ramkrishna vasudevan
Oops, which version of HBase is this?

If the problem persists, can you restart the cluster? Sounds bad, but you
may have to do that. :(

Regards
Ram

On Thu, Jan 24, 2013 at 3:46 PM, hua beatls bea...@gmail.com wrote:

  Hi,
  I have a table in 'transition' state which couldn't be 'disable'd or
  'enable'd. I tried to 'drop' it but it failed. Below are the error messages.

  hbase(main):012:0> drop 'T21_0513_201301_bigtable'
  ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
  org.apache.hadoop.hbase.TableNotDisabledException: T21_0513_201301_bigtable
      at org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
      at org.apache.hadoop.hbase.master.handler.TableEventHandler.<init>(TableEventHandler.java:70)
      at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.<init>(DeleteTableHandler.java:42)
      at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)


  Below is the excerpt from the web UI:

  Regions in Transition
  Region: bd8d2bf3ef04d0f8d3dac5ca2f612f42
  T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
  State: state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
  server=hadoop1,60020,1358993820407



Re: drop table problem

2013-01-24 Thread Mohammad Tariq
Which version are you using? Try hbck and see if you find anything
interesting. This problem was faced by a couple of folks a few weeks ago. Try
to search through the mailing list. Probably there is some problem with the
znode holding this table. Remove it and restart everything.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Jan 24, 2013 at 3:48 PM, hua beatls bea...@gmail.com wrote:

 Hi,
 I have a table in 'transition' state which couldn't be 'disable'd or
 'enable'd. I tried to 'drop' it but it failed. Below are the error messages.

  hbase(main):012:0> drop 'T21_0513_201301_bigtable'
  ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
  org.apache.hadoop.hbase.TableNotDisabledException: T21_0513_201301_bigtable
      at org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
      at org.apache.hadoop.hbase.master.handler.TableEventHandler.<init>(TableEventHandler.java:70)
      at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.<init>(DeleteTableHandler.java:42)
      at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
  Below is the excerpt from the web UI:
  Regions in Transition
  Region: bd8d2bf3ef04d0f8d3dac5ca2f612f42
  T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
  State: state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
  server=hadoop1,60020,1358993820407



Re: table in 'transition' state which cannot be dropped

2013-01-24 Thread Mohammad Tariq
Looks to me like a copy of your other email, with a different heading.

Anyways, do as Ram sir has said.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Jan 24, 2013 at 3:54 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:

 Oops, which version is of HBase is this?

 If the problem persists can you restart the cluster.  Sounds bad but you
 may have to do that. :(

 Regards
 Ram

 On Thu, Jan 24, 2013 at 3:46 PM, hua beatls bea...@gmail.com wrote:

  HI,
 i have a table in 'transition' state, which couldn't  be 'disable'  or
  enable. I try to 'drop' it but failed. below is the error messages.
 
  hbase(main):012:0 drop 'T21_0513_201301_bigtable'
  ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
  org.apache.hadoop.hbase.TableNotDisabledException:
 T21_0513_201301_bigtable
  at
 
 
 org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
  at
 
 
 org.apache.hadoop.hbase.master.handler.TableEventHandler.init(TableEventHandler.java:70)
  at
 
 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler.init(DeleteTableHandler.java:42)
  at
  org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at
 
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
  at
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
 
 
  below is the excerpts from   webui:
 
  Regions in Transition  Region State bd8d2bf3ef04d0f8d3dac5ca2f612f42
 
 
 T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
  state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
  server=hadoop1,60020,1358993820407
 



Re: table in 'transition' state which cannot be dropped

2013-01-24 Thread ramkrishna vasudevan
If the problem persists we may have to remove the znode for that table also
and restart.

I think we need to make hbck handle such cases.  I remember that HBCK
fixes the region-in-transition problem but does not go and mark the table
DISABLED/ENABLED based on what was happening previously on the cluster.
Otherwise folks could use that option in HBCK.  There is a JIRA opened for
that which I am not able to find now.  Or maybe it is fixed in recent
versions.

Regards
Ram

On Thu, Jan 24, 2013 at 3:59 PM, Mohammad Tariq donta...@gmail.com wrote:

 looks to me a copy of your other email, with a different heading.

 anyways, do as Ram sir has said.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Thu, Jan 24, 2013 at 3:54 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:

  Oops, which version is of HBase is this?
 
  If the problem persists can you restart the cluster.  Sounds bad but you
  may have to do that. :(
 
  Regards
  Ram
 
  On Thu, Jan 24, 2013 at 3:46 PM, hua beatls bea...@gmail.com wrote:
 
   HI,
  i have a table in 'transition' state, which couldn't  be 'disable'
  or
   enable. I try to 'drop' it but failed. below is the error messages.
  
   hbase(main):012:0 drop 'T21_0513_201301_bigtable'
   ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
   org.apache.hadoop.hbase.TableNotDisabledException:
  T21_0513_201301_bigtable
   at
  
  
 
 org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
   at
  
  
 
 org.apache.hadoop.hbase.master.handler.TableEventHandler.init(TableEventHandler.java:70)
   at
  
  
 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler.init(DeleteTableHandler.java:42)
   at
   org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
  
  
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at
  
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at
  
  
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at
  
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
  
  
   below is the excerpts from   webui:
  
   Regions in Transition  Region State bd8d2bf3ef04d0f8d3dac5ca2f612f42
  
  
 
 T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
   state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
   server=hadoop1,60020,1358993820407
  
 



Re: HBASE-7114 Increment does not extend Mutation but probably should

2013-01-24 Thread Amit Sela
I'm using Increment.getFamilyMap in a postIncrement Observer.
I'm running with HBase 0.94.2.

Amit.
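
For context, a rough sketch of that usage pattern (not Amit's actual code), assuming the 0.94-era RegionObserver and Increment APIs; the signatures are worth double-checking against the exact release in use:

import java.io.IOException;
import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: logs the family/qualifier/amount triples that were
// incremented, read via Increment.getFamilyMap() in postIncrement().
public class IncrementLoggingObserver extends BaseRegionObserver {
  @Override
  public Result postIncrement(ObserverContext<RegionCoprocessorEnvironment> e,
      Increment increment, Result result) throws IOException {
    for (Map.Entry<byte[], NavigableMap<byte[], Long>> family :
        increment.getFamilyMap().entrySet()) {
      for (Map.Entry<byte[], Long> qualifier : family.getValue().entrySet()) {
        System.out.println("incremented " + Bytes.toString(family.getKey())
            + ":" + Bytes.toString(qualifier.getKey())
            + " by " + qualifier.getValue());
      }
    }
    return result;
  }
}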

On Thu, Jan 24, 2013 at 4:23 AM, lars hofhansl la...@apache.org wrote:

 The reason was that Increment was serialized differently (compared to all
 other mutations).
 In trunk that is no longer an issue, since the serialization logic is no
 longer part of the object to be serialized.


 -- Lars



 
  From: Ted Yu yuzhih...@gmail.com
 To: d...@hbase.apache.org; user@hbase.apache.org
 Sent: Wednesday, January 23, 2013 10:25 AM
 Subject: HBASE-7114 Increment does not extend Mutation but probably should

 Hi,
 I want to get opinion on whether we should proceed with HBASE-7114
 'Increment does not extend Mutation but probably should' in trunk.

 Is anyone using Increment.setWriteToWAL or Increment.getFamilyMap ?
 For Increment.setWriteToWAL, are you using the Increment returned ?

 Your feedback would be appreciated.



Re: How to get coprocessor list by client API

2013-01-24 Thread Jean-Marc Spaggiari
Hi Kyle,

This will give you all the attributes of the table, not just the
coprocessors, so don't forget to parse the key using
CP_HTD_ATTR_KEY_PATTERN ...

I will add to my ToDo to add a List<String> getCoprocessors() method in
HTableInterface or HTableDescriptor...

JM
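
For reference, a rough sketch of that key-pattern filtering, assuming the 0.92/0.94-era client API and a made-up table name:

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;

public class ListCoprocessors {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    HTable htable = new HTable(config, "table21");   // assumed table name
    HTableDescriptor desc = htable.getTableDescriptor();
    for (Map.Entry<ImmutableBytesWritable, ImmutableBytesWritable> e :
        desc.getValues().entrySet()) {
      String key = Bytes.toString(e.getKey().get());
      // Keep only attributes whose key looks like "coprocessor$1", "coprocessor$2", ...
      if (HConstants.CP_HTD_ATTR_KEY_PATTERN.matcher(key).matches()) {
        System.out.println(key + " => " + Bytes.toString(e.getValue().get()));
      }
    }
    htable.close();
  }
}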

2013/1/23, Kyle Lin kylelin2...@gmail.com:
 Hello JM

 It really works! Thanks a lot.

 Hello Jack

 For each table, it needs to use htable.getTableDescriptor().

 hbaseAdmin.getTableDescriptor only gets -ROOT- and .META.

 So I use the code as follows,

 HTable htable = new HTable(config, tableName);
 HTableDescriptor htableDesc = htable.getTableDescriptor();
 Map<ImmutableBytesWritable, ImmutableBytesWritable> maps = htableDesc.getValues();
 Set<Entry<ImmutableBytesWritable, ImmutableBytesWritable>> sets = maps.entrySet();
 for (Map.Entry<ImmutableBytesWritable, ImmutableBytesWritable> entrySet : sets) {
   String stringKey = Bytes.toString(entrySet.getKey().get());
   String stringValue = Bytes.toString(entrySet.getValue().get());
   System.out.println("key: " + stringKey + ", value: " + stringValue);
 }
 htable.close();

 Kyle

 2013/1/24 jack ky73...@yahoo.com.tw

 Hi, Kyle

  Configuration config = HBaseConfiguration.create();
  config.set("hbase.zookeeper.quorum", "host3");
  config.set("hbase.zookeeper.property.clientPort", "2181");

  config.set("fs.default.name", "hdfs://host3:9000");
  config.set("mapred.job.tracker", "hdfs://host3:9001");

  HBaseAdmin hbaseAdmin = new HBaseAdmin(config);

  HTableDescriptor htableDescriptor =
      hbaseAdmin.getTableDescriptor(Bytes.toBytes("table21"));

  Map<ImmutableBytesWritable, ImmutableBytesWritable> maps =
      htableDescriptor.getValues();
  Set<Entry<ImmutableBytesWritable, ImmutableBytesWritable>> sets =
      maps.entrySet();
  Iterator<Entry<ImmutableBytesWritable, ImmutableBytesWritable>> it =
      sets.iterator();
  while (it.hasNext()) {
    Entry<ImmutableBytesWritable, ImmutableBytesWritable> keys = it.next();
    ImmutableBytesWritable ibwKey = keys.getKey();
    ImmutableBytesWritable ibwValue = keys.getValue();
    String stringKey = Bytes.toString(ibwKey.get());
    String stringValue = Bytes.toString(ibwValue.get());
    System.out.println(stringKey + " " + stringValue);
  }
  hbaseAdmin.close();




 
  From: Kyle Lin kylelin2...@gmail.com
  To: user@hbase.apache.org
  Date: 2013/1/23 (Wed) 4:18 PM
  Subject: How to get coprocessor list by client API

 Hi, Everyone

  I need to know what coprocessors are registered in an HTable. But in class
  HTableDescriptor, I can only find addCoprocessor
  (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html#addCoprocessor(java.lang.String))
  and hasCoprocessor
  (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html#hasCoprocessor(java.lang.String))
  etc. How can I use the client API to get the coprocessor information, just
  like typing describe 'table_name' in the HBase shell as follows?


  hbase(main):002:0> describe 'table21'
  DESCRIPTION                                                              ENABLED
  {NAME => 'table21', coprocessor$1 => 'hdfs://host3:9000/sumCoprocessor.jar|idv.jack.endpoint.SumDataEndpoint||', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}  true

  1 row(s) in 0.0210 seconds

 Kyle




Re: paging results filter

2013-01-24 Thread Mohammad Tariq
I think you need PageFilter
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html).

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Jan 24, 2013 at 6:20 PM, Toby Lazar tla...@gmail.com wrote:

 Hi,

 I need to create a client function that allows paging of scan results
 (initially return results 1-20, then click on page two to show results
 21-40, 41-60, etc.) without needing to remember the start rowkey.  I
 believe that a filter would be far more efficient than implementing the
 logic client-side.  I couldn't find any OOTB filter for this functionality
 so I wrote the class below.  It seems to work fine for me, but can anyone
 comment if this approach makes sense?  Is there another OOTB filter that I
 can use instead?

 Thank you,

 Toby



 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
 import org.apache.hadoop.hbase.filter.FilterBase;

 public class PageOffsetFilter extends FilterBase {
   private long startRowCount;
   private long endRowCount;
   private int count = 0;

   public PageOffsetFilter() {
   }

   public PageOffsetFilter(long pageNumber, long pageSize) {
     if (pageNumber < 1)
       pageNumber = 1;

     startRowCount = (pageNumber - 1) * pageSize;
     endRowCount = (pageSize * pageNumber) - 1;
   }

   @Override
   public boolean filterAllRemaining() {
     return count > endRowCount;
   }

   @Override
   public boolean filterRow() {
     count++;
     if (count <= startRowCount) {
       return true;
     } else {
       return false;
     }
   }

   @Override
   public void readFields(DataInput dataInput) throws IOException {
     this.startRowCount = dataInput.readLong();
     this.endRowCount = dataInput.readLong();
   }

   @Override
   public void write(DataOutput dataOutput) throws IOException {
     dataOutput.writeLong(startRowCount);
     dataOutput.writeLong(endRowCount);
   }

 }
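
A rough usage sketch for the filter above, assuming the class is packaged in a jar on the region servers' classpath (custom filters are instantiated server side) and a made-up table name - here, fetching "page 3" of 20 rows:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PagingScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // assumed table name
    Scan scan = new Scan();
    scan.setFilter(new PageOffsetFilter(3, 20));  // rows 41-60
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

Note that, like PageFilter, the row count is kept per region (each region gets its own filter instance), so across multiple regions the client may still need to trim the final result.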



Re: paging results filter

2013-01-24 Thread Toby Lazar
I don't see a way of specifying which page of results I want.  For example,
if I want page 3 with a page size of 20 (only results 41-60), I don't see how
PageFilter can be configured for that.  Am I missing the obvious?

Thanks,

Toby

On Thu, Jan 24, 2013 at 7:52 AM, Mohammad Tariq donta...@gmail.com wrote:

 I think you need
 PageFilter
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
 
 .

 HTH

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Thu, Jan 24, 2013 at 6:20 PM, Toby Lazar tla...@gmail.com wrote:

  Hi,
 
  I need to create a client function that allows paging of scan results
  (initially return results 1-20, then click on page to to show results
  21-40, 41-60, etc.) without needing to remember the start rowkey.  I
  beleive that a filter would be far more efficient than implementing the
  logic client-side.  I couldn't find any OOTB filter for this
 functionality
  so I wrote the class below.  It seems to work fine for me, but can anyone
  comment if this approach makes sense?  Is there another OOTB filter that
 I
  can use instead?
 
  Thank you,
 
  Toby
 
 
 
  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.hbase.filter.FilterBase;
  public class PageOffsetFilter extends FilterBase {
   private long startRowCount;
   private long endRowCount;
 
   private int count = 0;
   public PageOffsetFilter() {
   }
 
   public PageOffsetFilter(long pageNumber, long pageSize) {
 
     if (pageNumber < 1)
       pageNumber = 1;
 
startRowCount = (pageNumber - 1) * pageSize;
endRowCount = (pageSize * pageNumber)-1;
   }
   @Override
   public boolean filterAllRemaining() {
 return count > endRowCount;
   }
   @Override
   public boolean filterRow() {
 
count++;
 if (count <= startRowCount) {
 return true;
} else {
 return false;
}
 
   }
 
   @Override
   public void readFields(DataInput dataInput) throws IOException {
 
this.startRowCount = dataInput.readLong();
this.endRowCount = dataInput.readLong();
   }
   @Override
   public void write(DataOutput dataOutput) throws IOException {
dataOutput.writeLong(startRowCount);
dataOutput.writeLong(endRowCount);
   }
 
  }
 



Re: drop table problem

2013-01-24 Thread Vikas Jadhav
Try to disable the table first:

disable 'table_name'

drop 'table_name'



On Thu, Jan 24, 2013 at 3:48 PM, hua beatls bea...@gmail.com wrote:

 HI,
 i have a table in 'transition' state, which couldn't be 'disable' or
 enable. I try to 'drop' it but failed. below is the error messages.

 hbase(main):012:0 drop 'T21_0513_201301_bigtable'
 ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
 org.apache.hadoop.hbase.TableNotDisabledException: T21_0513_201301_bigtable
 at

 org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
 at

 org.apache.hadoop.hbase.master.handler.TableEventHandler.init(TableEventHandler.java:70)
 at

 org.apache.hadoop.hbase.master.handler.DeleteTableHandler.init(DeleteTableHandler.java:42)
 at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at

 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
 at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
  below is the excerpts from webui:
 Regions in
 TransitionRegionStatebd8d2bf3ef04d0f8d3dac5ca2f612f42T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
 state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
 server=hadoop1,60020,1358993820407




-- 
Thanx and Regards
Vikas Jadhav


Re: Join Using MapReduce and Hbase

2013-01-24 Thread Doug Meil

Hi there-

Here is a comment in the RefGuide on joins in the HBase data model.

http://hbase.apache.org/book.html#joins

Short answer, you need to do it yourself (e.g., either with an in-memory
hashmap or instantiating an HTable of the other table, depending on your
situation).

For other MR examples, see this...

http://hbase.apache.org/book.html#mapreduce.example
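
For what it's worth, a bare-bones sketch of the in-memory-hashmap approach mentioned above, with made-up table/family/qualifier names (the smaller table is loaded once per map task in setup()):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class HashJoinMapper extends TableMapper<Text, Text> {

  private static final byte[] CF = Bytes.toBytes("cf");          // assumed family
  private static final byte[] JOIN_COL = Bytes.toBytes("fk");    // assumed join column
  private static final byte[] NAME_COL = Bytes.toBytes("name");  // assumed looked-up column

  private final Map<String, String> lookup = new HashMap<String, String>();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Build the lookup side once per map task from the (small) dimension table.
    HTable small = new HTable(context.getConfiguration(), "small_table"); // assumed name
    ResultScanner scanner = small.getScanner(new Scan());
    try {
      for (Result r : scanner) {
        byte[] name = r.getValue(CF, NAME_COL);
        if (name != null) {
          lookup.put(Bytes.toString(r.getRow()), Bytes.toString(name));
        }
      }
    } finally {
      scanner.close();
      small.close();
    }
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // The scanned (large) table carries a foreign key; join it in memory.
    byte[] fk = value.getValue(CF, JOIN_COL);
    if (fk == null) {
      return;
    }
    String joined = lookup.get(Bytes.toString(fk));
    if (joined != null) {
      context.write(new Text(Bytes.toString(row.get())), new Text(joined));
    }
  }
}

The job itself would be wired up over the large table with TableMapReduceUtil.initTableMapperJob; if the lookup side does not fit in memory, a reduce-side join or an HTable lookup per record is the usual fallback.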




On 1/24/13 8:19 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:

Hi, I am working on a join operation using MapReduce.
So if anyone has useful information, please share it -
example code or a new technique along with existing ones.
Thank you.
-- 
*
*
*

Thanx and Regards*
* Vikas Jadhav*




Re: drop table problem

2013-01-24 Thread Kevin O'dell
Typically, hbck won't detect anything wrong here; as Ram said in another
thread, we really should work this functionality in.

1.) Shut the HBase cluster - go to ZKcli and rmr /hbase - Start HBase back
up

2.) Move the table, use hbck -fixMeta -fixAssignments, restart the HBase
(not a great option if there is data on the table)

3.) Force an assign on the region for the table and see if it clears it up
(Should create a new znode)

4.) Go to ZK Cli and check /hbase for unassigned regions and other data
correlating with that region and remove it, then restart HBase

On Thu, Jan 24, 2013 at 8:34 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:

 try to diable table first

 disable 'table_name'

 drop 'tab-name'



 On Thu, Jan 24, 2013 at 3:48 PM, hua beatls bea...@gmail.com wrote:

  HI,
  i have a table in 'transition' state, which couldn't be 'disable' or
  enable. I try to 'drop' it but failed. below is the error messages.
 
  hbase(main):012:0 drop 'T21_0513_201301_bigtable'
  ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
  org.apache.hadoop.hbase.TableNotDisabledException:
 T21_0513_201301_bigtable
  at
 
 
 org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
  at
 
 
 org.apache.hadoop.hbase.master.handler.TableEventHandler.init(TableEventHandler.java:70)
  at
 
 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler.init(DeleteTableHandler.java:42)
  at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at
 
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
  at
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
   below is the excerpts from webui:
  Regions in
 
 TransitionRegionStatebd8d2bf3ef04d0f8d3dac5ca2f612f42T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
  state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
  server=hadoop1,60020,1358993820407
 



 --
 *
 *
 *

 Thanx and Regards*
 * Vikas Jadhav*




-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera


Re: drop table problem

2013-01-24 Thread Kevin O'dell
Sorry, I should have specified: those are different options to try, not an
ordered set of instructions.

On Thu, Jan 24, 2013 at 8:47 AM, Kevin O'dell kevin.od...@cloudera.com wrote:

 Typically, hbck won't detect anything wrong here, as Ram said in another
 thread we really should work in this functionality.

 1.) Shut the HBase cluster - go to ZKcli and rmr /hbase - Start HBase back
 up

 2.) Move the table, use hbck -fixMeta -fixAssignments, restart the HBase
 (not a great option if there is data on the table)

 3.) Force an assign on the region for the table and see if it clears it up
 (Should create a new znode)

 4.) Go to ZK Cli and check /hbase for unassigned regions and other data
 correlating with that region and remove it, then restart HBase

 On Thu, Jan 24, 2013 at 8:34 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:

 try to diable table first

 disable 'table_name'

 drop 'tab-name'



 On Thu, Jan 24, 2013 at 3:48 PM, hua beatls bea...@gmail.com wrote:

  HI,
  i have a table in 'transition' state, which couldn't be 'disable' or
  enable. I try to 'drop' it but failed. below is the error messages.
 
  hbase(main):012:0 drop 'T21_0513_201301_bigtable'
  ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
  org.apache.hadoop.hbase.TableNotDisabledException:
 T21_0513_201301_bigtable
  at
 
 
 org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
  at
 
 
 org.apache.hadoop.hbase.master.handler.TableEventHandler.init(TableEventHandler.java:70)
  at
 
 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler.init(DeleteTableHandler.java:42)
  at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at
 
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
  at
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
   below is the excerpts from webui:
  Regions in
 
 TransitionRegionStatebd8d2bf3ef04d0f8d3dac5ca2f612f42T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
  state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
  server=hadoop1,60020,1358993820407
 



 --
 *
 *
 *

 Thanx and Regards*
 * Vikas Jadhav*




 --
 Kevin O'Dell
 Customer Operations Engineer, Cloudera




-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera


Re: LoadIncrementalHFiles always run with hbase user

2013-01-24 Thread Harsh J
The exception is remote and seems to indicate that your RS is running
as the 'hbase' user. RS will attempt to do a mv/rename operation when
you provide it a bulkloadable file, which will then be attempted as
the user the RS itself runs as - thereby this error.
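
One workaround (not necessarily the recommended one) is to open up permissions on the bulk-load output directory before calling LoadIncrementalHFiles so the RS-side rename succeeds. A rough sketch with an assumed output path, recursively applying mode 777:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class OpenBulkLoadDirPermissions {
  // Recursively grant rwx to everyone under the bulk-load output directory
  // so the RegionServer's user can move the HFiles into the table directory.
  static void openPermissions(FileSystem fs, Path dir) throws Exception {
    fs.setPermission(dir, new FsPermission((short) 0777));
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDir()) {
        openPermissions(fs, status.getPath());
      } else {
        fs.setPermission(status.getPath(), new FsPermission((short) 0777));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    openPermissions(fs, new Path(args[0]));  // e.g. the importtsv.bulk.output dir
  }
}

Running the load as the hbase user, or chmod'ing the directory with the hadoop fs shell, achieves the same effect; either way the RS must be able to rename the files.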

On Thu, Jan 24, 2013 at 6:39 AM, anil gupta anilgupt...@gmail.com wrote:
 Hi All,

 I am generating HFiles by running the bulk loader with a custom mapper.
 Once the MR job for generating HFile is finished, I trigger the loading of
 HFiles into HBase with the help of following java code:
 ToolRunner.run(new LoadIncrementalHFiles(HBaseConfiguration.create()), new
 String[]{conf.get(importtsv.bulk.output), otherArgs[0]});

 However, while loading I am getting errors related to permissions, since the
 loading is being attempted by the hbase user even though the process (Java
 program) was started by root. This seems like a bug since the loading of
 data into HBase should also be done as root. Is there any reason for only using
 the hbase user while loading?
 HBase cluster is not secured. I am using 0.92.1 and its fully distributed
 cluster. Please help me in resolving this error.

 Here is the error message:
 13/01/23 17:02:16 WARN mapreduce.LoadIncrementalHFiles: Skipping
 non-directory hdfs://ihubcluster/tmp/hfile_txn_subset/_SUCCESS
 13/01/23 17:02:16 INFO hfile.CacheConfig: Allocating LruBlockCache with
 maximum size 241.7m
 13/01/23 17:02:16 INFO mapreduce.LoadIncrementalHFiles: Trying to load
 hfile=hdfs://ihubcluster/tmp/hfile_txn_subset/t/344d58edc7d74e7b9a35ef5e1bf906cc
 first=\x00\x0F(\xC7F\xAD2\xB4\x00\x00\x02\x87\xE1\xB9\x9F\x18\x00\x0C\x1E\x1A\x00\x00\x01j\x14\x95d
 last=\x00\x12\xA4\xC6$IP\x9D\x00\x00\x02\x88+\x11\xD2
 \x00\x0C\x1E\x1A\x00\x00\x01j\x14\x04A
 13/01/23 17:02:55 ERROR mapreduce.LoadIncrementalHFiles: Encountered
 unrecoverable error from region server
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
 attempts=10, exceptions:
 Wed Jan 23 17:02:16 PST 2013,
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@7b4189d0,
 org.apache.hadoop.security.AccessControlException:
 org.apache.hadoop.security.AccessControlException: Permission denied:
 user=hbase, access=WRITE,
 inode=/tmp/hfile_txn_subset/t:root:hadoop:drwxr-xr-x
     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4265)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:4231)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:2347)
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2315)
     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:579)
     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:374)
     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42612)
     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)

     at sun.reflect.GeneratedConstructorAccessor21.newInstance(Unknown Source)
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
     at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1237)
     at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:294)
     at org.apache.hadoop.hbase.regionserver.StoreFile.rename(StoreFile.java:640)
     at org.apache.hadoop.hbase.regionserver.Store.bulkLoadHFile(Store.java:420)
     at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:2803)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2417)
     at 

FW: Kundera 2.3 released

2013-01-24 Thread Vivek Mishra


From: kundera-disc...@googlegroups.com [kundera-disc...@googlegroups.com] on 
behalf of Vivek Mishra
Sent: 24 January 2013 20:29
To: kundera-disc...@googlegroups.com
Subject: {kundera-discuss} Kundera 2.3 released

Hi All,

We are happy to announce release of Kundera 2.3.

Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL 
datastores. The idea behind Kundera is to make working with NoSQL Databases 
drop-dead simple and fun.
It currently supports Cassandra, HBase, MongoDB, Redis and relational databases.

Major Changes:
-
1)  Added Redis (http://redis.io/) to Kundera's supported database list. 
(https://github.com/impetus-opensource/Kundera/wiki/Kundera-over-Redis-Connecting-...)
2)  Cassandra 1.2 migration.
3) Changes in HBase schema handling.
4)  Stronger query support, like selective column/id search via JPQL.
5)  Enable support for @Transient for embeddedColumns and mappedsuperclass.
6)  Allow to set record limit on search for mongodb .
7)  Performance improvement on Cassandra,HBase,MongoDB.


Github Bug Fixes:
--
https://github.com/impetus-opensource/Kundera/issues/163
https://github.com/impetus-opensource/Kundera/issues/162
https://github.com/impetus-opensource/Kundera/issues/154
https://github.com/impetus-opensource/Kundera/issues/141
https://github.com/impetus-opensource/Kundera/issues/133
https://github.com/impetus-opensource/Kundera/issues/131
https://github.com/impetus-opensource/Kundera/issues/127
https://github.com/impetus-opensource/Kundera/issues/122
https://github.com/impetus-opensource/Kundera/issues/121
https://github.com/impetus-opensource/Kundera/issues/117
https://github.com/impetus-opensource/Kundera/issues/84
https://github.com/impetus-opensource/Kundera/issues/67

@kundera-discuss  issues
---
1) Batch operation over Cassandra composite key not working.


We have revamped our wiki, so you might want to have a look at it here:
https://github.com/impetus-opensource/Kundera/wiki

To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

Latest released tag version is 2.3 Kundera maven libraries are now available 
at: https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
http://github.com/impetus-opensource/Kundera-Examples

And

https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests


Thank you all for your contributions!

Sincerely,
Kundera Team








NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.



GC pause issues

2013-01-24 Thread Varun Sharma
Hi,

I have a region server which has the following logs. As you can see from
the log, ParNew is sufficiently big (450M) and there are heavy writes going
in. I am seeing 200ms pauses which eventually build up, and there is a
promotion failure. There is a ParNew collection every 2-3 seconds, so it
fills up real fast. My memstore flush size is bigger (512m) and there are 4
regions per server (overall size is 3G for all memstores) - I have MSLAB
enabled.

2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew: 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K), 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew: 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K), 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
2013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew: 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K), 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark: 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02 sys=0.00, real=0.04 secs]
2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew: 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K), 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268 secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start]
2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean: 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-abortable-preclean-start]
2013-01-24T13:08:27.401+0000: 63544.495: [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46 sys=0.16, real=0.80 secs]
2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872 K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)] 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06 secs]
2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew: 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K), 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew: 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K), 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew: 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K), 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217: [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep: 8.473/9.798 secs] [Times: user=15.26 sys=1.11, real=9.80 secs]

What might be advised here - should I up the size of the new generation to
1G?

Thanks
Varun


Re: GC pause issues

2013-01-24 Thread varun kumar
Hi Varun,

try to increase the heap memory.

Regards,
Varun Kumar

On Thu, Jan 24, 2013 at 11:10 PM, Varun Sharma va...@pinterest.com wrote:

 Hi,

 I have a region server which has the following logs. As you can see from
 the log, ParNew is sufficiently big (450M) and there are heavy writes going
 in. I am seeing 200ms pauses which eventually build up and there is a
 promotion failure. There is a parnew collection every 2-3 seconds so it
 fills up real fast. My memstore size is bigger 512m for flushes and 4
 regions per server. (overall size is 3G for all memstores) - I have mslab
 enabled

 2013-01-24T13:08:16.870+: 63533.964: [GC 63533.964: [ParNew:
 471841K-52416K(471872K), 0.2251880 secs] 8733008K-8445039K(12727104K),
 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
 2013-01-24T13:08:19.546+: 63536.639: [GC 63536.639: [ParNew:
 461593K-52416K(471872K), 0.2812690 secs] 8854216K-8557572K(12727104K),
 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
 013-01-24T13:08:21.824+: 63538.917: [GC 63538.918: [ParNew:
 442836K-52416K(471872K), 0.2781490 secs] 8947992K-8705355K(12727104K),
 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
 2013-01-24T13:08:22.122+: 63539.216: [GC [1 CMS-initial-mark:
 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02
 sys=0.00, real=0.04 secs]
 2013-01-24T13:08:22.159+: 63539.253: [CMS-concurrent-mark-start]
 2013-01-24T13:08:24.953+: 63542.047: [GC 63542.047: [ParNew:
 471872K-52251K(471872K), 0.1611970 secs] 9124811K-8831437K(12727104K),
 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
 2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-mark: 4.105/4.268
 secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
 2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-preclean-start]
 2013-01-24T13:08:26.597+: 63543.691: [CMS-concurrent-preclean:
 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
 2013-01-24T13:08:26.597+: 63543.691:
 [CMS-concurrent-abortable-preclean-start]
 2013-01-24T13:08:27.401+: 63544.495:
 [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
 sys=0.16, real=0.80 secs]
 2013-01-24T13:08:27.403+: 63544.496: [GC[YG occupancy: 274458 K (471872
 K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
 processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
 secs]
 2013-01-24T13:08:27.458+: 63544.551: [CMS-concurrent-sweep-start]
 2013-01-24T13:08:27.955+: 63545.048: [GC 63545.049: [ParNew:
 471707K-44566K(471872K), 0.1371770 secs] 9044714K-8701862K(12727104K),
 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
 2013-01-24T13:08:29.285+: 63546.378: [GC 63546.478: [ParNew:
 445714K-52416K(471872K), 0.5626120 secs] 8648805K-8410223K(12727104K),
 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
 2013-01-24T13:08:32.308+: 63549.401: [GC 63549.402: [ParNew:
 471872K-52416K(471872K), 0.2300560 secs] 8247804K-7976043K(12727104K),
 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
 2013-01-24T13:08:34.844+: 63551.938: [GC 63551.938: [ParNew (promotion
 failed): 471872K-471872K(471872K), 0.2788500 secs]63552.217:
 [CMS2013-01-24T13:08:37.256+: 63554.349: [CMS-concurrent-sweep:
 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]

 What might be advised here - should I up the size to 1G for the new
 generation ?

 Thanks
 Varun




-- 
Regards,
Varun Kumar.P


Re: drop table problem

2013-01-24 Thread Adrien Mogenet
Normally you wouldn't need to remove the whole /hbase root on ZK. Removing
the node which is related to your table should be enough. Restart your
master so that it will read table state again, and you'll be able to drop
your ghost table.


On Thu, Jan 24, 2013 at 2:47 PM, Kevin O'dell kevin.od...@cloudera.com wrote:

 Sorry I should have specified those are different options to try, not an
 ordered set of instructions.

 On Thu, Jan 24, 2013 at 8:47 AM, Kevin O'dell kevin.od...@cloudera.com
 wrote:

  Typically, hbck won't detect anything wrong here, as Ram said in another
  thread we really should work in this functionality.
 
  1.) Shut the HBase cluster - go to ZKcli and rmr /hbase - Start HBase
 back
  up
 
  2.) Move the table, use hbck -fixMeta -fixAssignments, restart the HBase
  (not a great option if there is data on the table)
 
  3.) Force an assign on the region for the table and see if it clears it
 up
  (Should create a new znode)
 
  4.) Go to ZK Cli and check /hbase for unassigned regions and other data
  correlating with that region and remove it, then restart HBase
 
  On Thu, Jan 24, 2013 at 8:34 AM, Vikas Jadhav vikascjadha...@gmail.com
 wrote:
 
  try to diable table first
 
  disable 'table_name'
 
  drop 'tab-name'
 
 
 
  On Thu, Jan 24, 2013 at 3:48 PM, hua beatls bea...@gmail.com wrote:
 
   HI,
   i have a table in 'transition' state, which couldn't be 'disable' or
   enable. I try to 'drop' it but failed. below is the error messages.
  
   hbase(main):012:0 drop 'T21_0513_201301_bigtable'
   ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
   org.apache.hadoop.hbase.TableNotDisabledException:
  T21_0513_201301_bigtable
   at
  
  
 
 org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1240)
   at
  
  
 
 org.apache.hadoop.hbase.master.handler.TableEventHandler.init(TableEventHandler.java:70)
   at
  
  
 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler.init(DeleteTableHandler.java:42)
   at
 org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1099)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
  
  
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at
  
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at
  
  
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at
  
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
below is the excerpts from webui:
   Regions in
  
 
 TransitionRegionStatebd8d2bf3ef04d0f8d3dac5ca2f612f42T21_0513_201301_bigtable,2710075,1358994123350.bd8d2bf3ef04d0f8d3dac5ca2f612f42.
   state=PENDING_OPEN, ts=Thu Jan 24 16:58:34 CST 2013 (699s ago),
   server=hadoop1,60020,1358993820407
  
 
 
 
  --
  *
  *
  *
 
  Thanx and Regards*
  * Vikas Jadhav*
 
 
 
 
  --
  Kevin O'Dell
  Customer Operations Engineer, Cloudera
 



 --
 Kevin O'Dell
 Customer Operations Engineer, Cloudera




-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me


HRegionInfo was null or empty

2013-01-24 Thread Jean-Marc Spaggiari
Hi,

I'm getting this error (multiple occurrences) while running a MR job which
is populating an empty table. The MR job is run against the 'entry' table,
where I get each line and store its CRC into the 'entry_crc' table.

2013-01-24 12:49:01,664 WARN
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Encountered problems when prefetch META table:
java.io.IOException: HRegionInfo was null or empty in Meta for entry_crc,
row=entry_crc,\x00\x00\x00\x00\xBF\xB0\xE4bluejacketsxtra.dispatch.com,99
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:170)
    at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:365)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:105)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:933)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:988)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:875)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:846)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:746)
    at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
    at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
    at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:873)
    at org.spaggiari.mapreduce.GenerateCRC$GenerateCRCMapper.map(GenerateCRC.java:122)
    at org.spaggiari.mapreduce.GenerateCRC$GenerateCRCMapper.map(GenerateCRC.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I'm wondering if this is normal or not. I saw it once, so I cleaned
the table and restarted the job, but I'm seeing it again. So it seems
it's reproducible. But it's also only a WARN. Not an error...

Should I simply ignore this? Or should I dig a bit?

JM


Re: Storing images in Hbase

2013-01-24 Thread S Ahmed
Jack, out of curiosity, how many people manage the HBase-related servers?

Does it require constant monitoring or is it fairly hands-off now? (Or a bit
of both: early days were about getting things right/learning and now it's
purring along.)


On Wed, Jan 23, 2013 at 11:53 PM, Jack Levin magn...@gmail.com wrote:

  It's best to keep some RAM for caching of the filesystem; besides, we
  also run the datanode, which takes heap as well.
  Now, please keep in mind that even if you specify a heap of, say, 5GB, if
  your server opens threads to communicate with other systems via RPC
  (which HBase does a lot), you will indeed use HEAP +
  Nthreads * thread_kb_size.  There is a good Sun Microsystems document
  about it. (I don't have the link handy.)

 -Jack



 On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma va...@pinterest.com wrote:
  Thanks for the useful information. I wonder why you use only 5G heap when
  you have an 8G machine ? Is there a reason to not use all of it (the
  DataNode typically takes a 1G of RAM)
 
  On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin magn...@gmail.com wrote:
 
  I forgot to mention that I also have this setup:
 
  property
namehbase.hregion.memstore.flush.size/name
value33554432/value
descriptionFlush more often. Default: 67108864/description
  /property
 
  This parameter works on per region amount, so this means if any of my
  400 (currently) regions on a regionserver has 30MB+ in memstore, the
  hbase will flush it to disk.
 
 
  Here are some metrics from a regionserver:
 
  requests=2, regions=370, stores=370, storefiles=1390,
  storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
  flushQueueSize=0, usedHeap=3516, maxHeap=4987,
  blockCacheSize=790656256, blockCacheFree=255245888,
  blockCacheCount=2436, blockCacheHitCount=218015828,
  blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
  blockCacheHitRatio=94, blockCacheHitCachingRatio=98
 
  Note, that memstore is only 2G, this particular regionserver HEAP is set
  to 5G.
 
  And last but not least, its very important to have good GC setup:
 
  export HBASE_OPTS=$HBASE_OPTS -verbose:gc -Xms5000m
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
  -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
  -XX:+UseParNewGC \
  -XX:NewSize=128m -XX:MaxNewSize=128m \
  -XX:-UseAdaptiveSizePolicy \
  -XX:+CMSParallelRemarkEnabled \
  -XX:-TraceClassUnloading
  
 
  -Jack
 
  On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma va...@pinterest.com
 wrote:
   Hey Jack,
  
   Thanks for the useful information. By flush size being 15 %, do you
 mean
   the memstore flush size ? 15 % would mean close to 1G, have you seen
 any
   issues with flushes taking too long ?
  
   Thanks
   Varun
  
   On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin magn...@gmail.com
 wrote:
  
   That's right, Memstore size , not flush size is increased.  Filesize
 is
   10G. Overall write cache is 60% of heap and read cache is 20%.  Flush
  size
   is 15%.  64 maxlogs at 128MB. One namenode server, one secondary that
  can
   be promoted.  On the way to hbase images are written to a queue, so
  that we
   can take Hbase down for maintenance and still do inserts later.
   ImageShack
   has ‘perma cache’ servers that allows writes and serving of data even
  when
   hbase is down for hours, consider it 4th replica  outside of hadoop
  
   Jack
  
*From:* Mohit Anchlia mohitanch...@gmail.com
   *Sent:* ‎January‎ ‎13‎, ‎2013 ‎7‎:‎48‎ ‎AM
   *To:* user@hbase.apache.org
   *Subject:* Re: Storing images in Hbase
  
   Thanks Jack for sharing this information. This definitely makes sense
  when
   using the type of caching layer. You mentioned about increasing write
   cache, I am assuming you had to increase the following parameters in
   addition to increase the memstore size:
  
   hbase.hregion.max.filesize
   hbase.hregion.memstore.flush.size
  
   On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin magn...@gmail.com
 wrote:
  
We buffer all accesses to HBASE with Varnish SSD based caching
 layer.
So the impact for reads is negligible.  We have 70 node cluster, 8
 GB
of RAM per node, relatively weak nodes (intel core 2 duo), with
10-12TB per server of disks.  Inserting 600,000 images per day.  We
have relatively little of compaction activity as we made our write
cache much larger than read cache - so we don't experience region
 file
fragmentation as much.
   
-Jack
   
On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia 
  mohitanch...@gmail.com
wrote:
 I think it really depends on volume of the traffic, data
  distribution
   per
 region, how and when files compaction occurs, number of nodes in
 the
 cluster. In my experience when it comes to blob data where you
 are
serving
 10s of thousand+ requests/sec writes and reads then it's very
  difficult
to
 manage HBase without very hard operations and 

Re: Join Using MapReduce and Hbase

2013-01-24 Thread Rob Roland
The O'Reilly book, MapReduce Design Patterns, also covers joins. It's
pretty easy to follow and it gives some good examples. This doesn't cover
the HBase use case, but if you understand how to do a basic set of joins in
map/reduce, you can apply HBase to that model.


On Thu, Jan 24, 2013 at 5:35 AM, Doug Meil doug.m...@explorysmedical.com wrote:


 Hi there-

 Here is a comment in the RefGuide on joins in the HBase data model.

 http://hbase.apache.org/book.html#joins

 Short answer, you need to do it yourself (e.g., either with an in-memory
 hashmap or instantiating an HTable of the other table, depending on your
 situation).

 For other MR examples, see this...

 http://hbase.apache.org/book.html#mapreduce.example




 On 1/24/13 8:19 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:

 Hi I am working join operation using MapReduce
 So if anyone has useful information plz share it.
 Example Code or New Technique along with existing one.
 Thank You.
 --
 *
 *
 *
 
 Thanx and Regards*
 * Vikas Jadhav*





Re: HRegionInfo was null or empty

2013-01-24 Thread Ted Yu
bq. Encountered problems when prefetch META table:

You can ignore the warning.

Cheers

On Thu, Jan 24, 2013 at 1:38 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi,

 I'm getting this error (multiple corrurances) while running a MR which
 is populating an empty table. MR is run against the 'entry' table
 where I get each line and store the CRC into 'entry_crc' table.

 2013-01-24 12:49:01,664 WARN

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
 Encountered problems when prefetch META table:
 java.io.IOException: HRegionInfo was null or empty in Meta for
 entry_crc, row=entry_crc,\x00\x00\x00\x00\xBF\xB0\xE4
 bluejacketsxtra.dispatch.com,99
 at
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:170)
 at
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54)
 at
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133)
 at
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:365)
 at
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:130)
 at
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:105)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:933)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:988)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:875)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:846)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:746)
 at
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
 at
 org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
 at
 org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:873)
 at
 org.spaggiari.mapreduce.GenerateCRC$GenerateCRCMapper.map(GenerateCRC.java:122)
 at
 org.spaggiari.mapreduce.GenerateCRC$GenerateCRCMapper.map(GenerateCRC.java:1)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)

 I'm wondering if this is normal or not. I saw it once, so I cleaned
 the table and restarted the job, but I'm seeing it again. So it seems
 it's reproducible. But it's also only a WARN. Not an error...

 Should I simply ignore this? Or should I dig a bit?

 JM



Re: HRegionInfo was null or empty

2013-01-24 Thread Jean-Marc Spaggiari
Perfect, thanks. I will.

JM

2013/1/24, Ted Yu yuzhih...@gmail.com:
 bq. Encountered problems when prefetch META table:

 You can ignore the warning.

 Cheers

 On Thu, Jan 24, 2013 at 1:38 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi,

 I'm getting this error (multiple occurrences) while running an MR job which
 is populating an empty table. The MR job runs against the 'entry' table,
 where I get each line and store the CRC into the 'entry_crc' table.

 2013-01-24 12:49:01,664 WARN

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
 Encountered problems when prefetch META table:
 java.io.IOException: HRegionInfo was null or empty in Meta for
 entry_crc, row=entry_crc,\x00\x00\x00\x00\xBF\xB0\xE4
 bluejacketsxtra.dispatch.com,99
 at
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:170)
 at
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54)
 at
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133)
 at
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:365)
 at
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:130)
 at
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:105)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:933)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:988)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:875)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:846)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:746)
 at
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:82)
 at
 org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:162)
 at
 org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:873)
 at
 org.spaggiari.mapreduce.GenerateCRC$GenerateCRCMapper.map(GenerateCRC.java:122)
 at
 org.spaggiari.mapreduce.GenerateCRC$GenerateCRCMapper.map(GenerateCRC.java:1)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)

 I'm wondering if this is normal or not. I saw it once, so I cleaned
 the table and restarted the job, but I'm seeing it again. So it seems
 it's reproducible. But it's also only a WARN. Not an error...

 Should I simply ignore this? Or should I dig a bit?

 JM




Re: GC pause issues

2013-01-24 Thread 谢良
Hi Varun,

Please note that if you increase the new generation size, the ParNew time
will go up accordingly, and the CMS young GC (YGC) is also a stop-the-world (STW) pause.
Could you try reducing the memstore size to a smaller value, e.g. 128m or
256m?
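
For illustration only (the table name 'mytable' and the 128m value are assumptions, and
exception handling is omitted): the cluster-wide default is hbase.hregion.memstore.flush.size
in hbase-site.xml, and the flush size can also be lowered per table through the admin API, e.g.:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
desc.setMemStoreFlushSize(128 * 1024 * 1024L);   // e.g. 128m instead of 512m
admin.disableTable("mytable");
admin.modifyTable(Bytes.toBytes("mytable"), desc);
admin.enableTable("mytable");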

Regards,
Liang

From: Varun Sharma [va...@pinterest.com]
Sent: January 25, 2013 1:40
To: user@hbase.apache.org
Subject: GC pause issues

Hi,

I have a region server which has the following logs. As you can see from
the log, ParNew is sufficiently big (450M) and there are heavy writes going
in. I am seeing 200ms pauses which eventually build up and there is a
promotion failure. There is a parnew collection every 2-3 seconds so it
fills up real fast. My memstore size is bigger 512m for flushes and 4
regions per server. (overall size is 3G for all memstores) - I have mslab
enabled

2013-01-24T13:08:16.870+: 63533.964: [GC 63533.964: [ParNew:
471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K),
0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
2013-01-24T13:08:19.546+: 63536.639: [GC 63536.639: [ParNew:
461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K),
0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
2013-01-24T13:08:21.824+: 63538.917: [GC 63538.918: [ParNew:
442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K),
0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
2013-01-24T13:08:22.122+: 63539.216: [GC [1 CMS-initial-mark:
8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02
sys=0.00, real=0.04 secs]
2013-01-24T13:08:22.159+: 63539.253: [CMS-concurrent-mark-start]
2013-01-24T13:08:24.953+: 63542.047: [GC 63542.047: [ParNew:
471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K),
0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-mark: 4.105/4.268
secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-preclean-start]
2013-01-24T13:08:26.597+: 63543.691: [CMS-concurrent-preclean:
0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
2013-01-24T13:08:26.597+: 63543.691:
[CMS-concurrent-abortable-preclean-start]
2013-01-24T13:08:27.401+: 63544.495:
[CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
sys=0.16, real=0.80 secs]
2013-01-24T13:08:27.403+: 63544.496: [GC[YG occupancy: 274458 K (471872
K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
secs]
2013-01-24T13:08:27.458+: 63544.551: [CMS-concurrent-sweep-start]
2013-01-24T13:08:27.955+: 63545.048: [GC 63545.049: [ParNew:
471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K),
0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
2013-01-24T13:08:29.285+: 63546.378: [GC 63546.478: [ParNew:
445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K),
0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
2013-01-24T13:08:32.308+: 63549.401: [GC 63549.402: [ParNew:
471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K),
0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
2013-01-24T13:08:34.844+: 63551.938: [GC 63551.938: [ParNew (promotion
failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
[CMS2013-01-24T13:08:37.256+: 63554.349: [CMS-concurrent-sweep:
8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]

What might be advised here - should I up the size to 1G for the new
generation ?

Thanks
Varun


RE: Region server Memory Use is double the -Xmx setting

2013-01-24 Thread Buckley,Ron
Anoop,

We use Snappy compression for all our tables.

I tried '-XX:MaxDirectMemorySize=1g' on our test cluster last night. 

Ran great until major compactions started, then all region servers took 
'java.lang.OutOfMemoryError: Direct buffer memory'

Trying '-XX:MaxDirectMemorySize=2g' tonight.


Ron

-Original Message-
From: Anoop Sam John [mailto:anoo...@huawei.com] 
Sent: Wednesday, January 23, 2013 10:22 PM
To: user@hbase.apache.org
Subject: RE: Region server Memory Use is double the -Xmx setting

Are  you using compression for HFiles?

Yes we are using  MaxDirectMemorySize and we dont use off-heap cache.

-Anoop-

From: Buckley,Ron [buckl...@oclc.org]
Sent: Wednesday, January 23, 2013 8:49 PM
To: user@hbase.apache.org
Subject: RE: Region server Memory Use is double the -Xmx setting

Liang,

Thanks.  I wasn’t really aware that the direct memory could get that large. 
(Full disclosure, we did switch from jdk1.6.0_25 to jdk1.6.0_31 the last time 
we restarted HBase.)

I've only seen explicit setting of -XX:MaxDirectMemorySize for regionservers 
associated with the experimental off-heap cache.

Is anyone else running their region servers with -XX:MaxDirectMemorySize (not 
using the off-heap cache)?

Ron

-Original Message-
From: 谢良 [mailto:xieli...@xiaomi.com]
Sent: Tuesday, January 22, 2013 9:20 PM
To: user@hbase.apache.org
Subject: Re: Region server Memory Use is double the -Xmx setting

Please set -XX:MaxDirectMemorySize explicitly, else the default takes the same value 
as -Xmx in current JDK6, at least for jdk1.6.30+

Best Regards,
Liang

From: Buckley,Ron [buckl...@oclc.org]
Sent: January 23, 2013 5:17
To: user@hbase.apache.org
Subject: Region server Memory Use is double the -Xmx setting

We have a 50 node cluster replicating to a 6 node cluster. Both clusters are 
running CDH4.1.2 and HBase 0.94.2.



Today we noticed that the region servers at our replica site are using 10GB 
more memory than the '-Xmx12288m' we have defined in hbase-env.sh



These region servers have been up since January 9, 2013.



Does anyone have suggestions about tracking down this additional memory use?



I'm not necessarily expecting the Region Server to stay right at the 12GB that 
we allocated, but having it running at 24GB is starting to cause the servers to 
swap.






  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND


28544  prodcon   18   0 24.1g  23g  18m S 20.0 74.0   9071:34 java




28544:   /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx12288m
-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=9021 -ea
-server -XX:+HeapDumpOnOutOfMemoryError -Xmn256m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:/drive1/hadoop/2.0/isoft/../hbase/logs/gc-hbase.log
-ea -server -XX:+HeapDumpOnOutOfMemoryError -Xmn256m -XX





--

Ron Buckley

x6365

http://intranet-wiki.oclc.org/wiki/XWC/XServe/SRU



http://intranet-wiki.oclc.org/wiki/FIND



http://intranet-wiki.oclc.org/wiki/Higgins


http://intranet-wiki.oclc.org/wiki/Firefly



Re: GC pause issues

2013-01-24 Thread Varun Sharma
I am curious how reducing the memstore size would help - I have 6 regions
and 3G total for memstore - so I would like to max out on that by having a
bigger flush size per region. Are you asking me to have more regions and a
smaller memstore flush size instead? How is that likely to help?

On Thu, Jan 24, 2013 at 6:07 PM, 谢良 xieli...@xiaomi.com wrote:

 Hi Varun,

 Please note if you try to increase new generation size, then the ParNew
 time will be up accordingly, and CMS YGC is also a STW.
 could you have a try to reduce memstore size to a smaller value, e.g. 128m
 or 256m ?

 Regards,
 Liang
 
 From: Varun Sharma [va...@pinterest.com]
 Sent: January 25, 2013 1:40
 To: user@hbase.apache.org
 Subject: GC pause issues

 Hi,

 I have a region server which has the following logs. As you can see from
 the log, ParNew is sufficiently big (450M) and there are heavy writes going
 in. I am seeing 200ms pauses which eventually build up and there is a
 promotion failure. There is a parnew collection every 2-3 seconds so it
 fills up real fast. My memstore size is bigger 512m for flushes and 4
 regions per server. (overall size is 3G for all memstores) - I have mslab
 enabled

 2013-01-24T13:08:16.870+: 63533.964: [GC 63533.964: [ParNew:
 471841K-52416K(471872K), 0.2251880 secs] 8733008K-8445039K(12727104K),
 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
 2013-01-24T13:08:19.546+: 63536.639: [GC 63536.639: [ParNew:
 461593K-52416K(471872K), 0.2812690 secs] 8854216K-8557572K(12727104K),
 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
 013-01-24T13:08:21.824+: 63538.917: [GC 63538.918: [ParNew:
 442836K-52416K(471872K), 0.2781490 secs] 8947992K-8705355K(12727104K),
 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
 2013-01-24T13:08:22.122+: 63539.216: [GC [1 CMS-initial-mark:
 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02
 sys=0.00, real=0.04 secs]
 2013-01-24T13:08:22.159+: 63539.253: [CMS-concurrent-mark-start]
 2013-01-24T13:08:24.953+: 63542.047: [GC 63542.047: [ParNew:
 471872K-52251K(471872K), 0.1611970 secs] 9124811K-8831437K(12727104K),
 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
 2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-mark: 4.105/4.268
 secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
 2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-preclean-start]
 2013-01-24T13:08:26.597+: 63543.691: [CMS-concurrent-preclean:
 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
 2013-01-24T13:08:26.597+: 63543.691:
 [CMS-concurrent-abortable-preclean-start]
 2013-01-24T13:08:27.401+: 63544.495:
 [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
 sys=0.16, real=0.80 secs]
 2013-01-24T13:08:27.403+: 63544.496: [GC[YG occupancy: 274458 K (471872
 K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
 processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
 secs]
 2013-01-24T13:08:27.458+: 63544.551: [CMS-concurrent-sweep-start]
 2013-01-24T13:08:27.955+: 63545.048: [GC 63545.049: [ParNew:
 471707K-44566K(471872K), 0.1371770 secs] 9044714K-8701862K(12727104K),
 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
 2013-01-24T13:08:29.285+: 63546.378: [GC 63546.478: [ParNew:
 445714K-52416K(471872K), 0.5626120 secs] 8648805K-8410223K(12727104K),
 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
 2013-01-24T13:08:32.308+: 63549.401: [GC 63549.402: [ParNew:
 471872K-52416K(471872K), 0.2300560 secs] 8247804K-7976043K(12727104K),
 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
 2013-01-24T13:08:34.844+: 63551.938: [GC 63551.938: [ParNew (promotion
 failed): 471872K-471872K(471872K), 0.2788500 secs]63552.217:
 [CMS2013-01-24T13:08:37.256+: 63554.349: [CMS-concurrent-sweep:
 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]

 What might be advised here - should I up the size to 1G for the new
 generation ?

 Thanks
 Varun



RE: paging results filter

2013-01-24 Thread Anoop Sam John
@Toby

You mean to say that you need a mechanism for jumping directly to a page. Say 
you are on page#1 (1-20) now and you want to jump to page#4 (61-80). Yes, this 
is not there in PageFilter...
The normal way of next page, next page will work fine, as within the server the 
next() calls on the scanner work this way...
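
For what it's worth, a rough sketch of that next page, next page pattern (the table 
handle, the remembered last row key, and the page size are assumptions; exception 
handling omitted):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Returns the next page of rows; pass null as lastRowOfPreviousPage for the first page.
static Result[] nextPage(HTable table, byte[] lastRowOfPreviousPage, int pageSize)
    throws IOException {
  Scan scan = new Scan();
  if (lastRowOfPreviousPage != null) {
    // Append a zero byte so the scan starts strictly after the last row already shown.
    scan.setStartRow(Bytes.add(lastRowOfPreviousPage, new byte[] { 0 }));
  }
  scan.setFilter(new PageFilter(pageSize));
  ResultScanner scanner = table.getScanner(scan);
  try {
    return scanner.next(pageSize);
  } finally {
    scanner.close();
  }
}

Since PageFilter only limits what each region returns, the client-side scanner.next(pageSize) 
is what actually caps the page at pageSize rows.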

-Anoop-

From: Toby Lazar [tla...@gmail.com]
Sent: Thursday, January 24, 2013 6:44 PM
To: user@hbase.apache.org
Subject: Re: paging results filter

I don't see a way of specifying which page of results I want.  For example,
if I want page 3 with page size of 20 (only results 41-60), I don't see how
PageFilter can be configured for that.  Am I missing the obvious?

Thanks,

Toby

On Thu, Jan 24, 2013 at 7:52 AM, Mohammad Tariq donta...@gmail.com wrote:

 I think you need
 PageFilter
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
 
 .

 HTH

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Thu, Jan 24, 2013 at 6:20 PM, Toby Lazar tla...@gmail.com wrote:

  Hi,
 
  I need to create a client function that allows paging of scan results
   (initially return results 1-20, then click on page two to show results
   21-40, 41-60, etc.) without needing to remember the start rowkey.  I
   believe that a filter would be far more efficient than implementing the
  logic client-side.  I couldn't find any OOTB filter for this
 functionality
  so I wrote the class below.  It seems to work fine for me, but can anyone
  comment if this approach makes sense?  Is there another OOTB filter that
 I
  can use instead?
 
  Thank you,
 
  Toby
 
 
 
  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.hbase.filter.FilterBase;
  public class PageOffsetFilter extends FilterBase {
   private long startRowCount;
   private long endRowCount;
 
   private int count = 0;
   public PageOffsetFilter() {
   }
 
   public PageOffsetFilter(long pageNumber, long pageSize) {
 
 if(pageNumber<1)
 pageNumber=1;
 
startRowCount = (pageNumber - 1) * pageSize;
endRowCount = (pageSize * pageNumber)-1;
   }
   @Override
   public boolean filterAllRemaining() {
 return count > endRowCount;
   }
   @Override
   public boolean filterRow() {
 
count++;
 if(count <= startRowCount) {
 return true;
} else {
 return false;
}
 
   }
 
   @Override
   public void readFields(DataInput dataInput) throws IOException {
 
this.startRowCount = dataInput.readLong();
this.endRowCount = dataInput.readLong();
   }
   @Override
   public void write(DataOutput dataOutput) throws IOException {
dataOutput.writeLong(startRowCount);
dataOutput.writeLong(endRowCount);
   }
 
  }
 


Re: paging results filter

2013-01-24 Thread ramkrishna vasudevan
@Toby

If you wish to go to a specific page, you need to set the start row that
comes as part of that page.
So what I feel is: implement a custom page filter, keep doing next(),
display only those records that suit the page you clicked,
and send them back to the client.  Anyway, the logic inside the filter
should keep track of the number of records that have passed by until you reach
the page you are interested in, and that count should
be based on the number of records on a page.
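
To make that concrete, a rough client-side usage sketch (it reuses Toby's PageOffsetFilter
from earlier in the thread; the filter class has to be on the region servers' classpath, and -
as with PageFilter - the row counting happens independently per region, so the page arithmetic
is only exact while the scan stays within one region):

Scan scan = new Scan();
scan.setFilter(new PageOffsetFilter(3, 20));    // page 3, 20 rows per page
ResultScanner scanner = table.getScanner(scan); // table is an already-open HTable
for (Result row : scanner) {
  // render the row in the grid
}
scanner.close();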

Regards
Ram

On Fri, Jan 25, 2013 at 9:04 AM, Anoop Sam John anoo...@huawei.com wrote:

 @Toby

 You mean to say that you need a mechanism for directly jumping to a page.
 Say you are in page#1 (1-20) now and you want to jump to page#4(61-80)..
 Yes this is not there in PageFilter...
 The normal way of next page , next page will work fine as within the
 server the next() calls on the scanner works this way...

 -Anoop-
 
 From: Toby Lazar [tla...@gmail.com]
 Sent: Thursday, January 24, 2013 6:44 PM
 To: user@hbase.apache.org
 Subject: Re: paging results filter

 I don't see a way of specifying which page of results I want.  For example,
 if I want page 3 with page size of 20 (only results 41-60), I don't see how
 PageFilter can be configured for that.  Am I missing the obvious?

 Thanks,

 Toby

 On Thu, Jan 24, 2013 at 7:52 AM, Mohammad Tariq donta...@gmail.com
 wrote:

  I think you need
  PageFilter
 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
  
  .
 
  HTH
 
  Warm Regards,
  Tariq
  https://mtariq.jux.com/
  cloudfront.blogspot.com
 
 
  On Thu, Jan 24, 2013 at 6:20 PM, Toby Lazar tla...@gmail.com wrote:
 
   Hi,
  
   I need to create a client function that allows paging of scan results
   (initially return results 1-20, then click on page two to show results
   21-40, 41-60, etc.) without needing to remember the start rowkey.  I
   believe that a filter would be far more efficient than implementing the
   logic client-side.  I couldn't find any OOTB filter for this
  functionality
   so I wrote the class below.  It seems to work fine for me, but can
 anyone
   comment if this approach makes sense?  Is there another OOTB filter
 that
  I
   can use instead?
  
   Thank you,
  
   Toby
  
  
  
   import java.io.DataInput;
   import java.io.DataOutput;
   import java.io.IOException;
   import org.apache.hadoop.hbase.filter.FilterBase;
   public class PageOffsetFilter extends FilterBase {
private long startRowCount;
private long endRowCount;
  
private int count = 0;
public PageOffsetFilter() {
}
  
public PageOffsetFilter(long pageNumber, long pageSize) {
  
  if(pageNumber<1)
  pageNumber=1;
  
 startRowCount = (pageNumber - 1) * pageSize;
 endRowCount = (pageSize * pageNumber)-1;
}
@Override
public boolean filterAllRemaining() {
  return count > endRowCount;
}
@Override
public boolean filterRow() {
  
 count++;
  if(count <= startRowCount) {
  return true;
 } else {
  return false;
 }
  
}
  
@Override
public void readFields(DataInput dataInput) throws IOException {
  
 this.startRowCount = dataInput.readLong();
 this.endRowCount = dataInput.readLong();
}
@Override
public void write(DataOutput dataOutput) throws IOException {
 dataOutput.writeLong(startRowCount);
 dataOutput.writeLong(endRowCount);
}
  
   }
  
 



Pagination with HBase - getting previous page of data

2013-01-24 Thread Vijay Ganesan
I'm displaying rows of data from a HBase table in a data grid UI. The grid
shows 25 rows at a time i.e. it is paginated. User can click on
Next/Previous to paginate through the data 25 rows at a time. I can
implement Next easily by setting a HBase
org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
org.apache.hadoop.hbase.client.Scan to be the row id of the next batch's
row that is sent to the UI with the previous batch. However, I can't seem
to be able to do the same with Previous. I can set the endRow on the Scan
to be the row id of the last row of the previous batch but since HBase
Scans are always in the forward direction, there is no way to set a
PageFilter that can get 25 rows ending at a particular row. The only option
seems to be to get *all* rows up to the end row and filter out all but the
last 25 in the caller, which seems very inefficient. Any ideas on how this
can be done efficiently?

-- 
-Vijay


Re: GC pause issues

2013-01-24 Thread Varun Sharma
I do have significant block cache churn, and this issue is typically
correlated with a huge increase in read latencies - could that be the
reason for this? mslab should be taking care of the memstore-related heap
fragmentation. Has anyone seen issues with block cache churn?

On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma va...@pinterest.com wrote:

 I am curious how reducing the memstore size would help - I have 6 regions
 and 3G total for memstore - so I would like max out on that by having a
 bigger flush size per region. Are you asking me to have more regions and
 smaller memstore flush size instead ? How is that likely to help


 On Thu, Jan 24, 2013 at 6:07 PM, 谢良 xieli...@xiaomi.com wrote:

 Hi Varun,

 Please note if you try to increase new generation size, then the ParNew
 time will be up accordingly, and CMS YGC is also a STW.
 could you have a try to reduce memstore size to a smaller value, e.g.
 128m or 256m ?

 Regards,
 Liang
 
  From: Varun Sharma [va...@pinterest.com]
  Sent: January 25, 2013 1:40
  To: user@hbase.apache.org
  Subject: GC pause issues

 Hi,

 I have a region server which has the following logs. As you can see from
 the log, ParNew is sufficiently big (450M) and there are heavy writes
 going
 in. I am seeing 200ms pauses which eventually build up and there is a
 promotion failure. There is a parnew collection every 2-3 seconds so it
 fills up real fast. My memstore size is bigger 512m for flushes and 4
 regions per server. (overall size is 3G for all memstores) - I have mslab
 enabled

 2013-01-24T13:08:16.870+: 63533.964: [GC 63533.964: [ParNew:
 471841K-52416K(471872K), 0.2251880 secs] 8733008K-8445039K(12727104K),
 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
 2013-01-24T13:08:19.546+: 63536.639: [GC 63536.639: [ParNew:
 461593K-52416K(471872K), 0.2812690 secs] 8854216K-8557572K(12727104K),
 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
 013-01-24T13:08:21.824+: 63538.917: [GC 63538.918: [ParNew:
 442836K-52416K(471872K), 0.2781490 secs] 8947992K-8705355K(12727104K),
 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
 2013-01-24T13:08:22.122+: 63539.216: [GC [1 CMS-initial-mark:
 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
 user=0.02
 sys=0.00, real=0.04 secs]
 2013-01-24T13:08:22.159+: 63539.253: [CMS-concurrent-mark-start]
 2013-01-24T13:08:24.953+: 63542.047: [GC 63542.047: [ParNew:
 471872K-52251K(471872K), 0.1611970 secs] 9124811K-8831437K(12727104K),
 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
 2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-mark: 4.105/4.268
 secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
 2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-preclean-start]
 2013-01-24T13:08:26.597+: 63543.691: [CMS-concurrent-preclean:
 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
 2013-01-24T13:08:26.597+: 63543.691:
 [CMS-concurrent-abortable-preclean-start]
 2013-01-24T13:08:27.401+: 63544.495:
 [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
 sys=0.16, real=0.80 secs]
 2013-01-24T13:08:27.403+: 63544.496: [GC[YG occupancy: 274458 K
 (471872
 K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
 processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
 secs]
 2013-01-24T13:08:27.458+: 63544.551: [CMS-concurrent-sweep-start]
 2013-01-24T13:08:27.955+: 63545.048: [GC 63545.049: [ParNew:
 471707K-44566K(471872K), 0.1371770 secs] 9044714K-8701862K(12727104K),
 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
 2013-01-24T13:08:29.285+: 63546.378: [GC 63546.478: [ParNew:
 445714K-52416K(471872K), 0.5626120 secs] 8648805K-8410223K(12727104K),
 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
 2013-01-24T13:08:32.308+: 63549.401: [GC 63549.402: [ParNew:
 471872K-52416K(471872K), 0.2300560 secs] 8247804K-7976043K(12727104K),
 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
 2013-01-24T13:08:34.844+: 63551.938: [GC 63551.938: [ParNew (promotion
 failed): 471872K-471872K(471872K), 0.2788500 secs]63552.217:
 [CMS2013-01-24T13:08:37.256+: 63554.349: [CMS-concurrent-sweep:
 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]

 What might be advised here - should I up the size to 1G for the new
 generation ?

 Thanks
 Varun





Re: Pagination with HBase - getting previous page of data

2013-01-24 Thread Mohammad Tariq
Hello sir,

  While paging through, store the startkey of the current page of 25 rows
in a separate byte[]. Now, if you want to come back to this page when you
are at the next page, do a range query where the startkey would be the rowkey
you had stored earlier and the endkey would be the start rowkey of the current
page. You have to store just one rowkey each time you show a page, using
which you can come back to this page when you are at the next page.

However, this approach will fail in a case where your user would like to go
to a particular previous page.
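
A rough sketch of that bookkeeping (the HTable, the currentPage index and the page size
of 25 are assumptions; exception handling omitted):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

// Remember the first row key of every page already shown.
List<byte[]> pageStartKeys = new ArrayList<byte[]>();
// When page N is rendered, record its first row key once: pageStartKeys.add(firstRowKeyOfPageN)

// "Previous" then becomes an ordinary forward scan of one page:
Scan scan = new Scan();
scan.setStartRow(pageStartKeys.get(currentPage - 1));  // inclusive start of the previous page
scan.setFilter(new PageFilter(25));
ResultScanner scanner = table.getScanner(scan);
Result[] previousPage = scanner.next(25);
scanner.close();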

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan vi...@scaligent.com wrote:

 I'm displaying rows of data from a HBase table in a data grid UI. The grid
 shows 25 rows at a time i.e. it is paginated. User can click on
 Next/Previous to paginate through the data 25 rows at a time. I can
 implement Next easily by setting a HBase
 org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
 org.apache.hadoop.hbase.client.Scan to be the row id of the next batch's
 row that is sent to the UI with the previous batch. However, I can't seem
 to be able to do the same with Previous. I can set the endRow on the Scan
 to be the row id of the last row of the previous batch but since HBase
 Scans are always in the forward direction, there is no way to set a
 PageFilter that can get 25 rows ending at a particular row. The only option
 seems to be to get *all* rows up to the end row and filter out all but the
 last 25 in the caller, which seems very inefficient. Any ideas on how this
 can be done efficiently?

 --
 -Vijay



Re: paging results filter

2013-01-24 Thread Mohammad Tariq
Hello Toby,

  Sorry for the late reply. But, you have got appropriate answers from
the pros :)

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Fri, Jan 25, 2013 at 9:45 AM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:

 @Toby

 If you wish to go the specified page you need to set the start row that
 needs to come as part of that page.
 So what i feel is implement a custom page filter and keep doing next() and
 display only those records that suits the page you clicked.
  and send them back to the client.  Anyway the logic inside the filter
 should keep track of the number of records that passed by till you reach
 your concerned page and that should
 be  based on the number of records on a page.

 Regards
 Ram

 On Fri, Jan 25, 2013 at 9:04 AM, Anoop Sam John anoo...@huawei.com
 wrote:

  @Toby
 
  You mean to say that you need a mechanism for directly jumping to a page.
  Say you are in page#1 (1-20) now and you want to jump to page#4(61-80)..
  Yes this is not there in PageFilter...
  The normal way of next page , next page will work fine as within the
  server the next() calls on the scanner works this way...
 
  -Anoop-
  
  From: Toby Lazar [tla...@gmail.com]
  Sent: Thursday, January 24, 2013 6:44 PM
  To: user@hbase.apache.org
  Subject: Re: paging results filter
 
  I don't see a way of specifying which page of results I want.  For
 example,
  if I want page 3 with page size of 20 (only results 41-60), I don't see
 how
  PageFilter can be configured for that.  Am I missing the obvious?
 
  Thanks,
 
  Toby
 
  On Thu, Jan 24, 2013 at 7:52 AM, Mohammad Tariq donta...@gmail.com
  wrote:
 
   I think you need
   PageFilter
  
 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
   
   .
  
   HTH
  
   Warm Regards,
   Tariq
   https://mtariq.jux.com/
   cloudfront.blogspot.com
  
  
   On Thu, Jan 24, 2013 at 6:20 PM, Toby Lazar tla...@gmail.com wrote:
  
Hi,
   
I need to create a client function that allows paging of scan results
 (initially return results 1-20, then click on page two to show results
 21-40, 41-60, etc.) without needing to remember the start rowkey.  I
 believe that a filter would be far more efficient than implementing
 the
logic client-side.  I couldn't find any OOTB filter for this
   functionality
so I wrote the class below.  It seems to work fine for me, but can
  anyone
comment if this approach makes sense?  Is there another OOTB filter
  that
   I
can use instead?
   
Thank you,
   
Toby
   
   
   
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.hbase.filter.FilterBase;
public class PageOffsetFilter extends FilterBase {
 private long startRowCount;
 private long endRowCount;
   
 private int count = 0;
 public PageOffsetFilter() {
 }
   
 public PageOffsetFilter(long pageNumber, long pageSize) {
   
   if(pageNumber<1)
   pageNumber=1;
   
  startRowCount = (pageNumber - 1) * pageSize;
  endRowCount = (pageSize * pageNumber)-1;
 }
 @Override
 public boolean filterAllRemaining() {
   return count > endRowCount;
 }
 @Override
 public boolean filterRow() {
   
  count++;
   if(count <= startRowCount) {
   return true;
  } else {
   return false;
  }
   
 }
   
 @Override
 public void readFields(DataInput dataInput) throws IOException {
   
  this.startRowCount = dataInput.readLong();
  this.endRowCount = dataInput.readLong();
 }
 @Override
 public void write(DataOutput dataOutput) throws IOException {
  dataOutput.writeLong(startRowCount);
  dataOutput.writeLong(endRowCount);
 }
   
}
   
  
 



Re: Storing images in Hbase

2013-01-24 Thread Jack Levin
Two people including myself; it's fairly hands-off. Took about 3 months to
tune it right, however we did have multiple years of experience with
datanodes and hadoop in general, so that was a good boost.

We have 4 hbase clusters today, the image store being the largest.
On Jan 24, 2013 2:14 PM, S Ahmed sahmed1...@gmail.com wrote:

 Jack, out of curiosity, how many people manage the hbase related servers?

 Does it require constant monitoring or its fairly hands-off now?  (or a bit
 of both, early days was getting things write/learning and now its purring
 along).


 On Wed, Jan 23, 2013 at 11:53 PM, Jack Levin magn...@gmail.com wrote:

  It's best to keep some RAM for caching of the filesystem; besides, we
  also run the datanode, which takes heap as well.
  Now, please keep in mind that even if you specify heap of say 5GB, if
  your server opens threads to communicate with other systems via RPC
  (which hbase does a lot), you will indeed use HEAP +
  Nthreads*thread*kb_size.  There is a good Sun Microsystems document
  about it. (I don't have the link handy).
 
  -Jack
 
 
 
  On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma va...@pinterest.com
 wrote:
   Thanks for the useful information. I wonder why you use only 5G heap
 when
   you have an 8G machine ? Is there a reason to not use all of it (the
   DataNode typically takes a 1G of RAM)
  
   On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin magn...@gmail.com
 wrote:
  
   I forgot to mention that I also have this setup:
  
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>33554432</value>
      <description>Flush more often. Default: 67108864</description>
    </property>
  
   This parameter works on per region amount, so this means if any of my
   400 (currently) regions on a regionserver has 30MB+ in memstore, the
   hbase will flush it to disk.
  
  
   Here are some metrics from a regionserver:
  
   requests=2, regions=370, stores=370, storefiles=1390,
   storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
   flushQueueSize=0, usedHeap=3516, maxHeap=4987,
   blockCacheSize=790656256, blockCacheFree=255245888,
   blockCacheCount=2436, blockCacheHitCount=218015828,
   blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
   blockCacheHitRatio=94, blockCacheHitCachingRatio=98
  
   Note, that memstore is only 2G, this particular regionserver HEAP is
 set
   to 5G.
  
   And last but not least, its very important to have good GC setup:
  
   export HBASE_OPTS=$HBASE_OPTS -verbose:gc -Xms5000m
   -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
   -XX:+PrintGCDateStamps
   -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log
 \
   -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
   -XX:+UseParNewGC \
   -XX:NewSize=128m -XX:MaxNewSize=128m \
   -XX:-UseAdaptiveSizePolicy \
   -XX:+CMSParallelRemarkEnabled \
   -XX:-TraceClassUnloading
   
  
   -Jack
  
   On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma va...@pinterest.com
  wrote:
Hey Jack,
   
Thanks for the useful information. By flush size being 15 %, do you
  mean
the memstore flush size ? 15 % would mean close to 1G, have you seen
  any
issues with flushes taking too long ?
   
Thanks
Varun
   
On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin magn...@gmail.com
  wrote:
   
That's right, Memstore size , not flush size is increased.
  Filesize
  is
10G. Overall write cache is 60% of heap and read cache is 20%.
  Flush
   size
is 15%.  64 maxlogs at 128MB. One namenode server, one secondary
 that
   can
be promoted.  On the way to hbase images are written to a queue, so
   that we
can take Hbase down for maintenance and still do inserts later.
ImageShack
has ‘perma cache’ servers that allows writes and serving of data
 even
   when
hbase is down for hours, consider it 4th replica  outside of
 hadoop
   
Jack
   
 *From:* Mohit Anchlia mohitanch...@gmail.com
 *Sent:* January 13, 2013 7:48 AM
*To:* user@hbase.apache.org
*Subject:* Re: Storing images in Hbase
   
Thanks Jack for sharing this information. This definitely makes
 sense
   when
using the type of caching layer. You mentioned about increasing
 write
cache, I am assuming you had to increase the following parameters
 in
addition to increase the memstore size:
   
hbase.hregion.max.filesize
hbase.hregion.memstore.flush.size
   
On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin magn...@gmail.com
  wrote:
   
 We buffer all accesses to HBASE with Varnish SSD based caching
  layer.
 So the impact for reads is negligible.  We have 70 node cluster,
 8
  GB
 of RAM per node, relatively weak nodes (intel core 2 duo), with
 10-12TB per server of disks.  Inserting 600,000 images per day.
  We
 have relatively little of compaction activity as we made our
 write
 cache much larger than read cache - so we don't experience region
  file
 fragmentation as much.

 -Jack

 

Re: GC pause issues

2013-01-24 Thread Jack Levin
Generally, the larger the flush the harder the GC will work. Flush more
often to avoid this. What is your total heap size set at?
On Jan 24, 2013 9:02 PM, Varun Sharma va...@pinterest.com wrote:

 I do have significant block cache churn and this issue is typical
 correlated with a huge increase in read latencies - could that be the
 reason for this - mslab should be taking care of the memstore related heap
 fragmentation ? Has anyone seen issues with block cache churn ?

 On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma va...@pinterest.com wrote:

  I am curious how reducing the memstore size would help - I have 6 regions
  and 3G total for memstore - so I would like max out on that by having a
  bigger flush size per region. Are you asking me to have more regions and
  smaller memstore flush size instead ? How is that likely to help
 
 
  On Thu, Jan 24, 2013 at 6:07 PM, 谢良 xieli...@xiaomi.com wrote:
 
  Hi Varun,
 
  Please note if you try to increase new generation size, then the
 ParNew
  time will be up accordingly, and CMS YGC is also a STW.
  could you have a try to reduce memstore size to a smaller value, e.g.
  128m or 256m ?
 
  Regards,
  Liang
  
  From: Varun Sharma [va...@pinterest.com]
  Sent: January 25, 2013 1:40
  To: user@hbase.apache.org
  Subject: GC pause issues
 
  Hi,
 
  I have a region server which has the following logs. As you can see from
  the log, ParNew is sufficiently big (450M) and there are heavy writes
  going
  in. I am seeing 200ms pauses which eventually build up and there is a
  promotion failure. There is a parnew collection every 2-3 seconds so it
  fills up real fast. My memstore size is bigger 512m for flushes and 4
  regions per server. (overall size is 3G for all memstores) - I have
 mslab
  enabled
 
  2013-01-24T13:08:16.870+: 63533.964: [GC 63533.964: [ParNew:
  471841K-52416K(471872K), 0.2251880 secs] 8733008K-8445039K(12727104K),
  0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
  2013-01-24T13:08:19.546+: 63536.639: [GC 63536.639: [ParNew:
  461593K-52416K(471872K), 0.2812690 secs] 8854216K-8557572K(12727104K),
  0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
  013-01-24T13:08:21.824+: 63538.917: [GC 63538.918: [ParNew:
  442836K-52416K(471872K), 0.2781490 secs] 8947992K-8705355K(12727104K),
  0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
  2013-01-24T13:08:22.122+: 63539.216: [GC [1 CMS-initial-mark:
  8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
  user=0.02
  sys=0.00, real=0.04 secs]
  2013-01-24T13:08:22.159+: 63539.253: [CMS-concurrent-mark-start]
  2013-01-24T13:08:24.953+: 63542.047: [GC 63542.047: [ParNew:
  471872K-52251K(471872K), 0.1611970 secs] 9124811K-8831437K(12727104K),
  0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
  2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-mark:
 4.105/4.268
  secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
  2013-01-24T13:08:26.434+: 63543.527: [CMS-concurrent-preclean-start]
  2013-01-24T13:08:26.597+: 63543.691: [CMS-concurrent-preclean:
  0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
  2013-01-24T13:08:26.597+: 63543.691:
  [CMS-concurrent-abortable-preclean-start]
  2013-01-24T13:08:27.401+: 63544.495:
  [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
  sys=0.16, real=0.80 secs]
  2013-01-24T13:08:27.403+: 63544.496: [GC[YG occupancy: 274458 K
  (471872
  K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
  processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
  9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01,
 real=0.06
  secs]
  2013-01-24T13:08:27.458+: 63544.551: [CMS-concurrent-sweep-start]
  2013-01-24T13:08:27.955+: 63545.048: [GC 63545.049: [ParNew:
  471707K-44566K(471872K), 0.1371770 secs] 9044714K-8701862K(12727104K),
  0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
  2013-01-24T13:08:29.285+: 63546.378: [GC 63546.478: [ParNew:
  445714K-52416K(471872K), 0.5626120 secs] 8648805K-8410223K(12727104K),
  0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
  2013-01-24T13:08:32.308+: 63549.401: [GC 63549.402: [ParNew:
  471872K-52416K(471872K), 0.2300560 secs] 8247804K-7976043K(12727104K),
  0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
  2013-01-24T13:08:34.844+: 63551.938: [GC 63551.938: [ParNew
 (promotion
  failed): 471872K-471872K(471872K), 0.2788500 secs]63552.217:
  [CMS2013-01-24T13:08:37.256+: 63554.349: [CMS-concurrent-sweep:
  8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
 
  What might be advised here - should I up the size to 1G for the new
  generation ?
 
  Thanks
  Varun