Re: Hbase restart issues

2013-04-01 Thread Mohammad Tariq
Hello Rishabh,

 Is your NN able to come out of the safemode by itself?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
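
For reference, a minimal way to check whether the NameNode is stuck in safemode, and to leave it manually if it never exits on its own (standard Hadoop 1.x dfsadmin commands; the $HADOOP_HOME path is a placeholder):

  # Report whether the NameNode is currently in safemode
  $HADOOP_HOME/bin/hadoop dfsadmin -safemode get

  # Block until the NameNode leaves safemode on its own
  $HADOOP_HOME/bin/hadoop dfsadmin -safemode wait

  # Force the NameNode out of safemode (only if you understand why it is stuck)
  $HADOOP_HOME/bin/hadoop dfsadmin -safemode leave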


On Mon, Apr 1, 2013 at 12:19 PM, Rishabh Agrawal 
rishabh.agra...@impetus.co.in wrote:

 Hello

 Whenever I stop HBase and Hadoop gracefully (in that order) and then
 start Hadoop and HBase (in that order), HMaster refuses to start, citing
 ZooKeeper config issues. It seems that it is not able to reconnect with
 Hadoop.

 Any help will be really appreciated.

 Thanks and Regards
 Rishabh Agrawal


 









Re: Hbase restart issues

2013-04-01 Thread Vibhav Mundra
Please delete the files in the /tmp folder on all the masters/slaves and you
should be good to go.

-Vibhav


On Mon, Apr 1, 2013 at 3:14 PM, Mohammad Tariq donta...@gmail.com wrote:

 Hello Rishabh,

  Is your NN able to come out of the safemode by itself?

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Mon, Apr 1, 2013 at 12:19 PM, Rishabh Agrawal 
 rishabh.agra...@impetus.co.in wrote:

  Hello
 
  Whenever, I stop Hbase and Hadoop  gracefully (in that order ) and then
  start Hadoop and Hbase (in that order), Hmaster refuses to start quoting
  zookeeper Config issues. It seems that it is not able to re-connect with
  Hadoop.
 
  Any help will be really appreciated.
 
  Thanks and Regards
  Rishabh Agrawal
 
 
  
 
 
 
 
 
 
 



RE: Hbase restart issues

2013-04-01 Thread Rishabh Agrawal
Thanks everyone. It is working now. I deleted the temp files and it started 
working. But I am not able to understand this behavior; any thoughts on that?
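
One likely explanation (an assumption based on the symptoms, not confirmed from your configs): with an out-of-the-box setup, Hadoop and HBase keep their working data under /tmp (hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and hbase.tmp.dir and the bundled ZooKeeper dataDir default under /tmp as well). Anything that clears or corrupts /tmp between a stop and a start leaves stale or half-missing state behind, which is why wiping /tmp and letting everything re-initialize "fixes" it. A minimal sketch of the properties to move off /tmp (the paths are placeholders; adjust for your layout):

  # core-site.xml / hdfs-site.xml
  hadoop.tmp.dir = /data/hadoop/tmp
  dfs.name.dir   = /data/hadoop/namenode
  dfs.data.dir   = /data/hadoop/datanode

  # hbase-site.xml
  hbase.tmp.dir                    = /data/hbase/tmp
  hbase.zookeeper.property.dataDir = /data/zookeeper

With those on persistent disks, a graceful stop/start cycle should no longer depend on whatever happens to be left in /tmp.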

-Original Message-
From: Vibhav Mundra [mailto:mun...@gmail.com]
Sent: Monday, April 01, 2013 3:40 PM
To: user@hbase.apache.org
Subject: Re: Hbase restart issues

Please delete the files in /tmp folder on all the master/slaves and u should be 
good to go.

-Vibhav


On Mon, Apr 1, 2013 at 3:14 PM, Mohammad Tariq donta...@gmail.com wrote:

 Hello Rishabh,

  Is your NN able to come out of the safemode by itself?

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Mon, Apr 1, 2013 at 12:19 PM, Rishabh Agrawal 
 rishabh.agra...@impetus.co.in wrote:

  Hello
 
  Whenever, I stop Hbase and Hadoop  gracefully (in that order ) and
  then start Hadoop and Hbase (in that order), Hmaster refuses to
  start quoting zookeeper Config issues. It seems that it is not able
  to re-connect with Hadoop.
 
  Any help will be really appreciated.
 
  Thanks and Regards
  Rishabh Agrawal
 
 
  
 
 
 
 
 
 
 











Re: Read thruput

2013-04-01 Thread ramkrishna vasudevan
Hi

How big is your row?  Are they wide rows, and what would be the size of
every cell?
How many read threads are getting used?


Were you able to take a thread dump when this was happening?  Have you seen
the GC log?
We may need some more info before we can pin down the problem.

Regards
Ram
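
If it helps, here is a minimal way to gather what is being asked for above (the region server PID and log paths are placeholders; the GC flags are standard HotSpot options of that era):

  # Thread dump of a running region server while the timeouts are happening
  jstack <REGIONSERVER_PID> > /tmp/rs-threaddump.txt

  # In hbase-env.sh, turn on GC logging for the region servers
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

Restart the region servers after changing hbase-env.sh so the new flags take effect.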


On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com wrote:

 Hi All,

 I am trying to use Hbase for real-time data retrieval with a timeout of 50
 ms.

 I am using 2 machines as datanode and regionservers,
 and one machine as a master for hadoop and Hbase.

 But I am able to fire only 3000 queries per sec and 10% of them are timing
 out.
 The database has 60 million rows.

 Are these figures okay, or am I missing something?
 I have set the scanner caching to one, because each time
 we are fetching a single row only.

 Here are the various configurations:

 *Our schema*
 {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',
 BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ',
 VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

 *Configuration*
 1 Machine having both hbase and hadoop master
 2 machines having both region server node and datanode
 total 285 region servers

 *Machine Level Optimizations:*
 a) No. of file descriptors is 100 (ulimit -n gives 100)
 b) Increased the read-ahead value to 4096
 c) Added noatime,nodiratime to the disks

 *Hadoop Optimizations:*
 dfs.datanode.max.xcievers = 4096
 dfs.block.size = 33554432
 dfs.datanode.handler.count = 256
 io.file.buffer.size = 65536
 hadoop data is split on 4 directories, so that different disks are being
 accessed

 *Hbase Optimizations*:

 hbase.client.scanner.caching=1  # We have specifically added this, as we
 always return one row.
 hbase.regionserver.handler.count=3200
 hfile.block.cache.size=0.35
 hbase.hregion.memstore.mslab.enabled=true
 hfile.min.blocksize.size=16384
 hfile.min.blocksize.size=4
 hbase.hstore.blockingStoreFiles=200
 hbase.regionserver.optionallogflushinterval=6
 hbase.hregion.majorcompaction=0
 hbase.hstore.compaction.max=100
 hbase.hstore.compactionThreshold=100

 *Hbase-GC
 *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
 *Hadoop-GC*
 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

 -Vibhav



Re: Read thruput

2013-04-01 Thread Vibhav Mundra
The typical size of each of my rows is less than 1 KB.

Regarding memory, I have used 8 GB for the HBase regionservers and 4 GB for
the datanodes, and I don't see them completely used. So I ruled out the GC aspect.

In case you still believe that GC is an issue, I will upload the GC logs.

-Vibhav
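
Since each request fetches exactly one row by key, it may also be worth confirming that the client issues single-row Gets rather than one-row Scans; a Get avoids the scanner setup entirely. A minimal sketch against the 0.92/0.94-era client API (the table name, row key, and column qualifier are illustrative; production code would reuse HTable instances or a pool rather than creating one per request):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SingleRowGet {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");      // reuse this across requests
      try {
        Get get = new Get(Bytes.toBytes("some-row-key"));
        get.addFamily(Bytes.toBytes("cf"));            // only the family we need
        Result result = table.get(get);
        System.out.println("value = " +
            Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("v"))));
      } finally {
        table.close();
      }
    }
  }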


On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:

 Hi

 How big is your row?  Are they wider rows and what would be the size of
 every cell?
 How many read threads are getting used?


 Were you able to take a thread dump when this was happening?  Have you seen
 the GC log?
 May be need some more info before we can think of the problem.

 Regards
 Ram


 On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com wrote:

  Hi All,
 
  I am trying to use Hbase for real-time data retrieval with a timeout of
 50
  ms.
 
  I am using 2 machines as datanode and regionservers,
  and one machine as a master for hadoop and Hbase.
 
  But I am able to fire only 3000 queries per sec and 10% of them are
 timing
  out.
  The database has 60 million rows.
 
  Are these figure okie, or I am missing something.
  I have used the scanner caching to be equal to one, because for each time
  we are fetching a single row only.
 
  Here are the various configurations:
 
  *Our schema
  *{NAME = 'mytable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING =
  'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', COMPRESSION =
  'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0', KEE
  P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK =
 'true',
  IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
 
  *Configuration*
  1 Machine having both hbase and hadoop master
  2 machines having both region server node and datanode
  total 285 region servers
 
  *Machine Level Optimizations:*
  a)No of file descriptors is 100(ulimit -n gives 100)
  b)Increase the read-ahead value to 4096
  c)Added noatime,nodiratime to the disks
 
  *Hadoop Optimizations:*
  dfs.datanode.max.xcievers = 4096
  dfs.block.size = 33554432
  dfs.datanode.handler.count = 256
  io.file.buffer.size = 65536
  hadoop data is split on 4 directories, so that different disks are being
  accessed
 
  *Hbase Optimizations*:
 
  hbase.client.scanner.caching=1  #We have specifcally added this, as we
  return always one row.
  hbase.regionserver.handler.count=3200
  hfile.block.cache.size=0.35
  hbase.hregion.memstore.mslab.enabled=true
  hfile.min.blocksize.size=16384
  hfile.min.blocksize.size=4
  hbase.hstore.blockingStoreFiles=200
  hbase.regionserver.optionallogflushinterval=6
  hbase.hregion.majorcompaction=0
  hbase.hstore.compaction.max=100
  hbase.hstore.compactionThreshold=100
 
  *Hbase-GC
  *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
  -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
  *Hadoop-GC*
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 
  -Vibhav
 



Re: Read thruput

2013-04-01 Thread Ted
Can you increase block cache size ?

What version of hbase are you using ?

Thanks

On Apr 1, 2013, at 3:47 AM, Vibhav Mundra mun...@gmail.com wrote:

 The typical size of each of my row is less than 1KB.
 
 Regarding the memory, I have used 8GB for Hbase regionservers and 4 GB for
 datanodes and I dont see them completely used. So I ruled out the GC aspect.
 
 In case u still believe that GC is an issue, I will upload the gc logs.
 
 -Vibhav
 
 
 On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:
 
 Hi
 
 How big is your row?  Are they wider rows and what would be the size of
 every cell?
 How many read threads are getting used?
 
 
 Were you able to take a thread dump when this was happening?  Have you seen
 the GC log?
 May be need some more info before we can think of the problem.
 
 Regards
 Ram
 
 
 On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com wrote:
 
 Hi All,
 
 I am trying to use Hbase for real-time data retrieval with a timeout of
 50
 ms.
 
 I am using 2 machines as datanode and regionservers,
 and one machine as a master for hadoop and Hbase.
 
 But I am able to fire only 3000 queries per sec and 10% of them are
 timing
 out.
 The database has 60 million rows.
 
 Are these figure okie, or I am missing something.
 I have used the scanner caching to be equal to one, because for each time
 we are fetching a single row only.
 
 Here are the various configurations:
 
 *Our schema
 *{NAME = 'mytable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING =
 'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', COMPRESSION =
 'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0', KEE
 P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK =
 'true',
 IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
 
 *Configuration*
 1 Machine having both hbase and hadoop master
 2 machines having both region server node and datanode
 total 285 region servers
 
 *Machine Level Optimizations:*
 a)No of file descriptors is 100(ulimit -n gives 100)
 b)Increase the read-ahead value to 4096
 c)Added noatime,nodiratime to the disks
 
 *Hadoop Optimizations:*
 dfs.datanode.max.xcievers = 4096
 dfs.block.size = 33554432
 dfs.datanode.handler.count = 256
 io.file.buffer.size = 65536
 hadoop data is split on 4 directories, so that different disks are being
 accessed
 
 *Hbase Optimizations*:
 
 hbase.client.scanner.caching=1  #We have specifcally added this, as we
 return always one row.
 hbase.regionserver.handler.count=3200
 hfile.block.cache.size=0.35
 hbase.hregion.memstore.mslab.enabled=true
 hfile.min.blocksize.size=16384
 hfile.min.blocksize.size=4
 hbase.hstore.blockingStoreFiles=200
 hbase.regionserver.optionallogflushinterval=6
 hbase.hregion.majorcompaction=0
 hbase.hstore.compaction.max=100
 hbase.hstore.compactionThreshold=100
 
 *Hbase-GC
 *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
 *Hadoop-GC*
 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 
 -Vibhav
 


Re: What is the output format of org.apache.hadoop.examples.Join?

2013-04-01 Thread Yanbo Liang
Can you give detailed information about your running parameters, Hadoop
version, etc.?
Based on the principle and the source code, your output is not reasonable.
The reduce stage of MR will merge the values into a TupleWritable.


2013/3/28 jingguo yao yaojing...@gmail.com

 Yanbo:

 Sorry for pasting the wrong result.

 The output for joining a.txt, b.txt and c.txt is as follows (still not
 the same produced by Chris):

 a0  [,,]
 b0  [,,]
 c0  [,,]
 a1  [,,]
 b1  [,,]
 b2  [,,]
 b3  [,,]
 c1  [,,]
 a2  [,,]
 a3  [,,]
 c2  [,,]
 c3  [,,]


 On Thu, Mar 28, 2013 at 11:46 AM, Yanbo Liang yanboha...@gmail.com
 wrote:
  Your output is only a.txt joined with b.txt.
  You need to join c.txt as well.
 
  2013/3/26 jingguo yao yaojing...@gmail.com
 
  I am reading the following mail:
 
  http://www.mail-archive.com/core-user@hadoop.apache.org/msg04066.html
 
  After running the following command (I am using Hadoop 1.0.4):
 
  bin/hadoop jar hadoop-examples-1.0.4.jar join \
 -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
 -outKey org.apache.hadoop.io.Text \
 -joinOp outer \
 join/a.txt join/b.txt join/c.txt joinout
 
 
  Then I run bin/hadoop fs -text joinout/part-0. I see the following
  result:
 
  a0  [,]
  b0  [,]
  a1  [,]
  b1  [,]
  b2  [,]
  b3  [,]
  a2  [,]
  a3  [,]
 
  But Chris said that the result should be:
 
  [a0,b0,c0]
  [a1,b1,c1]
  [a1,b2,c1]
  [a1,b3,c1]
  [a2,,]
  [a3,,]
  [,,c2]
  [,,c3]
 
  Is Join's output format changed for Hadoop 1.0.4?
 
 
  --
  Jingguo
 



 --
 Jingguo



Re: Read thruput

2013-04-01 Thread Azuryy Yu
Can you output the GC log? CMS GC can be optimized further; please see the
official site for details. Also, use vmstat to monitor the paging rate during queries.
On Apr 1, 2013 6:09 PM, Vibhav Mundra mun...@gmail.com wrote:

 Hi All,

 I am trying to use Hbase for real-time data retrieval with a timeout of 50
 ms.

 I am using 2 machines as datanode and regionservers,
 and one machine as a master for hadoop and Hbase.

 But I am able to fire only 3000 queries per sec and 10% of them are timing
 out.
 The database has 60 million rows.

 Are these figure okie, or I am missing something.
 I have used the scanner caching to be equal to one, because for each time
 we are fetching a single row only.

 Here are the various configurations:

 *Our schema
 *{NAME = 'mytable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING =
 'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', COMPRESSION =
 'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0', KEE
 P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK = 'true',
 IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}

 *Configuration*
 1 Machine having both hbase and hadoop master
 2 machines having both region server node and datanode
 total 285 region servers

 *Machine Level Optimizations:*
 a)No of file descriptors is 100(ulimit -n gives 100)
 b)Increase the read-ahead value to 4096
 c)Added noatime,nodiratime to the disks

 *Hadoop Optimizations:*
 dfs.datanode.max.xcievers = 4096
 dfs.block.size = 33554432
 dfs.datanode.handler.count = 256
 io.file.buffer.size = 65536
 hadoop data is split on 4 directories, so that different disks are being
 accessed

 *Hbase Optimizations*:

 hbase.client.scanner.caching=1  #We have specifcally added this, as we
 return always one row.
 hbase.regionserver.handler.count=3200
 hfile.block.cache.size=0.35
 hbase.hregion.memstore.mslab.enabled=true
 hfile.min.blocksize.size=16384
 hfile.min.blocksize.size=4
 hbase.hstore.blockingStoreFiles=200
 hbase.regionserver.optionallogflushinterval=6
 hbase.hregion.majorcompaction=0
 hbase.hstore.compaction.max=100
 hbase.hstore.compactionThreshold=100

 *Hbase-GC
 *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
 *Hadoop-GC*
 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

 -Vibhav



Re: Read thruput

2013-04-01 Thread Vibhav Mundra
I have used the following site:
http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow

to lessen the value of block cache.

-Vibhav


On Mon, Apr 1, 2013 at 4:23 PM, Ted yuzhih...@gmail.com wrote:

 Can you increase block cache size ?

 What version of hbase are you using ?

 Thanks

 On Apr 1, 2013, at 3:47 AM, Vibhav Mundra mun...@gmail.com wrote:

  The typical size of each of my row is less than 1KB.
 
  Regarding the memory, I have used 8GB for Hbase regionservers and 4 GB
 for
  datanodes and I dont see them completely used. So I ruled out the GC
 aspect.
 
  In case u still believe that GC is an issue, I will upload the gc logs.
 
  -Vibhav
 
 
  On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
  ramkrishna.s.vasude...@gmail.com wrote:
 
  Hi
 
  How big is your row?  Are they wider rows and what would be the size of
  every cell?
  How many read threads are getting used?
 
 
  Were you able to take a thread dump when this was happening?  Have you
 seen
  the GC log?
  May be need some more info before we can think of the problem.
 
  Regards
  Ram
 
 
  On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com wrote:
 
  Hi All,
 
  I am trying to use Hbase for real-time data retrieval with a timeout of
  50
  ms.
 
  I am using 2 machines as datanode and regionservers,
  and one machine as a master for hadoop and Hbase.
 
  But I am able to fire only 3000 queries per sec and 10% of them are
  timing
  out.
  The database has 60 million rows.
 
  Are these figure okie, or I am missing something.
  I have used the scanner caching to be equal to one, because for each
 time
  we are fetching a single row only.
 
  Here are the various configurations:
 
  *Our schema
  *{NAME = 'mytable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING =
  'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', COMPRESSION
 =
  'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0', KEE
  P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK =
  'true',
  IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
 
  *Configuration*
  1 Machine having both hbase and hadoop master
  2 machines having both region server node and datanode
  total 285 region servers
 
  *Machine Level Optimizations:*
  a)No of file descriptors is 100(ulimit -n gives 100)
  b)Increase the read-ahead value to 4096
  c)Added noatime,nodiratime to the disks
 
  *Hadoop Optimizations:*
  dfs.datanode.max.xcievers = 4096
  dfs.block.size = 33554432
  dfs.datanode.handler.count = 256
  io.file.buffer.size = 65536
  hadoop data is split on 4 directories, so that different disks are
 being
  accessed
 
  *Hbase Optimizations*:
 
  hbase.client.scanner.caching=1  #We have specifcally added this, as we
  return always one row.
  hbase.regionserver.handler.count=3200
  hfile.block.cache.size=0.35
  hbase.hregion.memstore.mslab.enabled=true
  hfile.min.blocksize.size=16384
  hfile.min.blocksize.size=4
  hbase.hstore.blockingStoreFiles=200
  hbase.regionserver.optionallogflushinterval=6
  hbase.hregion.majorcompaction=0
  hbase.hstore.compaction.max=100
  hbase.hstore.compactionThreshold=100
 
  *Hbase-GC
  *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
  -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
  *Hadoop-GC*
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 
  -Vibhav
 



Inconsistencies in comparisons using KeyComparator

2013-04-01 Thread Alan Chaney

Hi

I need to write some code that sorts row keys identically to HBase.

I looked at the KeyValue.KeyComparator code, and it seems that, by 
default, HBase elects to use the 'Unsafe' comparator as the basis of its 
comparison, with a fallback to the PureJavaComparer should Unsafe 
not be available (for example, in tests).


However, I'm finding that the sort order from a call to 
KeyValue.KeyComparator appears to be inconsistent between the two forms.


As an example, comparing:

(first param) (second param)
616c1b to 
61741b


gives 1 for the default (presumably, Unsafe) call, and -1 using the 
PureJavaComparator.


I would actually expect it to be a negative number, based on the difference 
between 6c and 74 in the third-from-last byte above.


Similarly

616c1b to 
00061741b


gives a result with the opposite sign from what I expect. The 
PureJavaComparator does a byte-by-byte comparison.


Is this expected? From the definition of lexicographical compare that I 
found, I don't think so. There's no issue of signed comparison here, 
because 0x6c and 0x74 are both positive byte values.


Regards

Alan
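
A small, self-contained way to reproduce this kind of check is to compare the result of Bytes.compareTo (which picks the Unsafe comparer when it can) against a hand-written unsigned, byte-by-byte compare. The byte arrays below are illustrative only, not the exact keys from the message above:

  import org.apache.hadoop.hbase.util.Bytes;

  public class CompareCheck {
    // Reference implementation: plain lexicographic compare on unsigned bytes.
    static int referenceCompare(byte[] a, byte[] b) {
      int len = Math.min(a.length, b.length);
      for (int i = 0; i < len; i++) {
        int ua = a[i] & 0xff;   // treat bytes as unsigned 0..255
        int ub = b[i] & 0xff;
        if (ua != ub) {
          return ua - ub;
        }
      }
      return a.length - b.length;  // shorter array sorts first on a common prefix
    }

    public static void main(String[] args) {
      byte[] left  = new byte[] { (byte) 0xff, (byte) 0xff, 0x61, 0x6c, 0x1b };
      byte[] right = new byte[] { (byte) 0xff, (byte) 0xff, 0x61, 0x74, 0x1b };
      int hbaseResult = Bytes.compareTo(left, right);
      int refResult   = referenceCompare(left, right);
      System.out.println("Bytes.compareTo: " + hbaseResult + ", reference: " + refResult);
      // The two results should agree in sign; if they do not, the Unsafe and
      // pure-Java comparers are disagreeing for these inputs.
    }
  }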




Re: Inconsistencies in comparisons using KeyComparator

2013-04-01 Thread Stack
That is an interesting (disturbing) find, Alan.  Hopefully the fallback is
rare.  Did you have a technique for making the compare fall back to the pure
Java compare?

Thank you,
St.Ack


On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney a...@mechnicality.com wrote:

 Hi

 I need to write some code that sorts row keys identically to HBase.

 I looked at the KeyValue.KeyComparator code, and it seems that, by
 default, HBase elects to use the 'Unsafe' comparator as the basis of its
 comparison, with a fall-back to to the PureJavaComparer should Unsafe not
 be available (for example, in tests.)

 However, I'm finding that the sort order from a call to
 KeyValue.KeyComparator appears to be inconsistent between the two forms.

 As an example, comparing:

 (first param) (second param)
 ff**ff616c1b to
 ff**ff61741b

 gives 1 for the default (presumably, Unsafe) call, and -1 using the
 PureJavaComparator.

 I would actually expect it to be a -ve number, based on the difference of
 6c to 74 in the 3rd from last byte above.

 Similarly

 00**00616c1b to
 00**061741b

 gives  0 instead of  0. The PureJavaComparator does a byte-by-byte
 comparison by

 Is this expected? From the definition of lexicographical compare that I
 found, I don't think so. There's no issue of signed comparison here,
 because 0x6c and 0x74 are still +ve byte values.

 Regards

 Alan





Re: Read thruput

2013-04-01 Thread Ted Yu
I was aware of that discussion, which was about MAX_FILESIZE and BLOCKSIZE.

My suggestion was about the block cache percentage.

Cheers


On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra mun...@gmail.com wrote:

 I have used the following site:
 http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow

 to lessen the value of block cache.

 -Vibhav


 On Mon, Apr 1, 2013 at 4:23 PM, Ted yuzhih...@gmail.com wrote:

  Can you increase block cache size ?
 
  What version of hbase are you using ?
 
  Thanks
 
  On Apr 1, 2013, at 3:47 AM, Vibhav Mundra mun...@gmail.com wrote:
 
   The typical size of each of my row is less than 1KB.
  
   Regarding the memory, I have used 8GB for Hbase regionservers and 4 GB
  for
   datanodes and I dont see them completely used. So I ruled out the GC
  aspect.
  
   In case u still believe that GC is an issue, I will upload the gc logs.
  
   -Vibhav
  
  
   On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
   ramkrishna.s.vasude...@gmail.com wrote:
  
   Hi
  
   How big is your row?  Are they wider rows and what would be the size
 of
   every cell?
   How many read threads are getting used?
  
  
   Were you able to take a thread dump when this was happening?  Have you
  seen
   the GC log?
   May be need some more info before we can think of the problem.
  
   Regards
   Ram
  
  
   On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com
 wrote:
  
   Hi All,
  
   I am trying to use Hbase for real-time data retrieval with a timeout
 of
   50
   ms.
  
   I am using 2 machines as datanode and regionservers,
   and one machine as a master for hadoop and Hbase.
  
   But I am able to fire only 3000 queries per sec and 10% of them are
   timing
   out.
   The database has 60 million rows.
  
   Are these figure okie, or I am missing something.
   I have used the scanner caching to be equal to one, because for each
  time
   we are fetching a single row only.
  
   Here are the various configurations:
  
   *Our schema
   *{NAME = 'mytable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING
 =
   'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0',
 COMPRESSION
  =
   'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0', KEE
   P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK =
   'true',
   IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
  
   *Configuration*
   1 Machine having both hbase and hadoop master
   2 machines having both region server node and datanode
   total 285 region servers
  
   *Machine Level Optimizations:*
   a)No of file descriptors is 100(ulimit -n gives 100)
   b)Increase the read-ahead value to 4096
   c)Added noatime,nodiratime to the disks
  
   *Hadoop Optimizations:*
   dfs.datanode.max.xcievers = 4096
   dfs.block.size = 33554432
   dfs.datanode.handler.count = 256
   io.file.buffer.size = 65536
   hadoop data is split on 4 directories, so that different disks are
  being
   accessed
  
   *Hbase Optimizations*:
  
   hbase.client.scanner.caching=1  #We have specifcally added this, as
 we
   return always one row.
   hbase.regionserver.handler.count=3200
   hfile.block.cache.size=0.35
   hbase.hregion.memstore.mslab.enabled=true
   hfile.min.blocksize.size=16384
   hfile.min.blocksize.size=4
   hbase.hstore.blockingStoreFiles=200
   hbase.regionserver.optionallogflushinterval=6
   hbase.hregion.majorcompaction=0
   hbase.hstore.compaction.max=100
   hbase.hstore.compactionThreshold=100
  
   *Hbase-GC
   *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 -XX:+CMSParallelRemarkEnabled
   -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
   *Hadoop-GC*
   -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  
   -Vibhav
  
 



Re: Inconsistencies in comparisons using KeyComparator

2013-04-01 Thread Alan Chaney


On 4/1/2013 9:42 AM, Stack wrote:

That is an interesting (disturbing) find Alan.  Hopefully the fallback is
rare.  Did you have a technique for making the compare fallback to pure
java compare?

Thank you,
St.Ack


I agree it's disturbing! I based my findings on reading the source code 
for 0.92.1 (the CDH4.1.2 distro).


It seems to me that, from org.apache.hadoop.hbase.KeyValue$KVComparator, 
the KeyComparator calls KeyComparator.compareRows, which in turn calls


Bytes.compareTo(left, loffset, llength, right, roffset, rlength), which in 
turn calls Bytes.compareTo, which calls 
LexicographicalComparerHolder.BEST_COMPARER


which appears to be implemented thus:

  static class LexicographicalComparerHolder {
    static final String UNSAFE_COMPARER_NAME =
        LexicographicalComparerHolder.class.getName() + "$UnsafeComparer";

    static final Comparer<byte[]> BEST_COMPARER = getBestComparer();
    /**
     * Returns the Unsafe-using Comparer, or falls back to the pure-Java
     * implementation if unable to do so.
     */
    static Comparer<byte[]> getBestComparer() {
      try {
        Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
    ...
    }

    enum PureJavaComparer implements Comparer<byte[]> {
      INSTANCE;

      @Override
      public int compareTo(byte[] buffer1, int offset1, int length1,
       ...
      }
    }
  }

So it looks to me like Unsafe is the default. However, it's not 
really very easy to debug this, except by invoking the 
KeyValue.KeyComparator and seeing what you get, which is what I did. 
Either I'm doing something very stupid (entirely plausible) or there is 
a bit of an issue here. I was hoping that someone would point out my error!


I've got some unit tests that appear to show the difference.

Thanks

Alan





On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney a...@mechnicality.com wrote:


Hi

I need to write some code that sorts row keys identically to HBase.

I looked at the KeyValue.KeyComparator code, and it seems that, by
default, HBase elects to use the 'Unsafe' comparator as the basis of its
comparison, with a fall-back to to the PureJavaComparer should Unsafe not
be available (for example, in tests.)

However, I'm finding that the sort order from a call to
KeyValue.KeyComparator appears to be inconsistent between the two forms.

As an example, comparing:

(first param) (second param)
ff**ff616c1b to
ff**ff61741b

gives 1 for the default (presumably, Unsafe) call, and -1 using the
PureJavaComparator.

I would actually expect it to be a -ve number, based on the difference of
6c to 74 in the 3rd from last byte above.

Similarly

00**00616c1b to
00**061741b

gives  0 instead of  0. The PureJavaComparator does a byte-by-byte
comparison by

Is this expected? From the definition of lexicographical compare that I
found, I don't think so. There's no issue of signed comparison here,
because 0x6c and 0x74 are still +ve byte values.

Regards

Alan







Re: Read thruput

2013-04-01 Thread Vibhav Mundra
Yes, I have changed the BLOCK CACHE % to 0.35.

-Vibhav


On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu yuzhih...@gmail.com wrote:

 I was aware of that discussion which was about MAX_FILESIZE and BLOCKSIZE

 My suggestion was about block cache percentage.

 Cheers


 On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra mun...@gmail.com wrote:

  I have used the following site:
  http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
 
  to lessen the value of block cache.
 
  -Vibhav
 
 
  On Mon, Apr 1, 2013 at 4:23 PM, Ted yuzhih...@gmail.com wrote:
 
   Can you increase block cache size ?
  
   What version of hbase are you using ?
  
   Thanks
  
   On Apr 1, 2013, at 3:47 AM, Vibhav Mundra mun...@gmail.com wrote:
  
The typical size of each of my row is less than 1KB.
   
Regarding the memory, I have used 8GB for Hbase regionservers and 4
 GB
   for
datanodes and I dont see them completely used. So I ruled out the GC
   aspect.
   
In case u still believe that GC is an issue, I will upload the gc
 logs.
   
-Vibhav
   
   
On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:
   
Hi
   
How big is your row?  Are they wider rows and what would be the size
  of
every cell?
How many read threads are getting used?
   
   
Were you able to take a thread dump when this was happening?  Have
 you
   seen
the GC log?
May be need some more info before we can think of the problem.
   
Regards
Ram
   
   
On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com
  wrote:
   
Hi All,
   
I am trying to use Hbase for real-time data retrieval with a
 timeout
  of
50
ms.
   
I am using 2 machines as datanode and regionservers,
and one machine as a master for hadoop and Hbase.
   
But I am able to fire only 3000 queries per sec and 10% of them are
timing
out.
The database has 60 million rows.
   
Are these figure okie, or I am missing something.
I have used the scanner caching to be equal to one, because for
 each
   time
we are fetching a single row only.
   
Here are the various configurations:
   
*Our schema
*{NAME = 'mytable', FAMILIES = [{NAME = 'cf',
 DATA_BLOCK_ENCODING
  =
'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0',
  COMPRESSION
   =
'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0',
 KEE
P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK =
'true',
IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
   
*Configuration*
1 Machine having both hbase and hadoop master
2 machines having both region server node and datanode
total 285 region servers
   
*Machine Level Optimizations:*
a)No of file descriptors is 100(ulimit -n gives 100)
b)Increase the read-ahead value to 4096
c)Added noatime,nodiratime to the disks
   
*Hadoop Optimizations:*
dfs.datanode.max.xcievers = 4096
dfs.block.size = 33554432
dfs.datanode.handler.count = 256
io.file.buffer.size = 65536
hadoop data is split on 4 directories, so that different disks are
   being
accessed
   
*Hbase Optimizations*:
   
hbase.client.scanner.caching=1  #We have specifcally added this, as
  we
return always one row.
hbase.regionserver.handler.count=3200
hfile.block.cache.size=0.35
hbase.hregion.memstore.mslab.enabled=true
hfile.min.blocksize.size=16384
hfile.min.blocksize.size=4
hbase.hstore.blockingStoreFiles=200
hbase.regionserver.optionallogflushinterval=6
hbase.hregion.majorcompaction=0
hbase.hstore.compaction.max=100
hbase.hstore.compactionThreshold=100
   
*Hbase-GC
*-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
*Hadoop-GC*
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
   
-Vibhav
   
  
 



Re: Read thruput

2013-04-01 Thread Vibhav Mundra
What is the general read throughput that one gets when using HBase?

 I am not able to achieve more than 3000/sec with a timeout of 50
millisecs.
Even then, 10% of them are timing out.

-Vibhav


On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra mun...@gmail.com wrote:

 yes, I have changes the BLOCK CACHE % to 0.35.

 -Vibhav


 On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu yuzhih...@gmail.com wrote:

 I was aware of that discussion which was about MAX_FILESIZE and BLOCKSIZE

 My suggestion was about block cache percentage.

 Cheers


 On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra mun...@gmail.com wrote:

  I have used the following site:
  http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
 
  to lessen the value of block cache.
 
  -Vibhav
 
 
  On Mon, Apr 1, 2013 at 4:23 PM, Ted yuzhih...@gmail.com wrote:
 
   Can you increase block cache size ?
  
   What version of hbase are you using ?
  
   Thanks
  
   On Apr 1, 2013, at 3:47 AM, Vibhav Mundra mun...@gmail.com wrote:
  
The typical size of each of my row is less than 1KB.
   
Regarding the memory, I have used 8GB for Hbase regionservers and 4
 GB
   for
datanodes and I dont see them completely used. So I ruled out the GC
   aspect.
   
In case u still believe that GC is an issue, I will upload the gc
 logs.
   
-Vibhav
   
   
On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:
   
Hi
   
How big is your row?  Are they wider rows and what would be the
 size
  of
every cell?
How many read threads are getting used?
   
   
Were you able to take a thread dump when this was happening?  Have
 you
   seen
the GC log?
May be need some more info before we can think of the problem.
   
Regards
Ram
   
   
On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com
  wrote:
   
Hi All,
   
I am trying to use Hbase for real-time data retrieval with a
 timeout
  of
50
ms.
   
I am using 2 machines as datanode and regionservers,
and one machine as a master for hadoop and Hbase.
   
But I am able to fire only 3000 queries per sec and 10% of them
 are
timing
out.
The database has 60 million rows.
   
Are these figure okie, or I am missing something.
I have used the scanner caching to be equal to one, because for
 each
   time
we are fetching a single row only.
   
Here are the various configurations:
   
*Our schema
*{NAME = 'mytable', FAMILIES = [{NAME = 'cf',
 DATA_BLOCK_ENCODING
  =
'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0',
  COMPRESSION
   =
'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0',
 KEE
P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK =
'true',
IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}
   
*Configuration*
1 Machine having both hbase and hadoop master
2 machines having both region server node and datanode
total 285 region servers
   
*Machine Level Optimizations:*
a)No of file descriptors is 100(ulimit -n gives 100)
b)Increase the read-ahead value to 4096
c)Added noatime,nodiratime to the disks
   
*Hadoop Optimizations:*
dfs.datanode.max.xcievers = 4096
dfs.block.size = 33554432
dfs.datanode.handler.count = 256
io.file.buffer.size = 65536
hadoop data is split on 4 directories, so that different disks are
   being
accessed
   
*Hbase Optimizations*:
   
hbase.client.scanner.caching=1  #We have specifcally added this,
 as
  we
return always one row.
hbase.regionserver.handler.count=3200
hfile.block.cache.size=0.35
hbase.hregion.memstore.mslab.enabled=true
hfile.min.blocksize.size=16384
hfile.min.blocksize.size=4
hbase.hstore.blockingStoreFiles=200
hbase.regionserver.optionallogflushinterval=6
hbase.hregion.majorcompaction=0
hbase.hstore.compaction.max=100
hbase.hstore.compactionThreshold=100
   
*Hbase-GC
*-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
*Hadoop-GC*
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
   
-Vibhav
   
  
 





HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
Heya,

Thinking about data types and serialization. I think null support is an
important characteristic for the serialized representations, especially
when considering the compound type. However, doing so is directly
incompatible with fixed-width representations for numerics. For instance,
if we want to have a fixed-width signed long stored on 8-bytes, where do
you put null? float and double types can cheat a little by folding negative
and positive NaN's into a single representation (this isn't strictly
correct!), leaving a place to represent null. In the long example case, the
obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
will allocate an additional encoding which can be used for null. My
experience working with scientific data, however, makes me wince at the
idea.

The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization. This
imposes some limitations on our encoding strategies. For instance, it's not
enough to simply encode null, it really needs to be encoded as 0x00 so as
to sort lexicographically earlier than any other value.

What do you think? Any ideas, experiences, etc?

Thanks,
Nick
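
To make the trade-off concrete, here is one possible shape of the "give up one value" option for a fixed-width long: Long.MIN_VALUE is disallowed and its code point (eight 0x00 bytes) is reused for null, which also satisfies the nulls-sort-first requirement. This is only a sketch of the idea, not a proposal for the final API:

  import java.util.Arrays;

  public final class OrderedNullableLong {
    /** 8-byte, order-preserving encoding; null encodes as all 0x00 bytes. */
    public static byte[] encode(Long value) {
      byte[] bytes = new byte[8];
      if (value == null) {
        return bytes;                        // eight 0x00 bytes sort before any real value
      }
      if (value == Long.MIN_VALUE) {
        throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
      }
      long flipped = value ^ Long.MIN_VALUE; // flip the sign bit so negatives sort first
      for (int i = 7; i >= 0; i--) {
        bytes[i] = (byte) flipped;           // big-endian
        flipped >>>= 8;
      }
      return bytes;
    }

    public static Long decode(byte[] bytes) {
      long raw = 0;
      for (int i = 0; i < 8; i++) {
        raw = (raw << 8) | (bytes[i] & 0xffL);
      }
      if (raw == 0L) {
        return null;                         // the reserved code point
      }
      return raw ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
      System.out.println(Arrays.toString(encode(null)));  // [0, 0, 0, 0, 0, 0, 0, 0]
      System.out.println(decode(encode(-42L)));           // -42
      System.out.println(decode(encode(42L)));            // 42
    }
  }

The variable-width and NaN-folding cases have their own wrinkles, but for fixed width the trade-off is essentially the one above: one representable value is lost in exchange for an explicit, correctly sorting null.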


Re: Read thruput

2013-04-01 Thread Ted Yu
Your hbase.regionserver.handler.count seems very high. The following is
from hbase-default.xml:

For an estimate of server-side memory-used, evaluate
hbase.client.write.buffer * hbase.regionserver.handler.count

In your case, the above product would be 6GB :-)
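
For reference, assuming the default hbase.client.write.buffer of 2 MB (2,097,152 bytes): 2 MB x 3200 handlers is roughly 6.4 GB of potential server-side buffering, which is where the 6GB figure above comes from (provided the write buffer was left at its default).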


On Mon, Apr 1, 2013 at 3:09 AM, Vibhav Mundra mun...@gmail.com wrote:

 Hi All,

 I am trying to use Hbase for real-time data retrieval with a timeout of 50
 ms.

 I am using 2 machines as datanode and regionservers,
 and one machine as a master for hadoop and Hbase.

 But I am able to fire only 3000 queries per sec and 10% of them are timing
 out.
 The database has 60 million rows.

 Are these figure okie, or I am missing something.
 I have used the scanner caching to be equal to one, because for each time
 we are fetching a single row only.

 Here are the various configurations:

 *Our schema
 *{NAME = 'mytable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING =
 'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', COMPRESSION =
 'GZ', VERSIONS = '1', TTL = '2147483647', MIN_VERSIONS = '0', KEE
 P_DELETED_CELLS = 'false', BLOCKSIZE = '8192', ENCODE_ON_DISK = 'true',
 IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}

 *Configuration*
 1 Machine having both hbase and hadoop master
 2 machines having both region server node and datanode
 total 285 region servers

 *Machine Level Optimizations:*
 a)No of file descriptors is 100(ulimit -n gives 100)
 b)Increase the read-ahead value to 4096
 c)Added noatime,nodiratime to the disks

 *Hadoop Optimizations:*
 dfs.datanode.max.xcievers = 4096
 dfs.block.size = 33554432
 dfs.datanode.handler.count = 256
 io.file.buffer.size = 65536
 hadoop data is split on 4 directories, so that different disks are being
 accessed

 *Hbase Optimizations*:

 hbase.client.scanner.caching=1  #We have specifcally added this, as we
 return always one row.
 hbase.regionserver.handler.count=3200
 hfile.block.cache.size=0.35
 hbase.hregion.memstore.mslab.enabled=true
 hfile.min.blocksize.size=16384
 hfile.min.blocksize.size=4
 hbase.hstore.blockingStoreFiles=200
 hbase.regionserver.optionallogflushinterval=6
 hbase.hregion.majorcompaction=0
 hbase.hstore.compaction.max=100
 hbase.hstore.compactionThreshold=100

 *Hbase-GC
 *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
 -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
 *Hadoop-GC*
 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

 -Vibhav



Re: Inconsistencies in comparisons using KeyComparator

2013-04-01 Thread Ted Yu
Looking at
http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/sun/misc/Unsafe.java,
it looks like Unsafe is provided by OpenJDK as well.

I guess this issue, though disturbing, wouldn't show up.


On Mon, Apr 1, 2013 at 10:04 AM, Alan Chaney a...@mechnicality.com wrote:


 On 4/1/2013 9:42 AM, Stack wrote:

 That is an interesting (disturbing) find Alan.  Hopefully the fallback is
 rare.  Did you have a technique for making the compare fallback to pure
 java compare?

 Thank you,
 St.Ack


 I agree its disturbing! I based my findings on reading the source code for
 0.92.1  (the CDH4.1.2 distro).

 It seems to me that, from org.apache.hadoop.hbase.**KeyValue$KVComparator
 the KeyComparator calls KeyComparator.compareRows which in turn calls

 Bytes.compareTo(left, loffset, llength, righ, roffset, rlength) which in
 turn calls Bytes.compareTo which calls LexicographicalCompareHolder.**
 BEST_COMPARER

 which appears to be implemented thus:

   static class LexicographicalComparerHolder {
 static final String UNSAFE_COMPARER_NAME =
 LexicographicalComparerHolder.**class.getName() +
 $UnsafeComparer;

 static final Comparerbyte[] BEST_COMPARER = getBestComparer();
 /**
  * Returns the Unsafe-using Comparer, or falls back to the pure-Java
  * implementation if unable to do so.
  */
 static Comparerbyte[] getBestComparer() {
   try {
 Class? theClass = Class.forName(UNSAFE_COMPARER_**NAME);
 ...
 }

 enum PureJavaComparer implements Comparerbyte[] {
   INSTANCE;

   @Override
   public int compareTo(byte[] buffer1, int offset1, int length1,
...
   }
 }

 So, it looks like to me that Unsafe is the default. However, its not
 really very easy to debug this, except by invoking the
 KeyValue.KeyComparator and seeing what you get, which is what I did. Either
 I'm doing something very stupid (extremely plausible) or there is a bit of
 an issue here. I was hoping that someone would point out my error!

 I've got some unit tests that appear to show the difference.

 Thanks

 Alan





 On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney a...@mechnicality.com
 wrote:

  Hi

 I need to write some code that sorts row keys identically to HBase.

 I looked at the KeyValue.KeyComparator code, and it seems that, by
 default, HBase elects to use the 'Unsafe' comparator as the basis of its
 comparison, with a fall-back to to the PureJavaComparer should Unsafe not
 be available (for example, in tests.)

 However, I'm finding that the sort order from a call to
 KeyValue.KeyComparator appears to be inconsistent between the two forms.

 As an example, comparing:

 (first param) (second param)
 ffff616c1b to
 ffff61741b

 gives 1 for the default (presumably, Unsafe) call, and -1 using the
 PureJavaComparator.

 I would actually expect it to be a -ve number, based on the difference of
 6c to 74 in the 3rd from last byte above.

 Similarly

 0000616c1b to
 00061741b

 gives  0 instead of  0. The PureJavaComparator does a byte-by-byte
 comparison by

 Is this expected? From the definition of lexicographical compare that I
 found, I don't think so. There's no issue of signed comparison here,
 because 0x6c and 0x74 are still +ve byte values.

 Regards

 Alan







Re: HBase Types: Explicit Null Support

2013-04-01 Thread Doug Meil

Hmmm… good question.

I think that fixed-width support is important for a great many rowkey
constructs, so I'd rather see something like losing MIN_VALUE and
keeping fixed width.




On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

Heya,

Thinking about data types and serialization. I think null support is an
important characteristic for the serialized representations, especially
when considering the compound type. However, doing so in directly
incompatible with fixed-width representations for numerics. For instance,
if we want to have a fixed-width signed long stored on 8-bytes, where do
you put null? float and double types can cheat a little by folding
negative
and positive NaN's into a single representation (this isn't strictly
correct!), leaving a place to represent null. In the long example case,
the
obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
will allocate an additional encoding which can be used for null. My
experience working with scientific data, however, makes me wince at the
idea.

The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization.
This
imposes some limitations on our encoding strategies. For instance, it's
not
enough to simply encode null, it really needs to be encoded as 0x00 so as
to sort lexicographically earlier than any other value.

What do you think? Any ideas, experiences, etc?

Thanks,
Nick





Re: HBase Types: Explicit Null Support

2013-04-01 Thread Matt Corgan
I spent some time this weekend extracting bits of our serialization code to
a public github repo at http://github.com/hotpads/data-tools.
 Contributions are welcome - I'm sure we all have this stuff lying around.

You can see I've bumped into the NULL problem in a few places:
*
https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
*
https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

Looking back, I think my latest opinion on the topic is to reject
nullability as the rule since it can cause unexpected behavior and
confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
plus NullableLongArrayList) that explicitly defines the behavior, and costs
a little more in performance.  If the user can't find a pre-made wrapper
class, it's not very difficult for each user to provide their own
interpretation of null and check for it themselves.

If you reject nullability, the question becomes what to do in situations
where you're implementing existing interfaces that accept nullable params.
 The LongArrayList above implements List<Long>, which requires an add(Long)
method.  In the above implementation I chose to swap nulls with
Long.MIN_VALUE, however I'm now thinking it best to force the user to make
that swap and then throw IllegalArgumentException if they pass null.


On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil doug.m...@explorysmedical.comwrote:


 HmmmŠ good question.

 I think that fixed width support is important for a great many rowkey
 constructs cases, so I'd rather see something like losing MIN_VALUE and
 keeping fixed width.




 On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 Heya,
 
 Thinking about data types and serialization. I think null support is an
 important characteristic for the serialized representations, especially
 when considering the compound type. However, doing so in directly
 incompatible with fixed-width representations for numerics. For instance,
 if we want to have a fixed-width signed long stored on 8-bytes, where do
 you put null? float and double types can cheat a little by folding
 negative
 and positive NaN's into a single representation (this isn't strictly
 correct!), leaving a place to represent null. In the long example case,
 the
 obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
 will allocate an additional encoding which can be used for null. My
 experience working with scientific data, however, makes me wince at the
 idea.
 
 The variable-width encodings have it a little easier. There's already
 enough going on that it's simpler to make room.
 
 Remember, the final goal is to support order-preserving serialization.
 This
 imposes some limitations on our encoding strategies. For instance, it's
 not
 enough to simply encode null, it really needs to be encoded as 0x00 so as
 to sort lexicographically earlier than any other value.
 
 What do you think? Any ideas, experiences, etc?
 
 Thanks,
 Nick






Re: Read thruput

2013-04-01 Thread Asaf Mesika
What does your client call look like? Get? Scan? Filters?
Is the 3000/sec client-side calls, or is it in number of rows per sec?
If you measure in MB/sec, how much read throughput do you get?
Where is your client located? Same router as the cluster?
Have you activated DFS short-circuit reads? If not, try it.
Compression: try switching to Snappy; it should be faster.
What else is running on the cluster in parallel with your reading client?
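
For the short-circuit read suggestion, the HBase/HDFS documentation of that era describes enabling it with roughly the following hdfs-site.xml properties on the datanodes and in the region servers' client configuration (property names should be verified against your exact Hadoop version, and the user name is a placeholder for whatever account runs the region servers):

  dfs.client.read.shortcircuit = true
  dfs.block.local-path-access.user = hbase

Both the datanodes and the region servers need to be restarted for this to take effect.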

On Monday, April 1, 2013, Vibhav Mundra wrote:

 What is the general read-thru put that one gets when using Hbase.

  I am not to able to achieve more than 3000/secs with a timeout of 50
 millisecs.
 In this case also there is 10% of them are timing-out.

 -Vibhav


 On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra mun...@gmail.com wrote:

  yes, I have changes the BLOCK CACHE % to 0.35.
 
  -Vibhav
 
 
  On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  I was aware of that discussion which was about MAX_FILESIZE and
 BLOCKSIZE
 
  My suggestion was about block cache percentage.
 
  Cheers
 
 
  On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra mun...@gmail.com wrote:
 
   I have used the following site:
   http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
  
   to lessen the value of block cache.
  
   -Vibhav
  
  
   On Mon, Apr 1, 2013 at 4:23 PM, Ted yuzhih...@gmail.com wrote:
  
Can you increase block cache size ?
   
What version of hbase are you using ?
   
Thanks
   
On Apr 1, 2013, at 3:47 AM, Vibhav Mundra mun...@gmail.com wrote:
   
 The typical size of each of my row is less than 1KB.

 Regarding the memory, I have used 8GB for Hbase regionservers and
 4
  GB
for
 datanodes and I dont see them completely used. So I ruled out the
 GC
aspect.

 In case u still believe that GC is an issue, I will upload the gc
  logs.

 -Vibhav


 On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:

 Hi

 How big is your row?  Are they wider rows and what would be the
  size
   of
 every cell?
 How many read threads are getting used?


 Were you able to take a thread dump when this was happening?
  Have
  you
seen
 the GC log?
 May be need some more info before we can think of the problem.

 Regards
 Ram


 On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra mun...@gmail.com
   wrote:

 Hi All,

 I am trying to use Hbase for real-time data retrieval with a
  timeout
   of
 50
 ms.

 I am using 2 machines as datanode and regionservers,
 and one machine as a master for hadoop and Hbase.

 But I am able to fire only 3000 queries per sec and 10% of them
  are
 timing
 out.
 The database has 60 million rows.

 Are these figure okie, or I am missing something.
 I have used the scanner caching to be equal to one, because for
  each
time
   


Re: HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
Thanks for the thoughtful response (and code!).

I'm thinking I will press forward with a base implementation that does not
support nulls. The idea is to provide an extensible set of interfaces, so I
think this will not box us into a corner later. That is, a mirroring
package could be implemented that supports null values and accepts
the relevant trade-offs.

Thanks,
Nick

On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com wrote:

 I spent some time this weekend extracting bits of our serialization code to
 a public github repo at http://github.com/hotpads/data-tools.
  Contributions are welcome - i'm sure we all have this stuff laying around.

 You can see I've bumped into the NULL problem in a few places:
 *

 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
 *

 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

 Looking back, I think my latest opinion on the topic is to reject
 nullability as the rule since it can cause unexpected behavior and
 confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
 plus NullableLongArrayList) that explicitly defines the behavior, and costs
 a little more in performance.  If the user can't find a pre-made wrapper
 class, it's not very difficult for each user to provide their own
 interpretation of null and check for it themselves.

 If you reject nullability, the question becomes what to do in situations
 where you're implementing existing interfaces that accept nullable params.
  The LongArrayList above implements ListLong which requires an add(Long)
 method.  In the above implementation I chose to swap nulls with
 Long.MIN_VALUE, however I'm now thinking it best to force the user to make
 that swap and then throw IllegalArgumentException if they pass null.


 On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil doug.m...@explorysmedical.com
 wrote:

 
  HmmmŠ good question.
 
  I think that fixed width support is important for a great many rowkey
  constructs cases, so I'd rather see something like losing MIN_VALUE and
  keeping fixed width.
 
 
 
 
  On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
  Heya,
  
  Thinking about data types and serialization. I think null support is an
  important characteristic for the serialized representations, especially
  when considering the compound type. However, doing so in directly
  incompatible with fixed-width representations for numerics. For
 instance,
  if we want to have a fixed-width signed long stored on 8-bytes, where do
  you put null? float and double types can cheat a little by folding
  negative
  and positive NaN's into a single representation (this isn't strictly
  correct!), leaving a place to represent null. In the long example case,
  the
  obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
  will allocate an additional encoding which can be used for null. My
  experience working with scientific data, however, makes me wince at the
  idea.
  
  The variable-width encodings have it a little easier. There's already
  enough going on that it's simpler to make room.
  
  Remember, the final goal is to support order-preserving serialization.
  This
  imposes some limitations on our encoding strategies. For instance, it's
  not
  enough to simply encode null, it really needs to be encoded as 0x00 so
 as
  to sort lexicographically earlier than any other value.
  
  What do you think? Any ideas, experiences, etc?
  
  Thanks,
  Nick
 
 
 
 



Flume EventSerializer vs hbase coprocessor

2013-04-01 Thread Robert Hamilton
I have a calculation that I'm doing in a custom AsyncHbaseEventSerializer. I 
want to do the calculation in real time, but it looks like it could be done 
either here or in a coprocessor. I'm just doing it in the serializer for now 
because the code is simpler that way, and data will only ever come in through 
Flume anyway.

But is this good practice?  I would welcome any advice or guidance.

A simplified version of the calculation: 

Every row has a groupID and a data timestamp field; each groupID represents a 
distinct group of rows and the timestamp distinguishes between individual rows 
in the group. We can assume the combination is always unique. So I construct 
the rowkey as concatenated groupID, '.' , and reverse timestamp.

The task: for each such row to be inserted into HBase, find the latest 
row already inserted having the same groupID (based on the timestamp part of the 
key), and insert another column holding the difference between its time and 
that of the previous record.

For each row the serializer sees, it looks up the previous row using a scan and 
takes the first row from the scan (that's why I'm using the reverse timestamp), 
finds the difference, and adds that to the list of PutRequests.

Example:  the data having 2 rows looks like this:

,123456, 'hello'
,123400, 'there'

Result in hbase would look like this.

Row: .123456 , 
cf:v = 'hello'
cf:dt = null --- no previous row so dt is null

Row: .123400, 
cf:v='there'
cf:dt=56 -- dt is 56 ms from 123456 - 123400


As shown, I've calculated the dt field from the previous record.  The dt=56 
means this record came from an event that was logged 56 ms later than the first 
one.

Is this a common practice, or am I crazy to be doing this in the serializer? 
Are there performance or reliability issues that I should be considering?
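
For reference, here is a rough sketch of the rowkey scheme and the
previous-row lookup described above, written against the plain synchronous
HBase client for clarity (the real serializer would issue the equivalent
asynchbase calls). The table name "mytable", the group id, and the use of
Long.MAX_VALUE - ts as the reverse timestamp are assumptions for the example,
not details from the original setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PreviousRowLookup {

  // rowkey = groupID + '.' + reverse timestamp, so the newest row of a group
  // sorts first within that group's key range (zero-pad the number in practice
  // so the string order matches the numeric order)
  static byte[] rowKey(String groupId, long ts) {
    return Bytes.toBytes(groupId + "." + (Long.MAX_VALUE - ts));
  }

  // scan only this group's key range and take the first result,
  // which is the most recently inserted row for the group
  static Result latestForGroup(HTable table, String groupId) throws Exception {
    Scan scan = new Scan(Bytes.toBytes(groupId + "."),
                         Bytes.toBytes(groupId + "/"));   // '/' sorts right after '.'
    scan.setCaching(1);
    ResultScanner scanner = table.getScanner(scan);
    try {
      return scanner.next();   // null if the group has no rows yet
    } finally {
      scanner.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");            // table name assumed
    Result prev = latestForGroup(table, "group42");
    if (prev != null) {
      String[] parts = Bytes.toString(prev.getRow()).split("\\.");
      System.out.println("previous reverse ts = " + parts[1]);
    }
    table.close();
  }
}

The dt column would then be computed from the timestamp recovered here and
added to the same list of PutRequests, as described above.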




-- 
This e-mail, including attachments, contains confidential and/or 
proprietary information, and may be used only by the person or entity to 
which it is addressed. The reader is hereby notified that any 
dissemination, distribution or copying of this e-mail is prohibited. If you 
have received this e-mail in error, please notify the sender by replying to 
this message and delete this e-mail immediately.


Re: HBase Types: Explicit Null Support

2013-04-01 Thread James Taylor
From the SQL perspective, handling null is important. Phoenix supports 
null in the following way:

- the absence of a key value
- an empty value in a key value
- an empty value in a multi part row key
  - for variable length types (VARCHAR and DECIMAL) a null byte
    separator would be used if not the last column
  - for fixed width types only the last column is allowed to be null

As you mentioned, it's important to maintain the lexicographical sort 
order with nulls being first.
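
To make the sort behaviour concrete, here is a small sketch in the spirit of
that scheme (illustration only, not Phoenix code): two variable-length columns
separated by a null byte, where a null column is just the empty byte sequence
and therefore sorts ahead of any non-empty value.

import org.apache.hadoop.hbase.util.Bytes;

public class NullByteSeparatorDemo {

  // the first (variable length) column is terminated by a 0x00 separator;
  // a null second column is simply nothing after the separator
  static byte[] encode(String first, String second) {
    byte[] a = Bytes.toBytes(first);
    byte[] b = (second == null) ? new byte[0] : Bytes.toBytes(second);
    return Bytes.add(a, new byte[] { 0x00 }, b);
  }

  public static void main(String[] args) {
    byte[] nullSecond = encode("row1", null);   // 'r','o','w','1',0x00
    byte[] someSecond = encode("row1", "a");    // 'r','o','w','1',0x00,'a'
    // the null key is a strict prefix of the other, so it sorts first
    System.out.println(Bytes.compareTo(nullSecond, someSecond) < 0);   // true
  }
}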


On 04/01/2013 01:32 PM, Nick Dimiduk wrote:

Thanks for the thoughtful response (and code!).

I'm thinking I will press forward with a base implementation that does not
support nulls. The idea is to provide an extensible set of interfaces, so I
think this will not box us into a corner later. That is, a mirroring
package could be implemented that supports null values and accepts
the relevant trade-offs.

Thanks,
Nick

On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com wrote:


I spent some time this weekend extracting bits of our serialization code to
a public github repo at http://github.com/hotpads/data-tools.
  Contributions are welcome - i'm sure we all have this stuff laying around.

You can see I've bumped into the NULL problem in a few places:
*

https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
*

https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

Looking back, I think my latest opinion on the topic is to reject
nullability as the rule since it can cause unexpected behavior and
confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
plus NullableLongArrayList) that explicitly defines the behavior, and costs
a little more in performance.  If the user can't find a pre-made wrapper
class, it's not very difficult for each user to provide their own
interpretation of null and check for it themselves.

If you reject nullability, the question becomes what to do in situations
where you're implementing existing interfaces that accept nullable params.
  The LongArrayList above implements List<Long> which requires an add(Long)
method.  In the above implementation I chose to swap nulls with
Long.MIN_VALUE, however I'm now thinking it best to force the user to make
that swap and then throw IllegalArgumentException if they pass null.


On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil doug.m...@explorysmedical.com

wrote:
Hmmm… good question.

I think that fixed width support is important for a great many rowkey
construct cases, so I'd rather see something like losing MIN_VALUE and
keeping fixed width.




On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:


Heya,

Thinking about data types and serialization. I think null support is an
important characteristic for the serialized representations, especially
when considering the compound type. However, doing so is directly
incompatible with fixed-width representations for numerics. For

instance,

if we want to have a fixed-width signed long stored on 8-bytes, where do
you put null? float and double types can cheat a little by folding
negative
and positive NaN's into a single representation (this isn't strictly
correct!), leaving a place to represent null. In the long example case,
the
obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
will allocate an additional encoding which can be used for null. My
experience working with scientific data, however, makes me wince at the
idea.

The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization.
This
imposes some limitations on our encoding strategies. For instance, it's
not
enough to simply encode null, it really needs to be encoded as 0x00 so

as

to sort lexicographically earlier than any other value.

What do you think? Any ideas, experiences, etc?

Thanks,
Nick








Re: HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.com wrote:

 From the SQL perspective, handling null is important.


From your perspective, it is critical to support NULLs, even at the expense of
having fixed-width encodings at all, or of supporting representation of the
full range of values. That is, you'd rather be able to represent NULL than -2^31?

On 04/01/2013 01:32 PM, Nick Dimiduk wrote:

 Thanks for the thoughtful response (and code!).

 I'm thinking I will press forward with a base implementation that does not
 support nulls. The idea is to provide an extensible set of interfaces, so
 I
 think this will not box us into a corner later. That is, a mirroring
 package could be implemented that supports null values and accepts
 the relevant trade-offs.

 Thanks,
 Nick

 On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com wrote:

  I spent some time this weekend extracting bits of our serialization code
 to
 a public github repo at http://github.com/hotpads/data-tools .
   Contributions are welcome - i'm sure we all have this stuff laying
 around.

 You can see I've bumped into the NULL problem in a few places:
 *
 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
 *
 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

 Looking back, I think my latest opinion on the topic is to reject
 nullability as the rule since it can cause unexpected behavior and
 confusion.  It's cleaner to provide a wrapper class (so both
 LongArrayList
 plus NullableLongArrayList) that explicitly defines the behavior, and
 costs
 a little more in performance.  If the user can't find a pre-made wrapper
 class, it's not very difficult for each user to provide their own
 interpretation of null and check for it themselves.

 If you reject nullability, the question becomes what to do in situations
 where you're implementing existing interfaces that accept nullable
 params.
   The LongArrayList above implements List<Long> which requires an
 add(Long)
 method.  In the above implementation I chose to swap nulls with
 Long.MIN_VALUE, however I'm now thinking it best to force the user to
 make
 that swap and then throw IllegalArgumentException if they pass null.


 On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil 
 doug.m...@explorysmedical.com

 wrote:
 Hmmm… good question.

 I think that fixed width support is important for a great many rowkey
 construct cases, so I'd rather see something like losing MIN_VALUE and
 keeping fixed width.




 On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Heya,

 Thinking about data types and serialization. I think null support is an
 important characteristic for the serialized representations, especially
 when considering the compound type. However, doing so is directly
 incompatible with fixed-width representations for numerics. For

 instance,

 if we want to have a fixed-width signed long stored on 8-bytes, where do
 you put null? float and double types can cheat a little by folding
 negative
 and positive NaN's into a single representation (this isn't strictly
 correct!), leaving a place to represent null. In the long example case,
 the
 obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one.
 This
 will allocate an additional encoding which can be used for null. My
 experience working with scientific data, however, makes me wince at the
 idea.

 The variable-width encodings have it a little easier. There's already
 enough going on that it's simpler to make room.

 Remember, the final goal is to support order-preserving serialization.
 This
 imposes some limitations on our encoding strategies. For instance, it's
 not
 enough to simply encode null, it really needs to be encoded as 0x00 so

 as

 to sort lexicographically earlier than any other value.

 What do you think? Any ideas, experiences, etc?

 Thanks,
 Nick








Re: HBase Types: Explicit Null Support

2013-04-01 Thread James Taylor

On 04/01/2013 04:41 PM, Nick Dimiduk wrote:

On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.com wrote:


 From the SQL perspective, handling null is important.


 From your perspective, it is critical to support NULLs, even at the expense
of fixed-width encodings at all or supporting representation of a full
range of values. That is, you'd rather be able to represent NULL than -2^31?
We've been able to get away with supporting NULL through the absence of 
the value rather than restricting the data range. We haven't had any 
push back on not allowing a fixed width nullable leading row key column. 
Since our variable length DECIMAL supports null and is a superset of the 
fixed width numeric types, users have a reasonable alternative.


I'd rather not restrict the range of values, since it doesn't seem like 
this would be necessary.


On 04/01/2013 01:32 PM, Nick Dimiduk wrote:

Thanks for the thoughtful response (and code!).

I'm thinking I will press forward with a base implementation that does not
support nulls. The idea is to provide an extensible set of interfaces, so
I
think this will not box us into a corner later. That is, a mirroring
package could be implemented that supports null values and accepts
the relevant trade-offs.

Thanks,
Nick

On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com wrote:

  I spent some time this weekend extracting bits of our serialization code

to
a public github repo at http://github.com/hotpads/data-tools .
   Contributions are welcome - i'm sure we all have this stuff laying
around.

You can see I've bumped into the NULL problem in a few places:
*
https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
*
https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

Looking back, I think my latest opinion on the topic is to reject
nullability as the rule since it can cause unexpected behavior and
confusion.  It's cleaner to provide a wrapper class (so both
LongArrayList
plus NullableLongArrayList) that explicitly defines the behavior, and
costs
a little more in performance.  If the user can't find a pre-made wrapper
class, it's not very difficult for each user to provide their own
interpretation of null and check for it themselves.

If you reject nullability, the question becomes what to do in situations
where you're implementing existing interfaces that accept nullable
params.
   The LongArrayList above implements List<Long> which requires an
add(Long)
method.  In the above implementation I chose to swap nulls with
Long.MIN_VALUE, however I'm now thinking it best to force the user to
make
that swap and then throw IllegalArgumentException if they pass null.


On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil 
doug.m...@explorysmedical.com


wrote:
Hmmm… good question.

I think that fixed width support is important for a great many rowkey
construct cases, so I'd rather see something like losing MIN_VALUE and
keeping fixed width.




On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Heya,

Thinking about data types and serialization. I think null support is an
important characteristic for the serialized representations, especially
when considering the compound type. However, doing so is directly
incompatible with fixed-width representations for numerics. For


instance,
if we want to have a fixed-width signed long stored on 8-bytes, where do

you put null? float and double types can cheat a little by folding
negative
and positive NaN's into a single representation (this isn't strictly
correct!), leaving a place to represent null. In the long example case,
the
obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one.
This
will allocate an additional encoding which can be used for null. My
experience working with scientific data, however, makes me wince at the
idea.

The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization.
This
imposes some limitations on our encoding strategies. For instance, it's
not
enough to simply encode null, it really needs to be encoded as 0x00 so


as
to sort lexicographically earlier than any other value.

What do you think? Any ideas, experiences, etc?

Thanks,
Nick









Re: HBase Types: Explicit Null Support

2013-04-01 Thread Matt Corgan
I generally don't allow nulls in my composite row keys.  Does SQL allow
nulls in the PK?  In the rare case I wanted to do that I might create a
separate format called NullableCInt32 with 5 bytes where the first one
determined null.  It's important to keep the pure types pure.

I have lots of null *values* however, but they're represented by lack of a
qualifier in the Put.  If a row has all null values, I create a dummy
qualifier with a dummy value to make sure the row key gets inserted as it
would in sql.
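
A minimal sketch of that dummy-qualifier trick; the family and qualifier
names "cf" and "_exists" below are placeholders, not anything standard:

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AllNullRowPut {

  // write the row even when every column value is null, so the row key still
  // exists in HBase the way an all-null (non-key) row would in SQL
  static void putRow(HTable table, byte[] rowKey, byte[] value) throws Exception {
    Put put = new Put(rowKey);
    if (value == null) {
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("_exists"), new byte[0]);
    } else {
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("v"), value);
    }
    table.put(put);
  }
}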


On Mon, Apr 1, 2013 at 4:49 PM, James Taylor jtay...@salesforce.com wrote:

 On 04/01/2013 04:41 PM, Nick Dimiduk wrote:

 On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.com
 wrote:

   From the SQL perspective, handling null is important.


  From your perspective, it is critical to support NULLs, even at the
 expense
 of fixed-width encodings at all or supporting representation of a full
 range of values. That is, you'd rather be able to represent NULL than
 -2^31?

 We've been able to get away with supporting NULL through the absence of
 the value rather than restricting the data range. We haven't had any push
 back on not allowing a fixed width nullable leading row key column. Since
 our variable length DECIMAL supports null and is a superset of the fixed
 width numeric types, users have a reasonable alternative.

 I'd rather not restrict the range of values, since it doesn't seem like
 this would be necessary.


 On 04/01/2013 01:32 PM, Nick Dimiduk wrote:

 Thanks for the thoughtful response (and code!).

 I'm thinking I will press forward with a base implementation that does
 not
 support nulls. The idea is to provide an extensible set of interfaces,
 so
 I
 think this will not box us into a corner later. That is, a mirroring
 package could be implemented that supports null values and accepts
 the relevant trade-offs.

 Thanks,
 Nick

 On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com
 wrote:

   I spent some time this weekend extracting bits of our serialization
 code

 to
 a public github repo at http://github.com/hotpads/data-tools .
   Contributions are welcome - i'm sure we all have this stuff laying
 around.

 You can see I've bumped into the NULL problem in a few places:
 *
 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
 *
 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

 Looking back, I think my latest opinion on the topic is to reject
 nullability as the rule since it can cause unexpected behavior and
 confusion.  It's cleaner to provide a wrapper class (so both
 LongArrayList
 plus NullableLongArrayList) that explicitly defines the behavior, and
 costs
 a little more in performance.  If the user can't find a pre-made
 wrapper
 class, it's not very difficult for each user to provide their own
 interpretation of null and check for it themselves.

 If you reject nullability, the question becomes what to do in
 situations
 where you're implementing existing interfaces that accept nullable
 params.
   The LongArrayList above implements List<Long> which requires an
 add(Long)
 method.  In the above implementation I chose to swap nulls with
 Long.MIN_VALUE, however I'm now thinking it best to force the user to
 make
 that swap and then throw IllegalArgumentException if they pass null.


 On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil 
 doug.m...@explorysmedical.com

  wrote:
 Hmmm… good question.

 I think that fixed width support is important for a great many rowkey
 construct cases, so I'd rather see something like losing MIN_VALUE and
 keeping fixed width.




 On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

   Heya,

 Thinking about data types and serialization. I think null support is
 an
 important characteristic for the serialized representations,
 especially
 when considering the compound type. However, doing so is directly
 incompatible with fixed-width representations for numerics. For

  instance,
 if we want to have a fixed-width signed long stored on 8-bytes, where
 do

 you put null? float and double types can cheat a little by folding
 negative
 and 

Re: HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
Furthermore, is it more important to support null values than to squeeze all
representations into minimum size (4 bytes for int32, etc.)?
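
For comparison, here is a rough sketch of the fixed-width option discussed
earlier, where the lowest 8-byte encoding is handed to null and Long.MIN_VALUE
becomes unrepresentable. The sign-bit flip is the usual trick for making
signed longs sort correctly as unsigned bytes; the class name is made up.

public class FixedWidthNullableLong {

  // 8 bytes, order preserving; the all-0x00 encoding that would belong to
  // Long.MIN_VALUE is given to null instead, so null sorts first
  static byte[] encode(Long v) {
    long bits;
    if (v == null) {
      bits = 0L;                          // the stolen, lowest-sorting encoding
    } else if (v == Long.MIN_VALUE) {
      throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
    } else {
      bits = v ^ Long.MIN_VALUE;          // flip the sign bit for unsigned order
    }
    byte[] out = new byte[8];
    for (int i = 7; i >= 0; i--) {
      out[i] = (byte) bits;
      bits >>>= 8;
    }
    return out;
  }
}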
On Apr 1, 2013 4:41 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.comwrote:

 From the SQL perspective, handling null is important.


 From your perspective, it is critical to support NULLs, even at the
 expense of fixed-width encodings at all or supporting representation of a
 full range of values. That is, you'd rather be able to represent NULL than
 -2^31?

 On 04/01/2013 01:32 PM, Nick Dimiduk wrote:

 Thanks for the thoughtful response (and code!).

 I'm thinking I will press forward with a base implementation that does
 not
 support nulls. The idea is to provide an extensible set of interfaces,
 so I
 think this will not box us into a corner later. That is, a mirroring
 package could be implemented that supports null values and accepts
 the relevant trade-offs.

 Thanks,
 Nick

 On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com
 wrote:

  I spent some time this weekend extracting bits of our serialization
 code to
 a public github repo at http://github.com/hotpads/data-tools .
   Contributions are welcome - i'm sure we all have this stuff laying
 around.

 You can see I've bumped into the NULL problem in a few places:
 *
 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
 *
 https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java

 Looking back, I think my latest opinion on the topic is to reject
 nullability as the rule since it can cause unexpected behavior and
 confusion.  It's cleaner to provide a wrapper class (so both
 LongArrayList
 plus NullableLongArrayList) that explicitly defines the behavior, and
 costs
 a little more in performance.  If the user can't find a pre-made wrapper
 class, it's not very difficult for each user to provide their own
 interpretation of null and check for it themselves.

 If you reject nullability, the question becomes what to do in situations
 where you're implementing existing interfaces that accept nullable
 params.
   The LongArrayList above implements List<Long> which requires an
 add(Long)
 method.  In the above implementation I chose to swap nulls with
 Long.MIN_VALUE, however I'm now thinking it best to force the user to
 make
 that swap and then throw IllegalArgumentException if they pass null.


 On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil 
 doug.m...@explorysmedical.com

 wrote:
 Hmmm… good question.

 I think that fixed width support is important for a great many rowkey
 construct cases, so I'd rather see something like losing MIN_VALUE and
 keeping fixed width.




 On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  Heya,

 Thinking about data types and serialization. I think null support is
 an
 important characteristic for the serialized representations,
 especially
 when considering the compound type. However, doing so is directly
 incompatible with fixed-width representations for numerics. For

 instance,

 if we want to have a fixed-width signed long stored on 8-bytes, where
 do
 you put null? float and double types can cheat a little by folding
 negative
 and positive NaN's into a single representation (this isn't strictly
 correct!), leaving a place to represent null. In the long example
 case,
 the
 obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one.
 This
 will allocate an additional encoding which can be used for null. My
 experience working with scientific data, however, makes me wince at
 the
 idea.

 The variable-width encodings have it a little easier. There's already
 enough going on that it's simpler to make room.

 Remember, the final goal is to support order-preserving serialization.
 This
 imposes some limitations on our encoding strategies. For instance,
 it's
 not
 enough to simply encode null, it really needs to be encoded as 0x00 so

 as

 to sort lexicographically earlier than any other value.

 What do you think? Any ideas, experiences, etc?

 Thanks,
 Nick









Re: HBase Types: Explicit Null Support

2013-04-01 Thread Enis Söztutar
I think having Int32 and NullableInt32 would support minimum overhead, as
well as allowing SQL semantics.
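
For what it's worth, a rough sketch of what such a pair could look like; the
names, the one-byte 0x00/0x01 prefix, and the layout are purely illustrative,
not a proposal of an actual API.

import java.util.Arrays;

public class Int32Encodings {

  // plain Int32: fixed 4 bytes, full int range, order preserving
  // (flip the sign bit so negatives sort before positives as unsigned bytes)
  static byte[] encodeInt32(int v) {
    int flipped = v ^ Integer.MIN_VALUE;
    return new byte[] {
        (byte) (flipped >>> 24), (byte) (flipped >>> 16),
        (byte) (flipped >>> 8),  (byte) flipped };
  }

  // NullableInt32: one extra byte; 0x00 marks null (padded to a fixed
  // 5 bytes so it still composes into fixed-width keys), 0x01 marks a value
  static byte[] encodeNullableInt32(Integer v) {
    if (v == null) {
      return new byte[5];                 // 0x00 prefix + zero padding, sorts first
    }
    byte[] out = new byte[5];
    out[0] = 0x01;
    System.arraycopy(encodeInt32(v), 0, out, 1, 4);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(encodeInt32(-1)));
    System.out.println(Arrays.toString(encodeNullableInt32(null)));
    System.out.println(Arrays.toString(encodeNullableInt32(-1)));
  }
}

The non-nullable type keeps the minimum 4 bytes and the full range; the
nullable one pays one byte of overhead, with null sorting ahead of every value.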


On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 Furthermore, is it more important to support null values than to squeeze all
 representations into minimum size (4 bytes for int32, etc.)?
 On Apr 1, 2013 4:41 PM, Nick Dimiduk ndimi...@gmail.com wrote:

  On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.com
 wrote:
 
  From the SQL perspective, handling null is important.
 
 
  From your perspective, it is critical to support NULLs, even at the
  expense of fixed-width encodings at all or supporting representation of a
  full range of values. That is, you'd rather be able to represent NULL
 than
  -2^31?
 
  On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
 
  Thanks for the thoughtful response (and code!).
 
  I'm thinking I will press forward with a base implementation that does
  not
  support nulls. The idea is to provide an extensible set of interfaces,
  so I
  think this will not box us into a corner later. That is, a mirroring
  package could be implemented that supports null values and accepts
  the relevant trade-offs.
 
  Thanks,
  Nick
 
  On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan mcor...@hotpads.com
  wrote:
 
   I spent some time this weekend extracting bits of our serialization
  code to
  a public github repo at http://github.com/hotpads/data-tools .
    Contributions are welcome - i'm sure we all have this stuff laying
  around.
 
  You can see I've bumped into the NULL problem in a few places:
  *
  https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
  *
  https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
 
  Looking back, I think my latest opinion on the topic is to reject
  nullability as the rule since it can cause unexpected behavior and
  confusion.  It's cleaner to provide a wrapper class (so both
  LongArrayList
  plus NullableLongArrayList) that explicitly defines the behavior, and
  costs
  a little more in performance.  If the user can't find a pre-made
 wrapper
  class, it's not very difficult for each user to provide their own
  interpretation of null and check for it themselves.
 
  If you reject nullability, the question becomes what to do in
 situations
  where you're implementing existing interfaces that accept nullable
  params.
  The LongArrayList above implements List<Long> which requires an
  add(Long)
  method.  In the above implementation I chose to swap nulls with
  Long.MIN_VALUE, however I'm now thinking it best to force the user to
  make
  that swap and then throw IllegalArgumentException if they pass null.
 
 
  On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil 
  doug.m...@explorysmedical.com
 
  wrote:
  Hmmm… good question.
 
  I think that fixed width support is important for a great many rowkey
  construct cases, so I'd rather see something like losing MIN_VALUE and
  keeping fixed width.
 
 
 
 
  On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
   Heya,
 
  Thinking about data types and serialization. I think null support is
  an
  important characteristic for the serialized representations,
  especially
  when considering the compound type. However, doing so is directly
  incompatible with fixed-width representations for numerics. For
 
  instance,
 
  if we want to have a fixed-width signed long stored on 8-bytes, where
  do
  you put null? float and double types can cheat a little by folding
  negative
  and positive NaN's into a single representation (this isn't strictly
  correct!), leaving a place to represent null. In the long example
  case,
  the
  obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one.
  This
  will allocate an additional encoding which can be used for null. My
  experience working with scientific data, however, makes me wince at
  the
  idea.
 
  The variable-width encodings have it a little easier. There's
 already
  enough going on that it's simpler to make room.
 
  Remember, the final goal is to support order-preserving
 serialization.
  This
  imposes some limitations on our encoding strategies. For instance,
  it's
  not
  enough to simply encode null, it really needs to be encoded as 0x00
 so
 
  as
 
  to sort lexicographically earlier than any other value.
 
  What do you think? Any ideas, experiences, etc?
 
  Thanks,
  Nick
 
 
 
 
 
 
 



Re: Errors when starting Hbase service

2013-04-01 Thread Ted Yu
Have you checked the region server log on server.epicoders.com,60020,1364559783898
around the time NotServingRegionException was seen in the master log?

What version of HBase are you using ?

Thanks


On Mon, Apr 1, 2013 at 9:20 PM, Praveen Bysani praveen.ii...@gmail.comwrote:

 Hi,

 When i try to restart the HBase service i see the following errors in
 my Hbase Master log,


 2013-04-02 03:37:29,713 INFO
 org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
 2013-04-02 03:37:29,797 INFO
 org.apache.hadoop.hbase.master.ActiveMasterManager: Deleting ZNode for
 /hbase/backup-masters/server.epicoders.com,6,1364873849167 from
 backup master directory
 2013-04-02 03:37:29,816 WARN
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node
 /hbase/backup-masters/server.epicoders.com,6,1364873849167 already
 deleted, and this is not a retry
 2013-04-02 03:37:29,816 INFO
 org.apache.hadoop.hbase.master.ActiveMasterManager:
 Master=server.epicoders.com,6,1364873849167
 2013-04-02 03:37:31,830 WARN org.apache.hadoop.conf.Configuration:
 fs.default.name is deprecated. Instead, use fs.defaultFS
 2013-04-02 03:37:31,848 INFO
 org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks
 and 0 rescan nodes
 2013-04-02 03:37:32,349 WARN org.apache.hadoop.conf.Configuration:
 hadoop.native.lib is deprecated. Instead, use io.native.lib.available
 2013-04-02 03:37:32,774 INFO org.apache.hadoop.hbase.master.HMaster:
 Server active/primary master;
 server.epicoders.com,6,1364873849167, sessionid=0x13daf9ed2b90086,
 cluster-up flag was=false
 2013-04-02 03:37:32,817 INFO
 org.apache.hadoop.hbase.master.snapshot.SnapshotManager: Snapshot
 feature is not enabled, missing log and hfile cleaners.
 2013-04-02 03:37:32,846 INFO
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node
 /hbase/online-snapshot/acquired already exists and this is not a retry
 2013-04-02 03:37:32,856 INFO
 org.apache.hadoop.hbase.procedure.ZKProcedureUtil: Clearing all
 procedure znodes: /hbase/online-snapshot/acquired
 /hbase/online-snapshot/reached /hbase/online-snapshot/abort
 2013-04-02 03:37:33,095 INFO org.mortbay.log: Logging to
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
 org.mortbay.log.Slf4jLog
 2013-04-02 03:37:33,175 INFO org.apache.hadoop.http.HttpServer: Added
 global filter 'safety'
 (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
 2013-04-02 03:37:33,178 INFO org.apache.hadoop.http.HttpServer: Added
 filter static_user_filter
 (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
 to context master
 2013-04-02 03:37:33,178 INFO org.apache.hadoop.http.HttpServer: Added
 filter static_user_filter
 (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
 to context static
 2013-04-02 03:37:33,200 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound to port 60010
 2013-04-02 03:37:33,200 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
 2013-04-02 03:37:33,880 INFO org.mortbay.log: Started
 SelectChannelConnector@0.0.0.0:60010
 2013-04-02 03:37:33,881 INFO
 org.apache.hadoop.hbase.master.ServerManager: Waiting for region
 servers count to settle; currently checked in 0, slept for 0 ms,
 expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
 interval of 1500 ms.
 2013-04-02 03:37:34,079 INFO
 org.apache.hadoop.hbase.master.ServerManager: Registering
 server=test3.jayeson.com.sg,60020,1364873839936
 2013-04-02 03:37:34,084 INFO
 org.apache.hadoop.hbase.master.ServerManager: Registering
 server=test2.jayeson.com.sg,60020,1364873841105
 2013-04-02 03:37:34,085 INFO
 org.apache.hadoop.hbase.master.ServerManager: Registering
 server=server.epicoders.com,60020,1364873849637
 2013-04-02 03:37:34,091 INFO
 org.apache.hadoop.hbase.master.ServerManager: Waiting for region
 servers count to settle; currently checked in 3, slept for 210 ms,
 expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
 interval of 1500 ms.
 2013-04-02 03:37:34,103 WARN org.apache.hadoop.conf.Configuration:
 fs.default.name is deprecated. Instead, use fs.defaultFS
 2013-04-02 03:37:35,634 INFO
 org.apache.hadoop.hbase.master.ServerManager: Finished waiting for
 region servers count to settle; checked in 3, slept for 1752 ms,
 expecting minimum of 1, maximum of 2147483647, master is running.
 2013-04-02 03:37:35,639 INFO
 org.apache.hadoop.hbase.master.MasterFileSystem: Log folder
 hdfs://
 server.epicoders.com:8020/hbase/.logs/server.epicoders.com,60020,1364873849637
 belongs to an existing region server
 2013-04-02 03:37:35,639 INFO
 org.apache.hadoop.hbase.master.MasterFileSystem: Log folder
 hdfs://
 server.epicoders.com:8020/hbase/.logs/test2.jayeson.com.sg,60020,1364873841105
 belongs to an existing region server
 2013-04-02 03:37:35,640 INFO
 org.apache.hadoop.hbase.master.MasterFileSystem: Log folder
 hdfs://
 server.epicoders.com:8020/hbase/.logs/test3.jayeson.com.sg,60020,1364873839936
 belongs to an existing region server
 2013-04-02 03:37:35,640 

Re: Read thruput

2013-04-01 Thread lars hofhansl
If you are concerned about latencies < 50ms you should disable Nagle's.

In hbase-site.xml:

  <property>
    <name>hbase.ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>


You might get a further latency improvement if you do that same for HDFS:
In hdfs-site.xml:
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
<property>
  <name>ipc.client.tcpnodelay</name>
  <value>true</value>
</property>

Also (as others have pointed out) you need to carefully control your garbage 
collections.
Watch the HDFS replication count (3 by default, which does not make any sense 
with only 2 DNs), but since you're reading, that should make no difference.


-- Lars




 From: Vibhav Mundra mun...@gmail.com
To: user@hbase.apache.org 
Sent: Monday, April 1, 2013 3:09 AM
Subject: Read thruput
 
Hi All,

I am trying to use Hbase for real-time data retrieval with a timeout of 50
ms.

I am using 2 machines as datanode and regionservers,
and one machine as a master for hadoop and Hbase.

But I am able to fire only 3000 queries per sec and 10% of them are timing
out.
The database has 60 million rows.

Are these figures okay, or am I missing something?
I have set the scanner caching equal to one, because each time
we are fetching a single row only.

Here are the various configurations:

*Our schema
*{NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

*Configuration*
1 Machine having both hbase and hadoop master
2 machines having both region server node and datanode
total 285 region servers

*Machine Level Optimizations:*
a) No. of file descriptors is 100 (ulimit -n gives 100)
b) Increased the read-ahead value to 4096
c) Added noatime,nodiratime to the disks

*Hadoop Optimizations:*
dfs.datanode.max.xcievers = 4096
dfs.block.size = 33554432
dfs.datanode.handler.count = 256
io.file.buffer.size = 65536
hadoop data is split on 4 directories, so that different disks are being
accessed

*Hbase Optimizations*:

hbase.client.scanner.caching=1  #We have specifcally added this, as we
return always one row.
hbase.regionserver.handler.count=3200
hfile.block.cache.size=0.35
hbase.hregion.memstore.mslab.enabled=true
hfile.min.blocksize.size=16384
hfile.min.blocksize.size=4
hbase.hstore.blockingStoreFiles=200
hbase.regionserver.optionallogflushinterval=6
hbase.hregion.majorcompaction=0
hbase.hstore.compaction.max=100
hbase.hstore.compactionThreshold=100

*Hbase-GC
*-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
*Hadoop-GC*
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC

-Vibhav

Re: Errors when starting Hbase service

2013-04-01 Thread Praveen Bysani
Hi,

I have set up HBase using Cloudera; the version shown is 'HBase
0.94.2-cdh4.2.0'. In this case the master and region server are the same
machine, but during other instances the logs show the hostname of another
region server with similar errors. Just a note that the master has a different
system time from the region servers; is that an issue?

I didn't find any errors while starting the service but there is an error
while shutting down the service. Following is the log from the region
server on the same machine during shutdown,

2013-04-02 03:36:54,390 INFO
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager:
Stopping RegionServerSnapshotManager gracefully.
2013-04-02 03:36:54,396 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 2
regions to close
2013-04-02 03:36:54,397 INFO
org.apache.hadoop.hbase.regionserver.Store: Closed info
2013-04-02 03:36:54,398 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
-ROOT-,,0.70236052
2013-04-02 03:36:54,426 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom
filter type for
hdfs://server.epicoders.com:8020/hbase/.META./1028785192/.tmp/1104b09fbaeb41829c2493875d7475c1:
CompoundBloomFilterWriter
2013-04-02 03:36:54,558 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and
NO DeleteFamily was added to HFile
(hdfs://server.epicoders.com:8020/hbase/.META./1028785192/.tmp/1104b09fbaeb41829c2493875d7475c1)
2013-04-02 03:36:54,558 INFO
org.apache.hadoop.hbase.regionserver.Store: Flushed ,
sequenceid=90407, memsize=2.3k, into tmp file
hdfs://server.epicoders.com:8020/hbase/.META./1028785192/.tmp/1104b09fbaeb41829c2493875d7475c1
2013-04-02 03:36:54,612 INFO
org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://server.epicoders.com:8020/hbase/.META./1028785192/info/1104b09fbaeb41829c2493875d7475c1,
entries=8, sequenceid=90407, filesize=1.6k
2013-04-02 03:36:54,669 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
of ~2.3k/2320, currentsize=0/0 for region .META.,,1.1028785192 in
273ms, sequenceid=90407, compaction requested=false
2013-04-02 03:36:54,676 INFO
org.apache.hadoop.hbase.regionserver.Store: Closed info
2013-04-02 03:36:54,676 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
.META.,,1.1028785192
2013-04-02 03:36:54,799 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
server.epicoders.com,60020,1364559783898; all regions closed.
2013-04-02 03:36:54,800 INFO
org.apache.hadoop.hbase.regionserver.wal.HLog:
regionserver60020.logSyncer exiting
2013-04-02 03:36:54,861 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing
leases
2013-04-02 03:36:54,862 INFO
org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed
leases
2013-04-02 03:36:54,864 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/server.epicoders.com,60020,1364559783898
2013-04-02 03:36:54,864 INFO
org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before
retry #1...
2013-04-02 03:36:56,868 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/server.epicoders.com,60020,1364559783898
2013-04-02 03:36:56,871 INFO
org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before
retry #2...
2013-04-02 03:36:59,051 INFO
org.apache.hadoop.hbase.regionserver.Leases:
regionserver60020.leaseChecker closing leases
2013-04-02 03:36:59,054 INFO
org.apache.hadoop.hbase.regionserver.Leases:
regionserver60020.leaseChecker closed leases
2013-04-02 03:37:00,872 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/server.epicoders.com,60020,1364559783898
2013-04-02 03:37:00,873 INFO
org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before
retry #3...
2013-04-02 03:37:08,875 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/server.epicoders.com,60020,1364559783898
2013-04-02 03:37:08,876 ERROR
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper
delete failed after 3 retries
2013-04-02 03:37:08,876 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my
ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/server.epicoders.com,60020,1364559783898
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)