Does 1.2 support p999 metrics?

2016-06-27 Thread Tianying Chang
Hi, We want to expose p999 metrics for latency, e.g. Get, Put, and RPC latency. It seems it only supports up to p99. Is p999 only supported after 1.3? If so, is there any patch to port into 1.2 to enable this? Thanks Tian-Ying

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

2016-06-01 Thread Tianying Chang
since March 31, so some bug caused the region to enter a bad state... Unfortunately, we don't have DEBUG enabled and that is the last region that has the issue, so it is hard to figure out which bug caused the bad state... Thanks Tian-Ying On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <t

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

2016-05-31 Thread Tianying Chang
and will report back. Thanks Tian-Ying On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote: > On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <tych...@gmail.com> wrote: > > > Yes, it is 94.26. By a quick glance, I didn't see any put that is older > &g

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

2016-05-27 Thread Tianying Chang
as expected. Also we noticed that the same region replicated on the slave side is totally normal, i.e. at 20+G On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote: > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <tych...@gmail.com> wrote: > > > Hi, > > > &g

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

2016-05-27 Thread Tianying Chang
. Thanks Tian-Ying On Fri, May 27, 2016 at 2:54 PM, Frank Luo <j...@merkleinc.com> wrote: > What if you manually trigger major-compact on that particular region? Does > it run and the delete markers removed? > > -Original Message- > From: Tianying Chang [mailto:tych...@gmai
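
The manual trigger suggested in this reply can be issued from the HBase shell; here is a hedged sketch (the table name and encoded region name are placeholders, not values from this thread):

```shell
# Sketch: manually trigger a major compaction from the HBase shell.
# 'usertable' and the encoded region name below are hypothetical.
echo "major_compact 'usertable'" | hbase shell
# A single region can also be compacted by its encoded name:
echo "major_compact 'c0f064e05f6c5ad2b8155b19b8cbee51'" | hbase shell
```

If delete markers survive even this, the region is likely in the kind of bad state discussed later in the thread.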

Major compaction cannot remove deleted rows until the region is split. Strange!

2016-05-27 Thread Tianying Chang
Hi, We saw a very strange case in one of our production clusters. A couple of regions cannot get their deleted rows or delete markers removed even after major compaction. However, when the region triggered a split (we set 100G for auto split), the deletion worked. The 100G region becomes two 10G daughter

Re: "Show All RPC Handler Tasks" on RS webUI stop showing any information after heavy load

2016-01-21 Thread Tianying Chang
OK, I found the bug in TaskMonitor, and have opened a JIRA with a fix in it. https://issues.apache.org/jira/browse/HBASE-15155 Thanks Tian-Ying On Mon, Jan 18, 2016 at 9:27 PM, Tianying Chang <tych...@gmail.com> wrote: > Hi, > > It seems it was fine when the cluster i

"Show All RPC Handler Tasks" on RS webUI stop showing any information after heavy load

2016-01-18 Thread Tianying Chang
Hi, It seems it was fine when the cluster is first started, but after running under heavy load for a while, it stops showing any of those RPC handler tasks. We have seen this on our newly upgraded 94.26, and also on 1.0. It worked fine in 0.94.7 before. Other people also reported the same issue on

Upgrade application's unit test setup from 0.94 MiniCluster to 1.x

2015-11-02 Thread Tianying Chang
Hi, We currently have a 0.94 HBase mini cluster set up to run unit tests for our application. As we upgrade to 1.x, we need to start a 1.0 mini HBase cluster so that we can run application unit tests against HBase 1.0. In the current pom, we need the *hbase-tests* dependency shown below, to start a 1.x

Re: clone table(72TB data) failed with socket timeout

2015-06-18 Thread Tianying Chang
) at org.apache.hadoop.hbase.master.snapshot.CloneSnapshotHandler.handleCreateHdfsRegions(CloneSnapshotHandler.java:109) ... 6 more On Wed, Jun 17, 2015 at 10:15 PM, Tianying Chang tych...@gmail.com wrote: Hi, I am trying to clone a table from a snapshot. The snapshot is reported to be healthy. However, clone table failed with socket time error, shown

Re: Could not clone a snapshot, complaining the table already exist, although the table does not exist

2015-06-18 Thread Tianying Chang
, UniformSplit or classname) hbase create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'} On Thu, Jun 18, 2015 at 12:38 PM, Tianying Chang tych...@gmail.com wrote: Hi, I am trying to trying to clone a table from a snapshot, but it always fail with below error. That table does

Could not clone a snapshot, complaining the table already exist, although the table does not exist

2015-06-18 Thread Tianying Chang
Hi, I am trying to clone a table from a snapshot, but it always fails with the error below. That table does not exist, although I had previously tried to clone into the same table and failed. I feel that a reference to that table still exists somewhere. Any hint on what is going wrong? Thanks Tian-Ying

clone table(72TB data) failed with socket timeout

2015-06-17 Thread Tianying Chang
Hi, I am trying to clone a table from a snapshot. The snapshot is reported to be healthy. However, the clone failed with a socket timeout error, shown below. BTW, the table is huge, with 72T of data. Does anyone know why? Is it because the size is too big, so some default timeout is not enough?
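
For reference, a clone like the one described here is issued from the HBase shell; this is a sketch with placeholder snapshot and table names (not the ones from the thread):

```shell
# Sketch: clone a new table from an existing snapshot (names are hypothetical).
echo "clone_snapshot 'big_table_snap', 'big_table_clone'" | hbase shell
# For very large snapshots, the master-side region creation can outlast the
# client RPC timeout; raising hbase.rpc.timeout on the client may help.
```

The timeout note is an assumption consistent with the thread's question, not a confirmed fix.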

Re: What cause hfile saved under /hbase/.archive besides snapshot?

2015-06-05 Thread Tianying Chang
used. -Vlad On Fri, Jun 5, 2015 at 12:25 PM, Tianying Chang tych...@gmail.com wrote: Hi, I found that there are many hfiles saved under /hbase/.archive/table/. I know when there is a snapshot taken, the hfile might have to be saved under /hbase/.archive. But when we are not even taking

What cause hfile saved under /hbase/.archive besides snapshot?

2015-06-05 Thread Tianying Chang
Hi, I found that there are many HFiles saved under /hbase/.archive/table/. I know that when a snapshot is taken, the HFiles might have to be saved under /hbase/.archive. But even when we are not taking snapshots, I saw lots of HFiles under /hbase/.archive. I noticed that if I run a major

Re: Failed to take snapshot due to some region directory is not found

2015-05-20 Thread Tianying Chang
(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Thanks Tian-Ying On Wed, May 20, 2015 at 9:26 AM, Tianying Chang tych

Re: Failed to take snapshot due to some region directory is not found

2015-05-20 Thread Tianying Chang
. -- Cloudera, Inc. On Tue, May 19, 2015 at 2:23 PM, Tianying Chang tych...@gmail.com wrote: Sure, Esteban. Where is a good place to upload the log? On Tue, May 19, 2015 at 2:01 PM, Esteban Gutierrez este...@cloudera.com wrote: The latest log is very interesting Tianying

Failed to take snapshot due to some region directory is not found

2015-05-19 Thread Tianying Chang
Hi, We have a cluster that used to be able to take snapshots. But recently, one table failed due to the error below. Other tables on the same cluster are fine. Any idea what could be wrong? Is the table not healthy? But when I run hbase hbck, it reports the cluster as healthy. BTW, we are running 94.7, so

Re: Failed to take snapshot due to some region directory is not found

2015-05-19 Thread Tianying Chang
, esteban. -- Cloudera, Inc. On Mon, May 18, 2015 at 11:12 PM, Tianying Chang tych...@gmail.com wrote: Hi, We have a cluster that used to be able to take snapshot. But recently, one table failed due to the error below. Other tables on the same clusters are fine. Any idea what

Re: Failed to take snapshot due to some region directory is not found

2015-05-19 Thread Tianying Chang
, so we still need the verification // step. LOG.debug(Starting region operation on + region); On Tue, May 19, 2015 at 11:26 AM, Tianying Chang tych...@gmail.com wrote: Hi, Esteban, There is no region split in this cluster, since we put the region size upper bound to be really high

Re: Failed to take snapshot due to some region directory is not found

2015-05-19 Thread Tianying Chang
) at org.apache.hadoop.hbase.ipc.Invocation.toString(Invocation.java:152) at org.apache.hadoop.hbase.ipc.HBaseServer$Call.toString(HBaseServer.java:304) Matteo On Tue, May 19, 2015 at 11:35 AM, Tianying Chang tych...@gmail.com wrote: Actually, I find it does not even print out the debug

Re: Failed to take snapshot due to some region directory is not found

2015-05-19 Thread Tianying Chang
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: cancelling 0 tasks for snapshot On Tue, May 19, 2015 at 1:30 PM, Tianying Chang tych...@gmail.com wrote: Matteo We are using hdfs2.0 + HBase94.7. I saw this ArrayIndexOutOfBoundsException: 2 error also. What does that mean? BTW

Re: Failed to take snapshot due to some region directory is not found

2015-05-19 Thread Tianying Chang
, Tianying Chang tych...@gmail.com wrote: Matteo By looking at the DEBUG log at RS side, it seems to me that no online regions were pickedup. So it seems to me that this call returns 0 regions. But I am not sure how that happens. Is there anyway to verify this? involvedRegions

Dump hfile to verify hbase delete order

2014-09-03 Thread Tianying Chang
Hi, I did a small test to verify that a DeleteMarker is placed before any Put of the same row/version, for example: put 't1', 'cf1', 'v1', 100; delete 't1', 'cf1', 100; flush 't1'. But when I dump the hfile being flushed out with the command below: hbase org.apache.hadoop.hbase.io.hfile.HFile -p
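
The test above can be reproduced end to end roughly like this. Row, column, and file path are illustrative placeholders; the full HBase shell put syntax also takes a row key and column qualifier, which the truncated snippet omits:

```shell
# Sketch: write a Put, then a Delete at the same timestamp, flush, and dump the HFile.
echo "put 't1', 'row1', 'cf1:c1', 'v1', 100" | hbase shell
echo "delete 't1', 'row1', 'cf1:c1', 100" | hbase shell
echo "flush 't1'" | hbase shell
# Print the KeyValues of the flushed HFile (the path below is a placeholder):
hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f /hbase/t1/<region>/cf1/<hfile>
```

As the follow-up in this thread explains, the flush scanner masks the Put, so the dumped HFile may contain only the delete marker.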

Re: Dump hfile to verify hbase delete order

2014-09-03 Thread Tianying Chang
Ok, I figured it out: when flushing the memstore out to an HFile, it uses a scanner which masks any Put that is earlier than the delete, so the HFile will not contain the Put. Thanks Tian-Ying On Wed, Sep 3, 2014 at 11:27 AM, Tianying Chang tych...@gmail.com wrote: Hi, I did a small test, want

Hbase replication perf issue: a single region server cannot receive edits from two sources at the same time?

2014-08-05 Thread Tianying Chang
Hi, We are seeing a performance issue on one of our write-heavy clusters and are trying to find the root cause. One confusion I have during the investigation is that ReplicationSink.java says this, which seems wrong: ** * This class is responsible for replicating the edits coming *

WALPlayer kills many RS when play large number of WALs

2014-07-22 Thread Tianying Chang
Hi, I was running WALPlayer to output HFiles for a future bulkload. There are 6200 hlogs, and the total size is about 400G. The MapReduce job finished, but I saw two bad things: 1. More than half of the RSs died. I checked the syslog; it seems they were killed by OOM. They also had a very high CPU spike

Re: WALPlayer kills many RS when play large number of WALs

2014-07-22 Thread Tianying Chang
log snippet prior to (and including) the OOME ? Are you using 0.94 release ? Cheers On Tue, Jul 22, 2014 at 8:15 AM, Tianying Chang tych...@gmail.com wrote: Hi I was running WALPlayer that output HFile for future bulkload. There are 6200 hlogs, and the total size is about 400G

Re: WALPlayer kills many RS when play large number of WALs

2014-07-22 Thread Tianying Chang
of mapreduce jobs * task heap setting). Reduce the allowed mapreduce task concurrency. On Tue, Jul 22, 2014 at 8:15 AM, Tianying Chang tych...@gmail.com wrote: Hi I was running WALPlayer that output HFile for future bulkload. There are 6200 hlogs, and the total size is about

Re: WALPlayer kills many RS when play large number of WALs

2014-07-22 Thread Tianying Chang
be interesting. On Tue, Jul 22, 2014 at 9:58 AM, Tianying Chang tych...@gmail.com wrote: Andrew Thanks for your answer! I think you are right. The node has only 15G memory. We configured it to run RS with 12G. And then we configured 4 mapper and 4 reducer on each node, each to use 2G

export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
Hi, When I export a large table with 460+ regions, I saw the exportSnapshot job fail sometimes (not all the time). The error from the map task is below. But I verified the file highlighted below; it does exist. Smaller tables always seem to pass. Any idea? Is it because it is too big and gets session
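
The export in question runs the ExportSnapshot MapReduce tool. A hedged sketch of the invocation follows; all names and paths are placeholders, and the -bandwidth flag assumes the HBASE-11083 backport mentioned later in this thread:

```shell
# Sketch: export a snapshot to another cluster (names/paths are hypothetical).
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snap \
    -copy-to hdfs://backup-nn:8020/hbase \
    -mappers 16 \
    -bandwidth 50    # MB/s throttle, available via the HBASE-11083 backport
```

A slow or overcommitted mapper that stops writing for longer than the HDFS lease timeout is one way to hit the LeaseExpiredException discussed below.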

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
you give us the hbase and hadoop releases you're using ? Can you check namenode log around the time LeaseExpiredException was encountered ? Cheers On Wed, Apr 30, 2014 at 9:20 AM, Tianying Chang tych...@gmail.com wrote: Hi, When I export large table with 460+ regions, I saw

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
, Tianying Chang tych...@gmail.com wrote: we are using Hadoop 2.0.0-cdh4.2.0 and hbase 0.94.7. We also backported several snapshot related jira, e.g 10111(verify snapshot), 11083 (bandwidth throttle in exportSnapshot) I found when the LeaseExpiredException first reported, that file indeed

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
that can happen if your MR job is stuck in someway (slow machine or similar) and it is not writing within the lease timeout Matteo On Wed, Apr 30, 2014 at 9:53 AM, Tianying Chang tych...@gmail.com wrote: we are using Hadoop 2.0.0-cdh4.2.0 and hbase 0.94.7. We also backported several

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
, 2014 at 10:17 AM, Tianying Chang tych...@gmail.com wrote: yes, I am using the bandwidth throttle feature. The export job of this table actually succeeded for its first run. When I rerun it (for my robust testing) it seems to never pass. I am wondering if it has some weird state (I did clean up

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
$ClientNamenodeProtocol$ Thanks Tian-Ying On Wed, Apr 30, 2014 at 1:25 PM, Ted Yu yuzhih...@gmail.com wrote: Tianying: Have you checked audit log on namenode for deletion event corresponding to the files involved in LeaseExpiredException ? Cheers On Wed, Apr 30, 2014 at 10:44 AM, Tianying Chang tych

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
of the big table always succeeds. But then the later runs always fail with these LeaseExpiredExceptions. Smaller tables have no problem no matter how many times I re-run. Thanks Tian-Ying On Wed, Apr 30, 2014 at 2:24 PM, Tianying Chang tych...@gmail.com wrote: Ted, it seems it is due to the Jira

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
for the busy state of your machines Matteo On Wed, Apr 30, 2014 at 2:55 PM, Tianying Chang tych...@gmail.com wrote: I think it is not directly caused by the throttle. On the 2nd run on the non-throttle jar, the LeaseExpiredException shows up again(for big file). So it does seem like

Re: export snapshot fail sometime due to LeaseExpiredException

2014-04-30 Thread Tianying Chang
email. Can you apply the patch and try again ? Cheers On Wed, Apr 30, 2014 at 3:31 PM, Tianying Chang tych...@gmail.com wrote: Actually, my testing on a 90G table always succeeds, never fails. The failed one is a production table which has about 400G and 460 regions. The weird thing

Re: Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

2014-04-18 Thread Tianying Chang
:39 PM, Tianying Chang tych...@gmail.com wrote: Cool. Thanks! Just to dig deeper, is this because BloomFilter is part of Meta, and Meta block always cached no matter what? Or it is because the BloomFilter is in the upper level of the searchTree in the code path I pasted? I guess

Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

2014-04-16 Thread Tianying Chang
Hi, We have a use case where some data is mostly random-read, so it pollutes the cache and causes big GCs. It is better to turn off the block cache for that data. So we are going to call setCacheBlocks(false) for those get()s. We know that the index will still be cached, based on the code path below, so

Re: Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

2014-04-16 Thread Tianying Chang
; } On Wed, Apr 16, 2014 at 4:59 PM, Ted Yu yuzhih...@gmail.com wrote: bq. it is always cached on read even when per-family/per-query cacheBlocks is turned off. True. On Wed, Apr 16, 2014 at 4:41 PM, Tianying Chang tych...@gmail.com wrote: Hi, We have a use case where some data

no-flush based snapshot policy?

2014-03-25 Thread Tianying Chang
Hi, I need a new snapshot policy which sits in between the disabled and flushed versions. So, basically: I cannot disable the table, but I also don't need the snapshot to be so consistent that all RSs coordinate to flush the regions before taking the snapshot.

Re: no-flush based snapshot policy?

2014-03-25 Thread Tianying Chang
Sorry, sent hit accidentally, please ignore this one. will send again after done. On Tue, Mar 25, 2014 at 2:08 PM, Tianying Chang tych...@gmail.com wrote: Hi, I need a new snapshot policy which sits in between the disabled and flushed version. So, basically: I cannot disable the table

Re: no-flush based snapshot policy?

2014-03-25 Thread Tianying Chang
, Mar 25, 2014 at 2:08 PM, Tianying Chang tych...@gmail.com wrote: Hi, I need a new snapshot policy which sits in between the disabled and flushed version. So, basically: I cannot disable the table, but I also don't need the snapshot to be that consistent where all RS coordinated to flush

Re: no-flush based snapshot policy?

2014-03-25 Thread Tianying Chang
a static conf property that you read on the RS and you are done. Matteo On Tue, Mar 25, 2014 at 2:38 PM, Tianying Chang tych...@gmail.com wrote: Hi, I need a new snapshot policy. Basically, I cannot disable the table, but I also don't need the snapshot to be that consistent where all

Re: WALPlayer?

2014-02-11 Thread Tianying Chang
the WAL log rolled (I can see the count of hlogs increased by 10). But still, after running WALPlayer, the rowCount is not changed. Any idea? Can WALPlayer work with Snapshot? Thanks Tian-Ying On Fri, Feb 7, 2014 at 10:31 AM, Tianying Chang tych...@gmail.com wrote: Hi, Lars Sure. I will come back

Re: WALPlayer?

2014-02-11 Thread Tianying Chang
are not present... you probably want to replay the entries from the original TestTable. I think that you can specify a mapping.. something like WalPlayer TestTable TestTable-cloned Matteo On Tue, Feb 11, 2014 at 7:37 PM, Tianying Chang tych...@gmail.com wrote: I am trying to use snapshot+WALPlayer

Re: WALPlayer?

2014-02-07 Thread Tianying Chang
From: Tianying Chang tych...@gmail.com To: user@hbase.apache.org Sent: Thursday, February 6, 2014 10:06 PM Subject: Re: WALPlayer? Never mind. Should use hbase command. :) Thanks Tian-Ying On Thu, Feb 6, 2014 at 9:53 PM, Tianying Chang tych...@gmail.com wrote: Hi, folks

WALPlayer?

2014-02-06 Thread Tianying Chang
Hi folks, I want to try the WALPlayer, but it complains 'not found'. Am I running it the wrong way? Thanks Tian-Ying hadoop jar /tmp/hbase-0.94.17-SNAPSHOT.jar WALPlayer Unknown program 'WALPlayer' chosen. Valid program names are: CellCounter: Count cells in HBase table completebulkload:

Re: WALPlayer?

2014-02-06 Thread Tianying Chang
Never mind. Should use hbase command. :) Thanks Tian-Ying On Thu, Feb 6, 2014 at 9:53 PM, Tianying Chang tych...@gmail.com wrote: Hi, folks I want to try the WALPlayer. But it complains not found. Am I running it the wrong way? Thanks Tian-Ying hadoop jar /tmp/hbase-0.94.17
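
The fix Tian-Ying landed on is launching the tool through the hbase wrapper, which puts the HBase jars on the classpath. A sketch, with placeholder table names and the default 0.94 WAL directory:

```shell
# Wrong (from the original post): hadoop jar hbase-0.94.17-SNAPSHOT.jar WALPlayer
# Right: use the hbase command with the fully-qualified driver class.
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /hbase/.logs TestTable
# Optionally replay the edits into a differently named table, as suggested
# later in this thread for the snapshot-clone case:
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /hbase/.logs TestTable TestTable-cloned
```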

is hbase-6580 in 94.7?

2014-01-27 Thread Tianying Chang
Hi, I am trying to use HConnection.getTable() instead of HTablePool, since the latter is being deprecated. But we are using HBase 94.7 in our production. It seems it is only available after 94.11? https://issues.apache.org/jira/browse/HBASE-6580 Thanks Tian-Ying

Tall good with Scans, Wide good with Gets???

2014-01-22 Thread Tianying Chang
Hi, I watched this youtube video http://www.youtube.com/watch?v=_HLoH_PgrLk by Lars George. It is a really good one! I just have one thing I still cannot understand. It said Tall is good for Scans, Wide is good for Gets. My understanding is that Scan and Get use the same underlying code. It first

Re: Tall good with Scans, Wide good with Gets???

2014-01-22 Thread Tianying Chang
yuzhih...@gmail.com wrote: See http://search-hadoop.com/m/nAAad2wRi03/Is+get+a+private+case+of+scansubj=Re+Is+get+a+private+case+of+scan+ Cheers On Wed, Jan 22, 2014 at 10:09 AM, Tianying Chang tych...@gmail.com wrote: Hi, I watched this youtube video http://www.youtube.com/watch?v

Re: Tall good with Scans, Wide good with Gets???

2014-01-22 Thread Tianying Chang
Thanks Ted, This jira helped me to understand that claim much better now. On Wed, Jan 22, 2014 at 1:10 PM, Ted Yu yuzhih...@gmail.com wrote: For the follow-on question of #1, see HBASE-9488 : small scan. Cheers On Wed, Jan 22, 2014 at 10:30 AM, Tianying Chang tych...@gmail.com wrote

Re: HBase Thrift2 does not support createTable() API anymore?

2014-01-21 Thread Tianying Chang
/hbase-dev/201212.mbox/%3ccae9mebbh7v1phsbegtepbxg1h5drbs+ydvyo0akdr1d4jce...@mail.gmail.com%3E But on the other hand, there is already Jira for adding the HBaseAdmin into Thrift2 https://issues.apache.org/jira/browse/HBASE-8820 Thanks Tian-Ying On Mon, Jan 20, 2014 at 9:18 PM, Tianying Chang

Re: HBase Thrift2 does not support createTable() API anymore?

2014-01-21 Thread Tianying Chang
: http://search-hadoop.com/m/O9OjiuXFJQ1/hbase+thrift2+larssubj=Thrift+2+Update Cheers On Tue, Jan 21, 2014 at 9:30 AM, Tianying Chang tych...@gmail.com wrote: Hi, Ram I find this post by Tim Sell, He said the HBaseAdmin is intentionally not implemented. In that thread, Jimmy Xiang

Re: HBase Thrift2 does not support createTable() API anymore?

2014-01-21 Thread Tianying Chang
-hadoop.com/m/O9OjiuXFJQ1/hbase+thrift2+larssubj=Thrift+2+Update ' using other sites. Cheers On Tue, Jan 21, 2014 at 10:16 AM, Tianying Chang tych...@gmail.com wrote: Ted, thanks for the link. It seems the site is down? I cannot access it. Can you still access it? If so, can

old question regarding wide vs tall schema design

2014-01-21 Thread Tianying Chang
Hi all, I know this is a question that has been asked a lot... Here is our use case, where neither wide nor tall obviously wins. We have a table. Here are the two schema designs of the simplified version: Wide table -- rowKey: domainName. Column1: date (which expands to multiple

HBase Thrift2 does not support createTable() API anymore?

2014-01-20 Thread Tianying Chang
Hi, It seems some APIs that are supported by Thrift are not in Thrift2 anymore, e.g. createTable, deleteTable, getTableRegions, and so on? Basically, these are APIs that are supported by Thrift but not by Thrift2. How can I createTable through Thrift2? Am I missing something here? Thanks Tian-Ying print

Re: HBase Thrift2 does not support createTable() API anymore?

2014-01-20 Thread Tianying Chang
mentioned belong to HBaseAdmin Please take a look at hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java where you can find examples for table creation, etc. Cheers On Mon, Jan 20, 2014 at 4:56 PM, Tianying Chang tych...@gmail.com wrote: Hi

Re: HBase Thrift2 does not support createTable() API anymore?

2014-01-20 Thread Tianying Chang
that support. We can file a JIRA too for the same. Regards Ram On Tue, Jan 21, 2014 at 7:26 AM, Tianying Chang tych...@gmail.com wrote: Hi, Thanks Ted for the link. HappyBase! Does this mean that the native HBase Thrift2 does not support those API like createTable anymore? I

RE: Hbase 0.96 and Hadoop 2.2

2013-10-23 Thread Tianying Chang
What is your HBase version and Hadoop version? There is an RPC breaking change in Hadoop 2.2. As a workaround, I removed the hadoop-common-2.2.0.2.0.6.0-68.jar and hadoop-hdfs-2.2.0.2.0.6.0-68.jar from my HBase/lib and let it use the ones on the Hadoop path; then this error was gone. Thanks
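
The workaround described above, as a hedged sketch. The jar versions are the ones named in the message; the HBASE_CLASSPATH step is an assumption about how to point HBase at the cluster's Hadoop jars, not part of the original reply:

```shell
# Sketch: make HBase use the cluster's Hadoop 2.2 jars instead of its bundled ones.
rm "$HBASE_HOME/lib/hadoop-common-2.2.0.2.0.6.0-68.jar"
rm "$HBASE_HOME/lib/hadoop-hdfs-2.2.0.2.0.6.0-68.jar"
# Then ensure the Hadoop install's client jars are picked up, e.g. in hbase-env.sh:
export HBASE_CLASSPATH="$(hadoop classpath)"
```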

Hbase 95 on Hadoop 2. No master znode created in zookeeper

2013-10-10 Thread Tianying Chang
Hi, I found that there is no master znode under /hbase in zookeeper. Actually, the /hbase directory is not even created after just starting the master (based on the log and webUI, the master is up fine). After I bring up an RS, the /hbase directory and some other znodes like /hbase/backup-masters,

RE: Hbase 95 on Hadoop 2. No master znode created in zookeeper

2013-10-10 Thread Tianying Chang
, replication, splitWAL, recovering-regions, rs] Cheers On Thu, Oct 10, 2013 at 10:46 AM, Tianying Chang tich...@ebaysf.com wrote: Hi, I found that there is no master znode under /hbase in zookeeper. Actually, the /hbase directory is not even created after just start the master (Based on Log

RE: Hbase 95 on Hadoop 2. No master znode created in zookeeper

2013-10-10 Thread Tianying Chang
created in zookeeper On Thu, Oct 10, 2013 at 10:59 AM, Tianying Chang tich...@ebaysf.com wrote: Not really. Because we are deploying hadoop2+hbase95 in our production cluster. Need to confirm 95 is working fine. Every announcement related to the 0.95.x series came with this disclaimer: Be aware

RE: Error starting Hbase 95 in Hadoop 2.0

2013-10-07 Thread Tianying Chang
:44 PM, Tianying Chang tich...@ebaysf.com wrote: Hi, The hdfs and zookeeper are both up running. When I try to start the HBase, it failed with this error below. I even tried to change the security.group.mpapping to ShellBasedUnixGroupsMappingWithFallback, still not working. Anyone know what

Error starting Hbase 95 in Hadoop 2.0

2013-10-04 Thread Tianying Chang
Hi, HDFS and ZooKeeper are both up and running. When I try to start HBase, it fails with the error below. I even tried changing security.group.mapping to ShellBasedUnixGroupsMappingWithFallback, but it is still not working. Does anyone know what could have gone wrong? Caused by:

openTSDB lose large amount of data when the client are writing

2013-09-19 Thread Tianying Chang
Hi, I have a customer who uses openTSDB. Recently we found that less than 10% of the data is written; the rest is lost. Checking the RS log, there are many row-lock-related issues, like the ones below. It seems the large amount of writes to tsdb that need row locks caused the problem. Anyone else see

RE: deploy saleforce phoenix coprocessor to hbase/lib??

2013-09-11 Thread Tianying Chang
Phoenix tables are created, though, they wouldn't have this jar path. FYI, we're looking into modifying our install procedure to do the above (see https://github.com/forcedotcom/phoenix/issues/216), if folks are interested in contributing. Thanks, James On Sep 10, 2013, at 2:41 PM, Tianying

deploy saleforce phoenix coprocessor to hbase/lib??

2013-09-10 Thread Tianying Chang
Hi, Since this is not an HBase system-level jar but more like user code, should we deploy it under hbase/lib? It seems we can use alter to add the coprocessor for a particular user table. So can I put the jar file any place that is accessible, e.g. hdfs:/myPath? My customer

RE: split region

2013-07-15 Thread Tianying Chang
I think the feature of using the encoded name for split is only available in 94, not in 92? In 92, you have to use the full name, as you first tried. Unfortunately, there is a bug for certain names. -Original Message- From: Alex Levin [mailto:ale...@gmail.com] Sent: Thursday, July 11, 2013

How to turn off WAL from hbase shell or config file?

2013-06-19 Thread Tianying Chang
Hi, I am trying to do some performance testing and want to test with the WAL turned off. I know there is the API writeToWAL(false) to do this, but I want to just change a setting and use the PerformanceEvaluation tool to run the test with the WAL off. Is there a way in the HBase shell, or a

RE: Inconsistent Table HBCK

2013-05-29 Thread Tianying Chang
don't think there is already a JIRA for that. The idea was to open a new one and ask for HBCK to be able to fix that. Can you do that? Do you have an easy way to reproduce your issue? Like by manually creating files in HDFS or something like that? JM 2013/5/22 Tianying Chang tich...@ebaysf.com

RE: Inconsistent Table HBCK

2013-05-22 Thread Tianying Chang
Hi, Jean What is the jira #? Thanks Tian-Ying -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, May 22, 2013 7:57 AM To: user@hbase.apache.org Subject: Re: Inconsistent Table HBCK Thanks for the feedback Jay. I helped someone who faced the

RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-20 Thread Tianying Chang
at these apis if they solve the problem for you. RowLocks are prone to more thread contentions and some deadlock situations when there are lot of threads waiting for the same row lock. Regards Ram On Fri, May 17, 2013 at 12:11 AM, Tianying Chang tich...@ebaysf.com wrote: FYI, below I quoted

RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread Tianying Chang
the row lock explicitly ? Using HTable.lockRow? Regards Ram On Thu, May 16, 2013 at 10:46 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Which version of HBase? Regards Ram On Thu, May 16, 2013 at 10:42 PM, Tianying Chang tich...@ebaysf.comwrote: Hi, When our

RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread Tianying Chang
when trying to obtain lock for RowKey Which version of HBase? Regards Ram On Thu, May 16, 2013 at 10:42 PM, Tianying Chang tich...@ebaysf.com wrote: Hi, When our customers(using TSDB) loads large amount of data into HBase, we saw many NullPointerException in the RS logs as below. I checked

Cannot run selected test under 0.94, is OK under trunk/95 though.

2013-03-26 Thread Tianying Chang
Hi, I can run either the full test suite or a selected test under either trunk or 95. But after I checked out branch 94, I found I cannot run a selected test anymore. It can still run the full test suite, though. I am adding a new unit test, and I don't want to run the whole test suite each time. Anyone

RE: Cannot run selected test under 0.94, is OK under trunk/95 though.

2013-03-26 Thread Tianying Chang
switches -PrunAllTests or -PlocalTests On Tue, Mar 26, 2013 at 9:52 PM, Tianying Chang tich...@ebaysf.com wrote: Hi, I can run either full test or a selected test under either trunk or 95. But after I checkout to branch 94, I found I cannot run a selected test anymore. It can still run
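
Per the reply above, the 0.94 branch requires selecting a test profile explicitly. A sketch for running one test (TestFoo is a placeholder class name):

```shell
# Sketch: on the 0.94 branch, run a single unit test with the localTests profile.
mvn test -PlocalTests -Dtest=TestFoo
# The full suite, by contrast, uses:
mvn test -PrunAllTests
```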

RE: status of bin/rename_table.rb

2013-02-21 Thread Tianying Chang
Our customers need to use rename_table to solve an issue in our production cluster. Since 92 does not have it, we copied it from CDH3. However, it did not work. I then found several issues with it and fixed them. We used it in our production cluster to rename some very big tables. Even when

RE: status of bin/rename_table.rb

2013-02-21 Thread Tianying Chang
earlier today for another purpose, and it had a few hitches but nothing unfixable by hbck -fix. On Thu, Feb 21, 2013 at 11:36 PM, Tianying Chang tich...@ebaysf.com wrote: Our customers need to use rename_table to solve an issue in our production cluster. Since 92 does not have it, we copied

RE: Empty a table

2013-02-05 Thread Tianying Chang
You can achieve this goal by deleting all the HFiles of the table. Remember: just the HFiles, not the region folders. We did it before with a simple script that loops through the HFiles; very easy. Thanks Tian-Ying -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent:
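
The loop described above might look like the sketch below. The table name is a placeholder and the path glob assumes the 0.9x-era /hbase/&lt;table&gt;/&lt;region&gt;/&lt;cf&gt;/&lt;hfile&gt; layout; this is destructive and should be treated as an illustration only, run with the table offline:

```shell
# Sketch (destructive!): empty a table by removing its HFiles while keeping
# the region and column-family directories intact.
TABLE=mytable   # hypothetical table name
# The three-level glob matches <region>/<cf>/<hfile> entries only, so the
# region folders themselves (and files like <region>/.regioninfo) survive.
hadoop fs -rm "/hbase/$TABLE/*/*/*"
```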

AsynchBase client holds stale dead region server for long time even after the META has already been update.

2013-01-25 Thread Tianying Chang
Hi, one machine crashed in our cluster. After 3 minutes, the master detected it and re-assigned the regions to other region servers. The regions were back online on other RSs within one minute. But the asynchbase client still held the old dead regionserver for 50 minutes, causing data loss. We have to

RE: AsynchBase client holds stale dead region server for long time even after the META has already been update.

2013-01-25 Thread Tianying Chang
already been update. Tianying: I moved user@ to Cc. There is a google group for asynchbase. Please subscribe to that group. Can you clarify the version of asynchbase you're using ? Cheers On Fri, Jan 25, 2013 at 10:54 AM, Tianying Chang tich...@ebaysf.com wrote: Hi One machine crashed in our

RE: AsynchBase client holds stale dead region server for long time even after the META has already been update.

2013-01-25 Thread Tianying Chang
Thanks Marcos. Can I file a bug there? Or at the googleGroup? -Original Message- From: Marcos Ortiz [mailto:mlor...@uci.cu] Sent: Friday, January 25, 2013 1:30 PM To: user@hbase.apache.org Cc: Tianying Chang; Async HBase Subject: Re: AsynchBase client holds stale dead region server

RE: AsynchBase client holds stale dead region server for long time even after the META has already been update.

2013-01-25 Thread Tianying Chang
at 10:58 AM, Ted Yu yuzhih...@gmail.com wrote: Tianying: I moved user@ to Cc. There is a google group for asynchbase. Please subscribe to that group. Can you clarify the version of asynchbase you're using ? Cheers On Fri, Jan 25, 2013 at 10:54 AM, Tianying Chang tich...@ebaysf.comwrote