Dear Manish and Jean-Daniel,
After starting DFS (/opt/hadoop/bin/start-dfs.sh), I got the following
daemons after typing jps.
5212 Jps
5150 SecondaryNameNode
4932 DataNode
4737 NameNode
Then, I started HBase (/opt/hbase/bin/start-hbase.sh). The following
daemons were available.
5797 Jps
Jean-Daniel,
I changed dfs.data.dir and dfs.name.dir to new paths in hdfs-site.xml.
I really cannot figure out why HBase/Hadoop has a problem after being shut
down for a couple of days. If I use it frequently, no such master problem
happens.
Each time, I have to reinstall not only
Bing,
Based on my experience with this configuration, I can list some points, one
of which may be your solution.
- First and foremost, don't store your service metadata in the system tmp
directory, because it may get cleaned up on every restart and you lose all
your job tracker, datanode
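To illustrate the point (my sketch; the /data/hadoop paths are hypothetical,
any directory that survives reboots will do), the overrides go in
hdfs-site.xml:

  <property>
    <name>dfs.name.dir</name>
    <!-- persistent home for namenode metadata, instead of /tmp -->
    <value>/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- persistent home for datanode blocks -->
    <value>/data/hadoop/dfs/data</value>
  </property>

(hadoop.tmp.dir in core-site.xml is worth moving off /tmp as well, since
several other defaults derive from it.)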
Wouldn't that mean having the NAS attached to all of the nodes in the cluster?
Sent from a remote device. Please excuse any typos...
Mike Segel
On Mar 26, 2012, at 11:07 PM, Stack st...@duboce.net wrote:
On Mon, Mar 26, 2012 at 4:31 PM, Ted Tuttle ted.tut...@mentacapital.com
wrote:
Is
Hello,
I think the problem is not that. If I don't set the env
variable $HBASE_CONF_DIR, everything starts correctly, so I don't think I need
to set it. My problem is that the MapReduce job is not executing in parallel.
If I ask:
Configuration config = HBaseConfiguration.create();
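The snippet above is cut off; for reference, a typical HBase MapReduce driver
of that era looks roughly like the sketch below (driver, mapper, and table
names are hypothetical). One thing worth checking for the parallelism problem:
with TableInputFormat, one map task is created per region, so a table with a
single region runs a single mapper no matter how many task trackers you have.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.mapreduce.Job;

  Configuration config = HBaseConfiguration.create();
  Job job = new Job(config, "my-scan-job");   // hypothetical job name
  job.setJarByClass(MyDriver.class);          // hypothetical driver class

  Scan scan = new Scan();
  scan.setCaching(500);        // fetch rows in batches rather than one RPC per row
  scan.setCacheBlocks(false);  // don't churn the block cache during a full scan

  // One map task per region of "mytable" (hypothetical table name);
  // MyMapper extends TableMapper<ImmutableBytesWritable, Result> (hypothetical).
  TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
      ImmutableBytesWritable.class, Result.class, job);
  job.setNumReduceTasks(0);
  job.waitForCompletion(true);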
Dear Manish,
Thank you so much for your replies!
The system tmp directory has been changed to another location in my hdfs-site.xml.
When I ran $HADOOP_HOME/bin/start-all.sh, all of the services were listed,
including the job tracker and task tracker.
10211 SecondaryNameNode
10634 Jps
9992
There is a lot in this thread and the situation has changed a bit, so I'd
like to summarize the current situation and verify a few points:
Our current environment:
- CDH 4b1: hdfs 0.23 and hbase 0.92
- separate master and namenode, 64gb, 24 cores each, colocated with
zookeepers (third
Thank you very much!
On Tue, Mar 27, 2012 at 6:54 PM, Ted Yu yuzhih...@gmail.com wrote:
Index 20 corresponds to RS_ZK_REGION_FAILED_OPEN, which was added by:
HBASE-5490 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum
list in 0.90 EventHandler
(Ram)
As of now,
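For background, a toy illustration (my example, not HBase's actual enum) of
why HBASE-5490 appends to the end of the list: these event types travel
between servers as enum ordinals, and ordinals are assigned by position.

  enum EventOld { OPEN, CLOSE, FAILED_OPEN }  // FAILED_OPEN.ordinal() == 2
  enum EventNew { OPEN, FAILED_OPEN, CLOSE }  // inserted mid-list: CLOSE is now 2
  // A peer still running the old enum decodes ordinal 2 as FAILED_OPEN, so new
  // constants must go last to keep mixed-version clusters consistent.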
Hi,
We have region servers sporadically stopping under load, supposedly due to
errors writing to HDFS. Things like:
2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while
syncing
java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
It's happening with a
R
- Original Message -
From: Bing Li [mailto:lbl...@gmail.com]
Sent: Wednesday, March 28, 2012 01:32 AM
To: user@hbase.apache.org; hbase-u...@hadoop.apache.org
Subject: Re: Starting Abnormally After Shutting Down For Some Time
Juhani,
We've been doing some similar performance testing on our 50 node
cluster running 0.92.1 and CDH3U3.
We were looking mostly at reads, but observed similar behavior. HBase
wasn't particularly busy, but we couldn't make it go faster.
Some debugging later, we found that many (sometimes
Which version of HDFS and HBase are you using?
When the problem happens, can you access HDFS, for example from hadoop dfs?
Thanks,
Jimmy
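For example, a quick check from the shell (assuming the default /hbase root;
adjust if hbase.rootdir points elsewhere):

  hadoop dfs -ls /          # is the namenode answering at all?
  hadoop dfs -ls /hbase     # can we list HBase's root directory?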
On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner e...@gigya.com wrote:
Hi,
We have region servers sporadically stopping under load, supposedly due to
I'm noticing a failure on about .0001% of Gets, wherein instead of the
actual row I request, I get the next logical row.
For example, I create a Get with this key: \x00\x00\xB8\xB210291 and
instead get back the row key: \x00\x00\xB8\xB2103 .
This happens reliably when we run large jobs against
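As a sanity check for this kind of anomaly, a sketch (my code, not Whitney's)
that compares the returned row key against the requested one and logs
mismatches:

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  // A plain Get should never return a different row; log it if it does.
  Result checkedGet(HTable table, byte[] key) throws IOException {
    Result r = table.get(new Get(key));
    if (!r.isEmpty() && !Bytes.equals(r.getRow(), key)) {
      System.err.println("asked for " + Bytes.toStringBinary(key)
          + " but got " + Bytes.toStringBinary(r.getRow()));
    }
    return r;
  }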
Hi Jimmy,
HBase is built from the latest sources of the 0.90 branch (0.90.7-SNAPSHOT);
I had the same problem with 0.90.4.
Hadoop 0.20.2 from Cloudera CDH3u1
This failure happens during large M/R jobs. I have 10 servers, and usually
no more than one fails like this, sometimes none.
One thing worth
On Wed, Mar 28, 2012 at 8:09 AM, Eran Kutner e...@gigya.com wrote:
Hi Jimmy,
HBase is built from the latest sources of the 0.90 branch (0.90.7-SNAPSHOT);
I had the same problem with 0.90.4.
Hadoop 0.20.2 from Cloudera CDH3u1
Can you upgrade to CDH3u3, Eran? I don't remember if CDH3u1 had
support for
Eran,
For 0.90.7-SNAPSHOT, set hbase.regionserver.logroll.errors.tolerated
to a value greater than 0 (the default). This will help the RS survive
transient HLog sync failures (with the local DN) by retrying a few times
before the RS decides to shut itself down.
Also worth investigating if you had too much IO load/etc. on the
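For reference, the setting Harsh mentions goes in hbase-site.xml; a minimal
sketch (the value 2 is illustrative, meaning two consecutive roll/sync
failures are retried before the RS aborts):

  <property>
    <name>hbase.regionserver.logroll.errors.tolerated</name>
    <value>2</value>
  </property>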
Thanks Stack and Harsh, I'll try both suggestions and update the list with
the results.
-eran
On Wed, Mar 28, 2012 at 17:21, Harsh J ha...@cloudera.com wrote:
Eran,
For 0.90.7-SNAPSHOT, set hbase.regionserver.logroll.errors.tolerated
to a value greater than 0 (the default). This will help the RS survive transient
Any chance we can see what happened before that too? Usually you
should see a lot more HDFS spam before getting the message that all the
datanodes are bad.
J-D
On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner e...@gigya.com wrote:
Hi,
We have region servers sporadically stopping under load, supposedly due to
On Wed, Mar 28, 2012 at 7:49 AM, Whitney Sorenson wsoren...@hubspot.com wrote:
This happens reliably when we run large jobs against our cluster which
perform many reads and writes, but it does not always happen on the
same keys.
Interesting. Anything you can figure about a particular key if
I don't see any prior HDFS issues in the 15 minutes before this exception.
The logs on the datanode reported as problematic are clean as well.
However, I now see the log is full of errors like this:
2012-03-28 00:15:05,358 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Can you look even further? Like a day?
J-D
On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner e...@gigya.com wrote:
I don't see any prior HDFS issues in the 15 minutes before this exception.
The logs on the datanode reported as problematic are clean as well.
However, I now see the log is full of
Eran:
The error indicates some zookeeper-related issue.
Do you see a KeeperException after the error log?
I searched the 0.90 codebase but couldn't find the exact log phrase:
zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in
CLOSI" {} \; -print
zhihyu$ find src/main -name
On Wed, Mar 28, 2012 at 12:59 AM, Michel Segel
michael_se...@hotmail.com wrote:
Wouldn't that mean having the NAS attached to all of the nodes in the cluster?
Yes. That was the presumption.
St.Ack
On Tue, Mar 27, 2012 at 5:56 PM, Sindy sindyban...@gmail.com wrote:
Hadoop 1.0.1
HBase 0.90.2
You mean 0.92.0?
2012-03-27 22:04:06,607 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 60 on 60120 caught:
java.nio.channels.ClosedChannelException
Client timed out. Check its
On Tue, Mar 27, 2012 at 2:36 PM, Bryan Beaudreault
bbeaudrea...@hubspot.com wrote:
I imagine it isn't a great idea to create a ton of scans
(1 for each row), which is the only way I can think to do the above with
what we have.
You want to step through some set of rows in lock-step? That is,
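For what it's worth, a single bounded Scan is usually far cheaper than a Scan
per row; a minimal sketch (startKey/stopKey are hypothetical bounds, and
table is an open HTable for the target table):

  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  Scan scan = new Scan(startKey, stopKey);  // one scanner over the whole range
  scan.setCaching(100);                     // rows fetched per RPC round-trip
  ResultScanner scanner = table.getScanner(scan);
  try {
    for (Result r : scanner) {
      // advance through rows in key order, in lock-step with the other source
    }
  } finally {
    scanner.close();
  }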
On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron buckl...@oclc.org wrote:
For us, setting these two got rid of all of the 20 and 40 ms response
times and dropped the average response time we measured from HBase by
more than half. Plus, we can push HBase a lot harder.
That had an effect on
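The digest doesn't show which two settings Ron means; my assumption, for
illustration, is the Nagle-related pair usually cited for the 20/40 ms
delays, set in hbase-site.xml:

  <property>
    <name>hbase.ipc.client.tcpnodelay</name>
    <value>true</value>  <!-- disable Nagle on the HBase RPC client side -->
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>  <!-- disable Nagle on the server side -->
  </property>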
Thanks Stack, that's correct. It is kind of hard to describe, though I
guess it's easiest to think of it as a 2d array where the 2nd dimension is
sorted.
I think your idea would be doable, too. I'm going to try testing them both
and see how well they perform. Luckily I'm not TOO concerned
Stack,
We're about 80% random read and 20% random write. So, that would have been the
mix that we were running.
We'll try a test with Nagle on and then Nagle off, random write only, later
this afternoon and see if the same pattern emerges.
Ron
-Original Message-
From:
Then you should have an error in the master logs.
If not, it's worth checking that the master and the region servers speak to
the same ZK...
As it's hbase-related, I'm redirecting the question to the hbase user mailing
list (hadoop common is in bcc).
On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman
On Wed, Mar 28, 2012 at 7:27 PM, Bing Li lbl...@gmail.com wrote:
Dear all,
I found that some configuration information was saved under /tmp on my system,
so when some of that information is lost, HBase cannot start normally.
But in my system, I have tried to change the HDFS directory to
hmmm... I couldn't find it either, so I've looked at the history of that
file and sure enough a few check-ins back it had that message.
I have no idea how something like this could happen. I know I had some
merge issues when I first got the latest version and built that project but
I've then
Dear Peter,
When I had just started the Ubuntu machine, there was nothing in /tmp.
After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
following files were under /tmp. Do you think anything is wrong? Thanks!
libing@greatfreeweb:/tmp$ ls -alrt
total 112
drwxr-xr-x 22 root root
Hi folks-
The HBase RefGuide has been updated on the website.
Doug Meil
Chief Software Architect, Explorys
doug.m...@explorys.com
The one thing I wanted to point out in this latest update was that I broke the
Case Studies into a separate chapter (from the single entry that I put in
Troubleshooting a few weeks ago).
http://hbase.apache.org/book.html#casestudies
Several people have posted links to some great research, so
On Wed, Mar 28, 2012 at 9:53 PM, Bing Li lbl...@gmail.com wrote:
Dear Peter,
When I had just started the Ubuntu machine, there was nothing in /tmp.
After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
following files were under /tmp. Do you think anything is wrong? Thanks!
Bing:
Your pid file location can be set up via hbase-env.sh; the default is /tmp ...
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
On Wed, Mar 28, 2012 at 3:04 PM, Peter Vandenabeele
pe...@vandenabeele.com wrote:
On Wed, Mar 28, 2012 at 9:53
Ron,
thanks for sharing those settings. Unfortunately they didn't help with
our read throughput, but every little bit helps.
Another suspicious thing that has come up is the network... While
overall throughput has been verified to be able to go much higher than
the tax hbase is putting
As you said, the amount of errors and drops you are seeing are very small
compared to your overall traffic, so I doubt that is a significant
contributor to the throughput problems you are seeing.
- Dave
On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly
juhani_conno...@cyberagent.co.jp wrote:
Since we haven't heard anything on expected throughput, we're downgrading our
hdfs back to 0.20.2. I'd be curious to hear how other people do with 0.23
and the throughput they're getting.
We don't have