Hi,
We have newly set up a cluster of 5 nodes, each with 16 GB of RAM. We use
HBase 0.90.0 on top of Hadoop from CDH3. When testing HBase under heavy load
generated by YCSB, we consistently see region servers dying silently,
without any logs or exceptions (not even in the system logs). We couldn't
Are the nodes themselves dying or the region server processes?
What JVM version?
On Wed, Feb 16, 2011 at 12:46 AM, Ryan Rawson ryano...@gmail.com wrote:
Are your disks filling? Are you running into swap? vmstat can help
diagnose this.
What is 'heavy load'? I pushed a 3 node hbase cluster to
Hi Vishal,
These are DEBUG-level messages from the block cache; there is
nothing wrong with that. Can you explain in more detail what you do and see?
Lars
On Wed, Feb 16, 2011 at 4:24 AM, Vishal Kapoor
vishal.kapoor...@gmail.com wrote:
all was working fine and suddenly I see a lot of logs like
Hi,
On 02/16/2011, at 4:51 PM, 陈加俊 wrote:
How do I limit the number of logs produced by DailyRollingFileAppender?
I find the logs are exceeding the disk space limit.
If you're using the log4j.properties that ships with HBase, uncomment the
MaxBackupIndex property and change its value.
# 30-day
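(For reference, the stanza in the log4j.properties that ships with HBase looks
roughly like this - quoted from memory, so treat the exact lines as an
approximation. Note that stock log4j's DailyRollingFileAppender silently
ignores MaxBackupIndex, as a later thread in this digest discusses, so
uncommenting it only helps with a patched appender:)

  log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
  # Rollover at midnight
  log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
  # 30-day backup
  #log4j.appender.DRFA.MaxBackupIndex=30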
Hi
My cluster needs to use the high-availability feature, but the backup HMaster
always prints logs like:
2011-02-16 19:23:21,158 INFO
org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
2011-02-16 19:23:21,158 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster
started in backup
Did you increase the max open files on your system (in
/etc/security/limits.conf)?
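(For reference, the HBase documentation suggests raising nofile for the user
that runs the daemons; a typical /etc/security/limits.conf entry, assuming the
user is named "hadoop", would be:

  hadoop  -  nofile  32768

The user has to log in again, and pam_limits must be enabled, for the new
limit to take effect.)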
2011/2/16 Enis Soztutar enis.soz.nu...@gmail.com
Hi,
We have newly set up a cluster of 5 nodes, each with 16 GB of RAM. We use
HBase 0.90.0 on top of Hadoop from CDH3. When testing HBase under heavy load
Another question, has anyone virtualized the master/namenode on a
multi-node VM pool/cluster (say a multi-node Xen pool), thereby raising the
reliability of that node above that of single-server hardware?
On 2/16/11 9:12 AM, Joseph Coleman joe.cole...@infinitecampus.com
wrote:
I know I
Not for HDFS.
On Wed, Feb 16, 2011 at 7:17 AM, John Buchanan
john.bucha...@infinitecampus.com wrote:
Another question, has anyone virtualized the master/namenode on a
multi-node VM pool/cluster (say a multi-node Xen pool), thereby raising
reliability of that node above single server
Lars,
I am still working in pseudo-distributed mode, with
hadoop-0.20.2+737/
and hbase-0.90.0 with the hadoop jar from the hadoop install.
I have a LIVE_RAW_TABLE table, which gets values from a live system.
I go through each row of that table and get the row ids of two reference
tables from it.
TABLE_A
That looks right. Try killing the current master and see the message below change.
St.Ack
On Wed, Feb 16, 2011 at 4:21 AM, Gaojinchao gaojinc...@huawei.com wrote:
Hi
My cluster needs to use the high-availability feature, but the backup HMaster
always prints logs like:
2011-02-16 19:23:21,158 INFO
Thanks for the link; that was a very interesting article. I am looking at
going with that approach and carving an NFS volume from my NetApp filer for the
transaction logs between the primary and backup node. The question is: any idea
how large I need to make the NFS volume? Also, where I can
Hi there-
As was described in the HBase chapter in the Hadoop book by Tom White, you
don't want to insert a lot of data at one time with incrementing keys.
YYYY-MM-DD would seem to me to be a reasonable lead portion of a key - as long
as you aren't trying to insert everything in time-order
Please check the archives, there have been some threads about this recently.
On Feb 16, 2011 9:51 AM, Venkatesh vramanatha...@aol.com wrote:
If I have to store multiple events (time-based) for multiple users,
- either I could create a unique row key for every event (or)
- use user id as the
First, loading into 3 families is currently a bad idea and is bound to
be inefficient; here's the reason why:
https://issues.apache.org/jira/browse/HBASE-3149
Those log lines mean that your scanning of the first table is
generating a lot of block cache churn. When setting up the Map, set
your
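(J-D's advice is cut off above; what follows is only my guess at the usual
recommendation, sketched against the 0.90 client API: raise scanner caching
and turn off block caching for a one-off full-table scan. The mapper class is
a made-up placeholder.)

  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

  Scan scan = new Scan();
  scan.setCaching(500);        // fetch rows in batches rather than one RPC per row
  scan.setCacheBlocks(false);  // don't churn the block cache on a one-off scan
  TableMapReduceUtil.initTableMapperJob(
      "LIVE_RAW_TABLE",              // the source table from this thread
      scan,
      MyMapper.class,                // hypothetical mapper
      ImmutableBytesWritable.class,  // map output key class
      Result.class,                  // map output value class
      job);                          // an org.apache.hadoop.mapreduce.Job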
Thanks, I'm storing log files and need to scan the tables by date and vendor.
Since the vendor is limited to at most 16 characters I can put a padded version
in the front followed by the date (vendor**|DD-MM-YYYY|other date) and
I can still scan by setting the start row to
Hi Otis,
Excellent reflection; unfortunately I don't think anyone has benchmarked it
to give a definitive answer.
One thing I'm sure of is that worse than screwing up the OS cache, it
also screws up the block cache! But this is the price to pay to clear
up old versions and regroup all store files
See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[][])
For an illustration of why a timestamp alone is a bad key for sorted HBase, see
http://hbase.apache.org/schema.html#d0e2139
St.Ack
On Wed, Feb 16, 2011 at
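(A minimal sketch of what the createTable link above enables - creating a
table pre-split into regions so writes don't all land on one region server.
Table name, family, and split points are made-up examples; 0.90 API.)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  Configuration conf = HBaseConfiguration.create();
  HBaseAdmin admin = new HBaseAdmin(conf);
  HTableDescriptor desc = new HTableDescriptor("events");
  desc.addFamily(new HColumnDescriptor("f"));
  // Each byte[] is a region boundary; 4 boundaries -> 5 regions.
  byte[][] splits = new byte[][] {
      Bytes.toBytes("d"), Bytes.toBytes("h"),
      Bytes.toBytes("m"), Bytes.toBytes("r"),
  };
  admin.createTable(desc, splits);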
One of my coworkers is reminding me that major compactions do have the
well-known side effect of slowing down a busy system.
Are we able to assert that the performance degradation is due to the
OS cache being invalidated? Or is it because of all the disk IO being
used? Or because of the block
One of my coworkers is reminding me that major compactions do have the
well-known side effect of slowing down a busy system.
I think where this is going is that the system IO cache problem could be
solved with something like DirectIOLinuxDirectory:
https://issues.apache.org/jira/browse/LUCENE-2500 Of
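(A hedged aside, not from this thread: the usual way to keep major compactions
from hitting a busy system at random times is to disable the automatic daily
trigger and run them off-peak yourself. The property name and semantics are as
I recall them for 0.90; verify against your version. Set
hbase.hregion.majorcompaction to 0 in hbase-site.xml, then off-peak run:)

  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.majorCompact("events");  // asynchronous request; "events" is a made-up table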
Originally sent to just Stack and now sent to the list.
If I assign a row key a random value, the writes will be distributed and
populating HBase will be faster. On the other hand, if my scans bring back
blocks of data (vendor by date) where each block of data can have tens of
thousands of
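(A small sketch of the padded-vendor-plus-date key being discussed; the
16-char pad width matches the thread, everything else is a made-up example.
An ISO date such as YYYY-MM-DD keeps rows for one vendor in chronological
order.)

  import org.apache.hadoop.hbase.util.Bytes;

  // Keys sort by vendor first, then date, so a scan from
  // startRow "acme            |2011-02-16" stays contiguous.
  String vendor = String.format("%-16s", "acme");  // pad to 16 chars
  String day = "2011-02-16";
  String event = "0001234";  // disambiguates rows within a day
  byte[] rowKey = Bytes.toBytes(vendor + "|" + day + "|" + event);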
Good on you Peter.
St.Ack
On Wed, Feb 16, 2011 at 1:58 PM, Peter Haidinyak phaidin...@local.com wrote:
Originally sent to just Stack and now sent to the list.
If I assign a row key a random value, the writes will be distributed and
populating HBase will be faster. On the other hand, if my
From time to time I run into issues where disabling a table pretty
much hangs. I am simply calling the disableTable method of HBaseAdmin.
The table has ~ 500 regions with default region file size. I couldn't
tell anything abnormal from the master's log. When I click on the
region from Master's web
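(For context, the disable-then-drop sequence under discussion is just the
following, sketched against the 0.90 API with a made-up table name; as noted
later in this thread, disabling flushes each region's memstore first, which is
part of why it can take so long:)

  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.disableTable("mytable");
  admin.deleteTable("mytable");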
I don't understand... is having the same qualifier a hard requirement?
Worst case you could have a prefix.
J-D
On Wed, Feb 16, 2011 at 3:29 PM, Vishal Kapoor
vishal.kapoor...@gmail.com wrote:
J-D,
I also should mention that my data distribution across the three families is
1:1:1
I have three
Actually I wanted to disable the table so I can drop it. It would be
nice to be able to disable the table without flushing the memstore. It's
not possible in 0.20.6, is it?
On Wed, Feb 16, 2011 at 2:30 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
To disable a region, its memstore must first be
Thanks J-D for all your help. I will combine the three families and
re-baseline the performance.
But I was just wondering whether I was using the families as they were
supposed to be used or not.
The data in these three families is different: one of them is a live feed and
the other two are master
Actually I never thought of having a special case for that... and I
don't see any jira about it. Would you mind opening a new one for
that? I think it's a good idea for those times when you're developing
something and you want to iterate fast.
On the other hand, it's a pretty destructive feature
It's best to have different families for data of a different nature that
you usually don't read/write together. For sure it shouldn't
slow you down as much as it does (because of HBASE-3149), but given
the current situation it's hard to recommend multiple families.
J-D
On Wed, Feb 16,
I want to limit the number of log files for a DailyRollingFileAppender. I
searched for a parameter like maxBackupIndex (in RollingFileAppender) but
cannot find one. Is there really no way to limit the number of log files?
What is the strategy for ensuring that the number of log files does not
Did you try MaxBackupIndex? Didn't it work?
I found this in the Log4j wiki, but I don't know if this class has been merged
into log4j trunk.
http://wiki.apache.org/logging-log4j/DailyRollingFileAppender
I've changed the DailyRollingFileAppender to support MaxBackupIndex; this
is the class to
There is a patch that causes us to evict the block cache on close of
hfile, and populate the block cache during compaction write out. This
is included in 0.90.
So that helps. Fixing VFS issues is quite a bit longer term, since
the on-wire format of HDFS rpc is kind of fixed, petitioning for
There is a patch that causes us to evict the block cache on close of
hfile, and populate the block cache during compaction write out. This
is included in 0.90.
That's good!
HDFS-347, which is a huge
clear win but still no plans to include it in any hadoop version.
Why's that? It seems to
I can't say, I think there just isn't a push for it since mapreduce
would not benefit from it as much as HBase. Furthermore the patch
proposals have to deal with HDFS security, and the one I'm testing
just does not worry about security (and hence is a security hole
itself).
HDFS is just a slow
Another interesting idea is
the concept of re-warming the cache after a compaction
That's probably the best approach for now. O_DIRECT would only be
used for reading the old files, though in lieu of that we'd still
need/want to warm the new file? E.g., the old files are probably still
being
Is all region info always in memory, whether the region is busy or not?
Zhou Shuaifeng(Frank)
Thanks everyone. I used UUID; that's working fine.
Regards
Jason
On Wed, Feb 16, 2011 at 12:28 AM, Jason urg...@gmail.com wrote:
Or based on int/long:
ID[i] = ID[i-1] + N
ID[0] = n
N - the number of mappers or reducers in which IDs are generated
n - the task id
Sent from my iPhone 4
On Feb 15,
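(The quoted scheme interleaves IDs so that parallel tasks never collide; a
minimal sketch with example values for N and n:)

  // Task n of N emits n, n+N, n+2N, ... - disjoint from every other task.
  long N = 8;  // total number of mapper/reducer tasks (example value)
  long n = 3;  // this task's 0-based id (example value)
  long id = n;
  for (int i = 0; i < 5; i++) {
      System.out.println(id);  // prints 3, 11, 19, 27, 35
      id += N;
  }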