On Tue, Jun 7, 2011 at 7:56 PM, Kjew Jned selek...@yahoo.com wrote:
I was studying the OpenTSDB example, where they also prefix the row keys with
event id.
I further modified my row keys to have this -
eventid uuid yyyy-mm-dd
The uuid is fairly unique and random.
Is appending a uuid to
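For what it's worth, a minimal sketch of building such a composite key (the '|' separator, the example event id, and the helper name are invented for illustration, not taken from the original post):

import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class RowKeySketch {
  // Builds "eventid|uuid|yyyy-mm-dd"; a random UUID spreads otherwise
  // identical eventid/date keys across the key space.
  static byte[] makeKey(String eventId, String yyyyMmDd) {
    String key = eventId + "|" + UUID.randomUUID() + "|" + yyyyMmDd;
    return key.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(new String(makeKey("click", "2011-06-07"), StandardCharsets.UTF_8));
  }
}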
Hi Geoff,
Since HBase balances at the cluster level rather than per table, it may happen
that all the regions for one table end up on the same region server. The reason
for this may be the way HBase does table splits. If a region exceeds the
configured maximum size the region is split into two, but
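For reference, the split threshold being described is the hbase.hregion.max.filesize setting; a minimal sketch of setting it through the client Configuration (the 1 GB value is only an example, not a recommendation):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SplitSizeSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // A region whose store files exceed this size (in bytes) is split in two.
    conf.setLong("hbase.hregion.max.filesize", 1024L * 1024 * 1024);
    System.out.println(conf.get("hbase.hregion.max.filesize"));
  }
}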
Hello guys,
Well... I am using a 12-node HBase cluster.
I can see all the nodes running on the HBase Web UI.
But when I run hbase hbck, I am getting the following output:
hadoop@ub13:/usr/local/hadoop$ hbase hbck
11/06/08 15:30:52 INFO zookeeper.ZooKeeper: Client
Re: With respect to Doug's posts, you can't do a multi-get off the bat
That's an assumption, but you're entitled to your opinion.
-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Monday, June 06, 2011 10:08 PM
To: user@hbase.apache.org
Subject: RE: How to
If it is possible, I think any slides or even a video would be very
interesting to some of us who can't travel. I, for one, would love to hear
how you do it.
Thanks!
On Tue, Jun 7, 2011 at 6:07 PM, Jack Levin magn...@gmail.com wrote:
Hey Guys, I plan to do a tech talk here at ImageShack, on
+1 to Matt's opinion (if possible?).
I am interested in your use case; it sounds very impressive from the stats you
gave. You said 1000 tables?
Looking forward to seeing what optimizations/config tweaks you had to do to
cope with your read/write requirements.
Thanks,
Himanshu
On Wed, Jun 8, 2011 at
From: Ted Dunning tdunn...@maprtech.com
Lots of people are moving towards more spindles per box to
increase IOPS.
This is particularly important for cases where the working
set gets pushed out of memory.
Indeed.
Our spec is more like 12x 500 GB SATA disks, to push IOPS and more evenly
We have certain tables with under 10 rows, one under 200 rows and one with
1,000,000 rows. We have found out that having a copy/cache on each node is
EXTREMELY fast for our batch processing since these copies of data are local
AND in-memory. The issue I am struggling with is the best way to
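Not the poster's actual setup, but a minimal sketch of one way to keep a small lookup table's blocks cached, using the 0.90-era admin API (table and family names are hypothetical):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class InMemoryTableSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = new HTableDescriptor("small_lookup");
    HColumnDescriptor family = new HColumnDescriptor("f");
    // Hint that this family's blocks should be given priority in the
    // block cache on each region server that hosts its regions.
    family.setInMemory(true);
    desc.addFamily(family);
    admin.createTable(desc);
  }
}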
Hi,
Given the tableName, startKey and endKey for a region, how do I get hold of
the encodedName?
We have code for identifying overlapping regions that outputs triples of the
form tableName, startKey and endKey for each region, but it looks like the
Merge command (we're using 0.20.6) requires the
Hey,
Could anyone give me suggestions for a Hadoop/HBase upgrade? We're
currently using Apache Hadoop 0.20.2 + HBase 0.20.3 + ZooKeeper 3.2.2.
Has anyone tried the latest stable version of hadoop-0.20.203.0rc1 +
HBase 0.90.2, and will HBase 0.90.2 have compatibility issues with
hadoop-0.20.203.0rc1?
On Wed, Jun 8, 2011 at 9:22 AM, James Hammerton
james.hammer...@mendeley.com wrote:
Given the tableName, startKey and endKey for a region, how do I get hold of
the encodedName?
I suppose it depends on the context.
If reading .META., then if you deserialize the info:regioninfo into an
Thanks, Stack.
The context is that we have a script, find_overlapping_regions.rb at:
https://github.com/Mendeley/hbase-scripts/blob/master/find_overlapping_regions.rb
We'd ideally like to feed the results into another script (to be written)
that will call org.apache.hadoop.hbase.util.Merge. I've been
Do you have check_meta.rb in 0.20.6 (I don't remember? I think you
do). Start with that?
Otherwise, here:
keys = wanted_table.getStartEndKeys
In 0.20.6 can you get HRegionInfos instead of start keys? That'd be
more useful. They would have the encoded name.
We'd ideally like to feed the
Hi,
I've checked /usr/lib/hbase/bin and it doesn't have check_meta.rb.
Also, HTable doesn't have getHRegionInfos in 0.20.6.
Regards,
James
On Wed, Jun 8, 2011 at 5:46 PM, Stack st...@duboce.net wrote:
Do you have check_meta.rb in 0.20.6 (I don't remember? I think you
do). Start with
On Wed, Jun 8, 2011 at 9:25 AM, Zhong, Sheng sheng.zh...@searshc.com wrote:
Could anyone give me suggestions for a Hadoop/HBase upgrade? We're
currently using Apache Hadoop 0.20.2 + HBase 0.20.3 + ZooKeeper 3.2.2.
Has anyone tried the latest stable version of hadoop-0.20.203.0rc1 +
HBase 0.90.2,
Pull it in. You'll have to massage it a little, but rather than do the
indirect HTable.getStartKeys (which turns around and reads meta), read
.META. directly and get the HRIs yourself.
St.Ack
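A rough sketch of what that could look like against the 0.90-era client API (0.20.6 will need adjustments, e.g. new HBaseConfiguration() rather than create()):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class ListEncodedNames {
  public static void main(String[] args) throws Exception {
    HTable meta = new HTable(HBaseConfiguration.create(), ".META.");
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
    ResultScanner scanner = meta.getScanner(scan);
    for (Result r : scanner) {
      // Deserialize info:regioninfo into an HRegionInfo, which carries
      // the region's start/end keys and its encoded name.
      HRegionInfo hri = Writables.getHRegionInfo(
          r.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo")));
      System.out.println(hri.getRegionNameAsString() + " -> " + hri.getEncodedName());
    }
    scanner.close();
    meta.close();
  }
}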
On Wed, Jun 8, 2011 at 9:51 AM, James Hammerton
james.hammer...@mendeley.com wrote:
Hi,
I've checked
On Wed, Jun 8, 2011 at 10:01 AM, James Hammerton
james.hammer...@mendeley.com wrote:
Thanks Stack,
I take it you mean get hold of check_meta.rb from a recent version and alter
it to find the HRIs?
Yes. Alter it to run in 0.20.6.
St.Ack
A problem that will be fixed in 0.90.4 is that once hbck finds one
issue, all checks that follow emit 'INCONSISTENCY'. A quick perusal
of the below has it that hbck is not able to reach a server. Can you
check into that? It's using an IP rather than a hostname. Why is that?
IPs in the
Hi,
We're trying to come up with the right strategy for backing up HBase tables.
The assumption is that table sizes will not grow beyond a few hundred GB.
Currently we're employing exports (writing onto the HDFS of another cluster
directly), but it is taking too long (~5 hours to export ~5GB of data). Are
J-D,
Thanks for the info. I copied the appropriate hadoop jar file to the lib
directory (and renamed the original one). I wasn't able to figure out why
ZooKeeper wasn't running on my master server, so I launched ZooKeeper directly
and set HBASE_MANAGES_ZK to false. (And since I am running
Looks like you need to copy a commons-configuration jar into hbase; this
version of hadoop seems to depend on it:
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
And be aware that this version of hadoop does not have
sync/append, so hbase will lose data on a crash.
St.Ack
On Wed, Jun 8, 2011 at 9:00 AM, Hiller, Dean x66079
dean.hil...@broadridge.com wrote:
We have certain tables with under 10 rows, one under 200 rows and one with
1,000,000 rows. We have found out that having a copy/cache on each node is
EXTREMELY fast for our batch processing since these
On Wed, Jun 8, 2011 at 1:44 AM, bijieshan bijies...@huawei.com wrote:
Thanks Suraj.
Yes, it's a better method, though I haven't tested it yet.
So if we use HTablePool, it seems we don't need to delete the ZK connections
manually? Is that correct?
Yes.
St.Ack
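For reference, a minimal sketch of that HTablePool usage against the 0.90-era API (the table name and pool size are arbitrary):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.util.Bytes;

public class PoolSketch {
  public static void main(String[] args) throws Exception {
    // The pool shares the underlying connection (and its ZooKeeper session),
    // so there is no per-table ZK connection to clean up yourself.
    HTablePool pool = new HTablePool(HBaseConfiguration.create(), 10);
    HTableInterface table = pool.getTable("mytable");
    try {
      table.get(new Get(Bytes.toBytes("some-row")));
    } finally {
      pool.putTable(table); // return to the pool; later APIs use table.close()
    }
  }
}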
I'd like to clarify, again, what I'm trying to do and why I still think it's
the best way to do it.
I want to join two large tables. I'm assuming, and this is the key to the
efficiency of this method, that: 1) I'm getting a lot of data from table A,
something which is close enough to a full table
Yes, thanks, it worked! I have no idea how I didn't come across the method!
Thank you for the tip!
In trunk this behavior has been improved.
The load balancer would move the youngest region off a heavily loaded region
server.
See http://zhihongyu.blogspot.com/2011/04/load-balancer-in-hbase-090.html
I am thinking of creating a new policy for region assignment at cluster
startup which assigns regions
On Wed, Jun 8, 2011 at 12:50 PM, Ted Yu yuzhih...@gmail.com wrote:
I am thinking of creating a new policy for region assignment at cluster
startup which assigns regions from each table in round-robin fashion.
Don't we want to retain assignments on startup since that will ensure
greatest
Let's make a toy example to see if we can capture all of the edge conditions:
Table A
---
Key1 joinVal_1
Key2 joinVal_2
Key3 joinVal_1
Table B
---
Key4 joinVal_1
Key5 joinVal_3
Key6 joinVal_2
Now, assume that we have a mapper that takes two values, one row from A, and
one row from B.
On Wed, Jun 8, 2011 at 12:40 AM, tsuna tsuna...@gmail.com wrote:
On Tue, Jun 7, 2011 at 7:56 PM, Kjew Jned selek...@yahoo.com wrote:
I was studying the OpenTSDB example, where they also prefix the row keys
with
event id.
I further modified my row keys to have this -
eventid uuid
If I understand the history correctly, round-robin was used in 0.89, but
retain is the policy for 0.90+.
My 2-cents is that if/when region-shuffling is required, I'd rather do that
with another utility and keep that out of cluster startup.
-Original Message-
From: saint@gmail.com
Unless I am mistaken... get() requires a row key, right?
And you can join tables on column data which isn't in the row key, right?
So how do you do a get()? :-)
Sure there is more than one way to skin a cat. But if you want to be
efficient... You will create a set of unique keys based on the
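For reference, a sketch of a batched multi-get once you do have row keys in hand; this assumes a client version that provides HTable.get(List<Get>), which not every release discussed in this thread has:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class MultiGetSketch {
  // Turns a set of unique row keys into one batched fetch instead of
  // one round trip per row.
  public static Result[] fetch(HTable table, List<byte[]> rowKeys) throws Exception {
    List<Get> gets = new ArrayList<Get>();
    for (byte[] rowKey : rowKeys) {
      gets.add(new Get(rowKey));
    }
    return table.get(gets);
  }
}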
I believe this is what Eran is suggesting:
Table A
---
Row1 (has joinVal_1)
Row2 (has joinVal_2)
Row3 (has joinVal_1)
Table B
---
Row4 (has joinVal_1)
Row5 (has joinVal_3)
Row6 (has joinVal_2)
Mapper receives a list of input rows (union of both input tables in any
order), and produces
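Reading it that way, a hedged sketch of the shuffle (the f:joinVal family/qualifier and class names are invented, and the job wiring via TableMapReduceUtil is omitted): the mapper keys each row by its join value, so rows from both tables that share a join value meet in a single reduce call.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class JoinSketch {
  // Emits (joinVal, rowKey) for every row scanned from either table.
  public static class JoinMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context ctx)
        throws IOException, InterruptedException {
      byte[] joinVal = columns.getValue(Bytes.toBytes("f"), Bytes.toBytes("joinVal"));
      if (joinVal != null) {
        ctx.write(new Text(joinVal),
            new Text(Bytes.toString(rowKey.get(), rowKey.getOffset(), rowKey.getLength())));
      }
    }
  }

  // For joinVal_1 in the toy example, one reduce call sees {Row1, Row3, Row4};
  // telling A-side rows from B-side rows (e.g. with a table tag in the mapper
  // output) is left out of this sketch.
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text joinVal, Iterable<Text> rowKeys, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder matched = new StringBuilder();
      for (Text rk : rowKeys) {
        matched.append(rk.toString()).append(' ');
      }
      ctx.write(joinVal, new Text(matched.toString().trim()));
    }
  }
}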
Can you afford some down time? If so, you could minor compact, disable
the table, distcp, and then enable the table.
-Joey
On Wed, Jun 8, 2011 at 1:22 PM, Manoj Murumkar manoj.murum...@gmail.com wrote:
Hi,
We're trying to come up with right strategy for backing up HBase tables.
Assumption is
Thank you for the explanation; I think I understand the suggestion now. I
completely agree with you that this would be effective for cases where you can
do the join of the sorted values in memory.
A small tweak would make this more generic and effective for any size. If you
had two separate
We are on HBase 0.90 and have been using HBase for a while to perform
high-volume data lookups using the HBase client (no MapReduce involved).
Recently we observed that our get latencies keep increasing over
time (and eventually flatten out at a higher value), and if we restart the
HBase server, latencies go
Hi,
Where can I find the targeted release date of 0.92.0?
Thanks.
Ming
We are trying to do this online, as downtime is not an option. Good point,
nonetheless.
On Jun 8, 2011 3:48 PM, Joey Echeverria j...@cloudera.com wrote:
Can you afford some down time? If so, you could minor compact, disable
the table, distcp, and then enable the table.
-Joey
On Wed, Jun 8,
There is this post about HBase backup options
http://blog.sematext.com/2011/03/11/hbase-backup-options/ . I hope it helps.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Manoj
I wouldn't rely on any dates. :) I'd look at the number of remaining open JIRA
issues with that target version.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/
- Original Message
From: Ma, Ming
Jim, I'd be interested in hearing about your experience with Whirr when you try
it. I've been testing it for the last couple of days and I haven't been able to
get the out-of-the-box hadoop recipe to work when it comes up (the namenode
doesn't have any datanodes configured although they are all up and
The assumption was that regions were not evenly distributed prior to
restarting.
If they were, the user wouldn't select this policy.
We can make this policy effective only once - retain assignment is selected
following this new policy.
Of course the dynamic portion of the load balancer needs to select the
Hi there-
Summary comment:
1) Preference
Several people in this thread have suggested approaches (map-side memory join,
multi-get, temp files), all of which have merit and have advantages in certain
situations. Kudos to the dist-list for chiming in. The right approach
depends on the
How should we set the log level for production?
Does anyone have experience with this?
I want to use INFO.
Thanks, I have seen it. Once I verify a viable solution, I will update this
thread.
On Jun 8, 2011 5:57 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
wrote:
There is this post about HBase backup options
http://blog.sematext.com/2011/03/11/hbase-backup-options/ . I hope it
helps.
Otis
In the conf/log4j.properties
St.Ack
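For example (an illustrative snippet only, not a recommended production config; the appender name is assumed to be defined elsewhere in the file):

# conf/log4j.properties - keep the root logger at INFO for production
log4j.rootLogger=INFO,console
log4j.logger.org.apache.hadoop.hbase=INFO
log4j.logger.org.apache.zookeeper=INFO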
On Wed, Jun 8, 2011 at 9:02 PM, Gaojinchao gaojinc...@huawei.com wrote:
How should we set the log level for production?
Does anyone have experience with this?
I want to use INFO.
Hi..
I guess the problem is that one of my regionservers has a localhost entry
in its /etc/hosts file.
My log says:
2011-06-08 15:24:27,588 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as ub8,60020,1307526863668, RPC listening on /127.0.0.1:60020,
Hi,
We are running a 5-machine HBase cluster. We have noticed that whenever an
HQuorum fails on one machine, the entire application that is running on
HBase crashes. Is there anything we can do about this?
--
With Regards,
Jr.
Is there any way to add a new HQuorum to the cluster dynamically?
--
With Regards,
Jr.
What is an HQuorum?
If you mean a regionserver, then possibly your application is attempting to get
data that was on a region hosted by the failed regionserver, and in that case
you need to make sure your application can deal with the connection failure and
wait for the regions to be reassigned to
Hi,
Thanks for your reply. So does HBase automatically reassign to another
regionserver, or do we have to do it manually?
On Thu, Jun 9, 2011 at 10:18 AM, Chris Tarnas c...@email.com wrote:
What is an HQuorum?
If you mean a regionserver, then possibly your application is attempting to
get data
On Wed, Jun 8, 2011 at 10:10 PM, James Ram hbas...@gmail.com wrote:
Hi,
Thanks for your reply. So does HBase automatically reassign to another
regionserver, or do we have to do it manually?
It does it automatically.
St.Ack
On Wed, Jun 8, 2011 at 9:45 PM, James Ram hbas...@gmail.com wrote:
Is there any way to add a new HQuorum to the cluster dynamically?
If HQuorum == HRegionServer, then yes. Just make sure it has the same
config as the other members of the cluster and start it.
St.Ack
On Wed, Jun 8, 2011 at 9:37 PM, praveenesh kumar praveen...@gmail.com wrote:
But my problem is that I want to keep the localhost entry in my /etc/hosts
file.
Is there any parameter we can put in hbase-site.xml so that RPC starts
listening on the regionserver's actual IP rather than the default