Hi there-
Take a look at this for starters...
http://hbase.apache.org/book.html#mapreduce
If you do job.waitForCompletion(true); it will execute synchronously. Note that the boolean argument only controls progress reporting, so job.waitForCompletion(false) still blocks; job.submit() is the true fire-and-forget call. A simple pattern is
to spin off a thread where the job executes.
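A minimal sketch of that thread pattern (generic Java; `runBlockingJob` is a placeholder standing in for the real blocking call, which would be job.waitForCompletion(true)):

```java
import java.util.concurrent.CountDownLatch;

// Sketch: fire-and-forget by running the blocking call in its own thread.
public class AsyncJobRunner {
    static final CountDownLatch done = new CountDownLatch(1);

    // Placeholder for the synchronous MapReduce job submission.
    static void runBlockingJob() throws InterruptedException {
        Thread.sleep(100);  // pretend the job takes a while
        done.countDown();
    }

    public static void main(String[] args) throws Exception {
        Thread runner = new Thread(() -> {
            try {
                runBlockingJob();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        runner.start();            // returns immediately; the job runs in the background
        System.out.println("submitted");
        done.await();              // in real code you might join() or poll status later
        System.out.println("completed");
    }
}
```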
Btw. a while back somebody did this for Voldemort, so you may want to have a
look at the code. May not work with the latest and greatest Lucene.
http://groups.google.com/group/project-voldemort/browse_thread/thread/7400d08cb6cb7b83
Otis
We're hiring HBase Lucene/Solr/Elastic Search devs:
Hi,
HBase and Lucene have been mentioned on the list this past week, so...
We (Sematext) are looking to hire a person who likes working with
HBase and Lucene|Solr|Elastic Search.
This may sound a bit unusual, but probably the very first target for
this person would be to work on marrying HBase and
Ok, HBASE-4030 has been opened against this.
- Adam
On 6/23/11 5:00 PM, Ted Yu wrote:
This is due to the handling of HFile.Reader being wrapped in a
try-finally block. However, there is no check on whether the reader
operation encountered an exception, which should determine what to do next.
Ah well, keep in mind that going through the Thrift server requires one
more network round trip, and if you used scanner caching in Java then
you also need to configure it for Thrift. Moreover, you should also use
the scannerGetList call and specify more than one row to minimize the
number of RPCs.
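As a rough illustration of why batching rows per scannerGetList call matters (a sketch with assumed row counts, not measured figures): fetching one row per call costs one RPC per row, while returning up to n rows per call divides the RPC count by roughly n.

```java
// Sketch: RPC count for scanning `totalRows` rows when each
// scanner call returns up to `batchSize` rows.
public class RpcCount {
    static int rpcsNeeded(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;  // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(rpcsNeeded(10000, 1));    // one RPC per row
        System.out.println(rpcsNeeded(10000, 100));  // 100x fewer round trips
    }
}
```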
Can you instrument AssignmentManager.addToServers and see if that's
really re-adding that node?
Thanks for digging,
J-D
On Thu, Jun 23, 2011 at 7:11 PM, bijieshan bijies...@huawei.com wrote:
The following steps can recreate the problem:
1. There's thousands of regions in the cluster.
2. Stop
I feel like I'm missing too much information to be helpful; for
example, when the standby master comes up it has 13134 regions in transition (RIT). What
happened there? I thought the regions were all assigned? What happened
to the first master? How come 1306205940117 went from 5841 regions to
0?
Thanks for filing the
2011/6/23 Sateesh Lakkarsu lakka...@gmail.com
We have been testing random reads and from a 6 node cluster (1NN, 5DN, 1HM,
5RS each with 48G, 5 disks) right now seeing a throughput of 1100 per sec
per node. Most of the configs are default, except 4G for RS, *handler.count
and gc (
Obviously this sort of test will depend massively on the level of caching.
I believe that the numbers Lohit is quoting were designed to defeat caching
and test the resulting performance.
On Fri, Jun 24, 2011 at 1:41 PM, lohit lohit.vijayar...@gmail.com wrote:
2011/6/23 Sateesh Lakkarsu
Hi Doug,
thanks a lot for your reply
the point is clear how to create a job instance and to configure it using
TableMapReduceUtil.initTableMapperJob
actually our job is working just perfectly; even the third-party libs are
simple to import using TableMapReduceUtil.addDependencyJars
the
Take a look at Yahoo's Oozie; it's fairly trivial to build a workflow for a map
reduce job and submit it via the web service for processing, and it's a lot easier
than using ProcessBuilder too.
Jon.
On 24 Jun 2011, at 22:47, Andre Reiter a.rei...@web.de wrote:
Hi Doug,
thanks a lot for your
block cache was at the default 0.2 (20% of heap), the IDs being looked up don't repeat, and
each one has a lot of versions, so not expecting cache hits - also was
seeing a lot of cache evictions as is. Can we get better performance in such
a scenario?
Does having more disks help? Or would the RS be the bottleneck?
If you are defeating caching you will want to patch in HDFS-347.
Good luck!
On Fri, Jun 24, 2011 at 3:25 PM, Sateesh Lakkarsu lakka...@gmail.com wrote:
block cache was at default 0.2%, the id's being looked up don't repeat and
each one has a lot of versions, so not expecting cache hits - also
I'll look into HDFS-347, but in terms of driving more reads through, does
having more disks help? Or would the RS be the bottleneck? Any thoughts on this,
please?
Yes.
If you have blown the cache then getting more IOPS is good.
On Fri, Jun 24, 2011 at 4:08 PM, Sateesh Lakkarsu lakka...@gmail.com wrote:
I'll look into HDFS-347, but in terms of driving more reads thru, does
having more discs help? or would RS be the bottleneck? Any thoughts on
2011/6/24 Sateesh Lakkarsu lakka...@gmail.com
I'll look into HDFS-347, but in terms of driving more reads thru, does
having more discs help? or would RS be the bottleneck? Any thoughts on this
plz?
Increasing the number of disks should increase your read throughput.
We did an experiment with 5
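A back-of-the-envelope sketch of why more disks help once reads miss the cache (the ~200 IOPS per-disk figure is an assumed spinning-disk estimate, not from this thread): cache-missing random reads are roughly bound by aggregate disk IOPS.

```java
// Sketch (assumed numbers): when random reads miss the block cache,
// achievable reads/sec is roughly disks * per-disk IOPS.
public class DiskThroughput {
    static int maxRandomReadsPerSec(int disks, int iopsPerDisk) {
        return disks * iopsPerDisk;
    }

    public static void main(String[] args) {
        // 5 disks at an assumed ~200 IOPS each
        System.out.println(maxRandomReadsPerSec(5, 200));
        // doubling the disks roughly doubles the ceiling
        System.out.println(maxRandomReadsPerSec(10, 200));
    }
}
```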
Thanks J-D.
I have filed an issue and attached the logs:
https://issues.apache.org/jira/browse/HBASE-4031
You can check the logs to see whether they give you all the missing information.
What happened to the first master?
We killed the active one and let the standby become the active one. For we