RE: Running MapReduce from a web application

2011-06-24 Thread Doug Meil
Hi there, take a look at this for starters: http://hbase.apache.org/book.html#mapreduce If you do job.waitForCompletion(true) it will execute synchronously. If you do job.waitForCompletion(false) it will fire and forget. A simple pattern is to spin off a thread where it executes
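
A minimal sketch of the "spin off a thread" pattern mentioned above, under stated assumptions. The class name BackgroundJobRunner and the stand-in Callable are hypothetical and not part of HBase or Hadoop; in a real web application the Callable body would be the blocking call that runs the MapReduce job. As far as the Hadoop Job API goes, the boolean passed to waitForCompletion is a verbosity flag and the call blocks either way, while job.submit() is the call that returns immediately, so it is worth checking the Javadoc for the version in use.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs a blocking job on a background thread so the
// web request thread can return right away.
public class BackgroundJobRunner {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    // blockingJob stands in for something like () -> job.waitForCompletion(true);
    // that Hadoop call is an assumption here and is not executed by this sketch.
    public Future<Boolean> submit(Callable<Boolean> blockingJob) {
        return pool.submit(blockingJob);
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BackgroundJobRunner runner = new BackgroundJobRunner();
        // Fake job that just sleeps briefly, in place of a real MapReduce job.
        Callable<Boolean> fakeJob = () -> {
            Thread.sleep(100);
            return true;
        };
        Future<Boolean> result = runner.submit(fakeJob);
        System.out.println("submitted; done yet? " + result.isDone());
        // get() blocks until the background job finishes.
        System.out.println("job succeeded: " + result.get());
        runner.shutdown();
    }
}
```

The web layer would keep the returned Future (or a job ID mapped to it) and poll it from a later request, rather than holding the HTTP request open for the duration of the job.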

Re: lucene with hbase.

2011-06-24 Thread Otis Gospodnetic
Btw, a while back somebody did this for Voldemort, so you may want to have a look at the code. It may not work with the latest and greatest Lucene. http://groups.google.com/group/project-voldemort/browse_thread/thread/7400d08cb6cb7b83 Otis We're hiring HBase Lucene/Solr/Elastic Search devs:

Adding Lucene/Solr/ES Search to HBase (paid work) + analytics + performance

2011-06-24 Thread Otis Gospodnetic
Hi, HBase and Lucene have been mentioned on the list this past week, so... We (Sematext) are looking to hire a person who likes working with HBase and Lucene|Solr|Elastic Search. This may sound a bit unusual, but probably the very first target for this person would be work on marrying HBase and

Re: LoadIncrementalHFiles bug when regionserver fails to access file?

2011-06-24 Thread Adam Phelps
Ok, HBASE-4030 has been opened against this. - Adam On 6/23/11 5:00 PM, Ted Yu wrote: This is due to the handling of HFile.Reader being wrapped in a try-finally block. However, there is no check as to whether the reader operation encounters any exception which should determine what to do next.

Re: Re: PHP access Hbase thrift server very very slow!

2011-06-24 Thread Jean-Daniel Cryans
Ah well keep in mind that going through the Thrift server requires one more network roundtrip, and if you used scanner caching in java then you also need to configure it for thrift. Moreover, you should also do the scannerGetList call and specify more than 1 row to minimize the number of RPCs.
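
To make the RPC point above concrete, here is a rough back-of-the-envelope sketch. The class ScannerBatchMath and the figure of 10,000 rows are hypothetical, chosen only for illustration. It computes a lower bound on client-to-Thrift calls by ceiling division and ignores the scanner open/close calls and any final empty fetch.

```java
// Hypothetical helper for estimating how many fetch calls a scan needs.
public class ScannerBatchMath {
    // Lower bound on round trips needed to fetch totalRows when each call
    // returns at most rowsPerCall rows (ceiling division).
    public static long minRoundTrips(long totalRows, int rowsPerCall) {
        if (rowsPerCall <= 0) {
            throw new IllegalArgumentException("rowsPerCall must be positive");
        }
        return (totalRows + rowsPerCall - 1) / rowsPerCall;
    }

    public static void main(String[] args) {
        long totalRows = 10_000; // assumed scan size, for illustration only
        for (int rowsPerCall : new int[] {1, 100, 1000}) {
            System.out.println(rowsPerCall + " rows/call -> at least "
                    + minRoundTrips(totalRows, rowsPerCall) + " calls");
        }
    }
}
```

Each of those calls pays the extra client-to-Thrift hop, which is why fetching one row per call tends to dominate the total scan time.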

Re: ServerShutdownHandler process finished, but the region was assigned to the shutted RS again

2011-06-24 Thread Jean-Daniel Cryans
Can you instrument AssignmentManager.addToServers and see if that's really re-adding that node? Thanks for digging, J-D On Thu, Jun 23, 2011 at 7:11 PM, bijieshan bijies...@huawei.com wrote: The following steps can recreate the problem: 1. There are thousands of regions in the cluster. 2. Stop

Re: One problem with LoadBalancer

2011-06-24 Thread Jean-Daniel Cryans
I feel like I'm missing too much information to be helpful, for example when the standby master comes up it needs to 13134 RIT. What happened there? I thought the regions were all assigned? What happened to the first master? How come 1306205940117 went from 5841 regions to 0? Thx for filling the

Re: Random Reads throughput/performance

2011-06-24 Thread lohit
2011/6/23 Sateesh Lakkarsu lakka...@gmail.com We have been testing random reads and from a 6 node cluster (1NN, 5DN, 1HM, 5RS each with 48G, 5 disks) right now seeing a throughput of 1100 per sec per node. Most of the configs are default, except 4G for RS, *handler.count and gc (

Re: Random Reads throughput/performance

2011-06-24 Thread Ted Dunning
Obviously this sort of test will depend massively on the level of caching. I believe that the numbers Lohit is quoting were designed to defeat caching and test the resulting performance. On Fri, Jun 24, 2011 at 1:41 PM, lohit lohit.vijayar...@gmail.com wrote: 2011/6/23 Sateesh Lakkarsu

Re: Running MapReduce from a web application

2011-06-24 Thread Andre Reiter
Hi Doug, thanks a lot for your reply. The point is clear how to create a job instance and to configure it using TableMapReduceUtil.initTableMapperJob. Actually our job is working just perfectly; even the third-party libs are simple to import using TableMapReduceUtil.addDependencyJars the

Re: Running MapReduce from a web application

2011-06-24 Thread Jonathan Holloway
Take a look at Yahoo's Oozie, it's fairly trivial to build a workflow for a map reduce job and submit it via the web service for processing, it's a lot easier than using ProcessBuilder also. Jon. On 24 Jun 2011, at 22:47, Andre Reiter a.rei...@web.de wrote: Hi Doug, thanks a lot for your

Re: Random Reads throughput/performance

2011-06-24 Thread Sateesh Lakkarsu
block cache was at default 0.2%, the id's being looked up don't repeat and each one has a lot of versions, so not expecting cache hits - also was seeing a lot of cache evictions as is. Can we get better performance in such a scenario? Does having more discs help? or would RS be the bottleneck?

Re: Random Reads throughput/performance

2011-06-24 Thread Ryan Rawson
If you are defeating caching you will want to patch in HDFS-347. Good luck! On Fri, Jun 24, 2011 at 3:25 PM, Sateesh Lakkarsu lakka...@gmail.com wrote: block cache was at default 0.2%, the id's being looked up don't repeat and each one has a lot of versions, so not expecting cache hits - also

Re: Random Reads throughput/performance

2011-06-24 Thread Sateesh Lakkarsu
I'll look into HDFS-347, but in terms of driving more reads thru, does having more discs help? or would RS be the bottleneck? Any thoughts on this plz?

Re: Random Reads throughput/performance

2011-06-24 Thread Ted Dunning
Yes. If you have blown the cache then getting more IOPs per second is good. On Fri, Jun 24, 2011 at 4:08 PM, Sateesh Lakkarsu lakka...@gmail.com wrote: I'll look into HDFS-347, but in terms of driving more reads thru, does having more discs help? or would RS be the bottleneck? Any thoughts on

Re: Random Reads throughput/performance

2011-06-24 Thread lohit
2011/6/24 Sateesh Lakkarsu lakka...@gmail.com I'll look into HDFS-347, but in terms of driving more reads thru, does having more discs help? or would RS be the bottleneck? Any thoughts on this plz? Increasing the number of disks should increase your read throughput. We did an experiment with 5

Re: One problem with LoadBalancer

2011-06-24 Thread bijieshan
Thanks J-D. I have filed an issue and attached the logs: https://issues.apache.org/jira/browse/HBASE-4031 You can check whether the logs give you all the missing information. What happened to the first master? We killed the active one and let the standby become the active one. For we

Does anybody enable MSLAB in production system? I am not sure if it's stable enough for production system?

2011-06-24 Thread Jack Zhang(jian)