check existence of row through checkAndPut()

2011-06-21 Thread Sam Seigal
Hi, I had a question about how to check for the existence of a record in HBase. I went through some threads discussing the various techniques, mainly row locks and checkAndPut(). My schema looks like the following: prefix-event_type--mm-dd-eventid. The reason I am adding the prefix is to
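A minimal sketch of the checkAndPut() approach discussed in those threads, against the 0.90-era client API (the table name, column family and qualifier below are hypothetical, not from the thread). Passing null as the expected value means "apply the Put only if this cell does not exist yet", which makes it an atomic existence check plus insert in one RPC:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical table and column names; requires a running cluster.
        HTable table = new HTable(HBaseConfiguration.create(), "events");
        byte[] row = Bytes.toBytes("prefix-event_type--mm-dd-eventid"); // row key per the schema above
        Put put = new Put(row);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes("value"));
        // null expected value == "only put if the cell is absent":
        boolean inserted = table.checkAndPut(row,
                Bytes.toBytes("cf"), Bytes.toBytes("payload"), null, put);
        // inserted == true  -> no record existed; the Put was applied atomically
        // inserted == false -> a record already existed; nothing was written
        table.close();
    }
}
```

This avoids the explicit row-lock technique entirely, since the check and the write happen atomically on the region server.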

Some Errors in hbck

2011-06-21 Thread Mingjian Deng
Hi: I found some errors when I use hbase hbck, like the following: ERROR: Region hdfs://192.168.0.1:9000/hbase/test1/bf933258508257b85b138c9b23887800 on HDFS, but not listed in META or deployed on any region server. Does anyone know how this happens, and how can I rescue this region?

RE: lucene with hbase.

2011-06-21 Thread Vivek Mishra
This support is already built into Kundera (basic support). -Original Message- From: bharath vissapragada [mailto:bharathvissapragada1...@gmail.com] Sent: Sunday, June 19, 2011 12:42 AM To: user@hbase.apache.org Subject: Re: lucene with hbase. See [1]. This might be of some help to

RE: lucene with hbase.

2011-06-21 Thread Vivek Mishra
Hi, Kundera-examples has to define a dependency on lucene because of its dependency on KUNDERA. (Somehow the Kundera pom should bring it in.) You are correct. Kundera is an ORM over HBase, Cassandra, MySQL, MongoDB. But it is also JPA compliant. So to execute native SQL queries over HBase or other

Re: hadoop / hbase /zookeeper architecture for best performance

2011-06-21 Thread Andre Reiter
Hi Mingjian, so if I understand it right, the region servers should get as much memory as possible, correct? Our situation at the moment: the default amount of 1000 MB is used for the heap size (HBASE_HEAPSIZE) on region servers. So if we ran namenode, tasktracker and regionserver on the same

Re: hadoop / hbase /zookeeper architecture for best performance

2011-06-21 Thread Mingjian Deng
Big memory is good for reads, because the block cache uses 25% of the heap size by default. You can change hfile.block.cache.size to 0.4 or more. I don't use HBASE_HEAPSIZE; I use the following: export HBASE_MASTER_OPTS=-Xmx8000m export HBASE_ZOOKEEPER_OPTS=-Xmx1000m export
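A sketch of the settings Mingjian describes: per-daemon heap sizes in conf/hbase-env.sh instead of one global HBASE_HEAPSIZE, plus the block cache fraction in conf/hbase-site.xml. The 8000m and 1000m values are from this thread; the regionserver line is an assumption (the original message is truncated), and all values should be tuned to the hardware:

```shell
# conf/hbase-env.sh -- per-daemon heaps instead of the global HBASE_HEAPSIZE
export HBASE_MASTER_OPTS=-Xmx8000m
export HBASE_ZOOKEEPER_OPTS=-Xmx1000m
export HBASE_REGIONSERVER_OPTS=-Xmx8000m   # assumption: the truncated line sets the regionserver heap

# conf/hbase-site.xml -- raise the block cache from the 0.25 default, e.g.:
#   <property>
#     <name>hfile.block.cache.size</name>
#     <value>0.4</value>
#   </property>
```

Note that a larger block cache fraction trades write-side memory (memstores) for read caching, so 0.4 only makes sense for read-heavy workloads like the one discussed here.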

Re: hadoop / hbase /zookeeper architecture for best performance

2011-06-21 Thread Ted Dunning
Why are you running ZK in VMs? If those VMs are on a smaller number of machines, then you are making your failure modes worse, not better. ZooKeeper is very well behaved and should normally be run as a bare process. You can migrate/upgrade the cluster at will and you can have several ZK

Re: hadoop / hbase /zookeeper architecture for best performance

2011-06-21 Thread Ted Dunning
Are you over-committing memory? That sounds like you may have some issues with swapping. On Tue, Jun 21, 2011 at 8:43 AM, Andre Reiter a.rei...@web.de wrote: the MR jobs on our HBase table are running far too slow... RowCounter is running about 13 minutes for 3249727 rows, that's just

Re: Some Errors in hbck

2011-06-21 Thread Stack
Check it out. It may be an old region that was not cleaned up properly. Look at modification times. Look at .regioninfo content to find encoded name and then grep your master logs to see when it was last accessed. St.Ack On Tue, Jun 21, 2011 at 12:46 AM, Mingjian Deng koven2...@gmail.com
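Stack's diagnostic steps can be run from the shell. The HDFS path and region name are from Mingjian's hbck output earlier in this digest; the master log path is an assumption and varies by install:

```shell
# 1. Check the modification time of the orphan region directory:
hadoop fs -ls hdfs://192.168.0.1:9000/hbase/test1/bf933258508257b85b138c9b23887800

# 2. Dump the .regioninfo file to find the region's encoded name:
hadoop fs -cat hdfs://192.168.0.1:9000/hbase/test1/bf933258508257b85b138c9b23887800/.regioninfo

# 3. Grep the master logs for that encoded name to see when the region
#    was last touched (log location is install-specific):
grep bf933258508257b85b138c9b23887800 /path/to/hbase/logs/hbase-*-master-*.log
```

If the logs show the region was split or merged long ago and never cleaned up, the directory is leftover cruft rather than lost data.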

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-21 Thread Leif Wickland
My patch to add support for Increment to TableOutputFormat follows. (I did the svn diff in trunk/src/main/java/org/apache/hadoop/hbase) One point I was unsure about was whether I should duplicate the TimeRange in the Increment's copy constructor. TimeRange is immutable except for its
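The shape of such a change (a sketch only, not Leif's actual diff): TableOutputFormat's RecordWriter dispatches on the mutation type in write(), so supporting Increment means adding a branch alongside the existing Put and Delete cases. Class and method names follow the 0.90-era org.apache.hadoop.hbase.mapreduce.TableOutputFormat source:

```java
// Sketch of TableOutputFormat.TableRecordWriter.write(), extended for Increment.
// Not the patch from this thread; shown to illustrate where the change lands.
public void write(KEY key, Writable value) throws IOException {
    if (value instanceof Put) {
        this.table.put(new Put((Put) value));            // existing branch
    } else if (value instanceof Delete) {
        this.table.delete(new Delete((Delete) value));   // existing branch
    } else if (value instanceof Increment) {
        this.table.increment((Increment) value);         // the added branch
    } else {
        throw new IOException("Expected Put, Delete or Increment, got "
                + value.getClass().getName());
    }
}
```

The copy-constructor question Leif raises applies to the Put and Delete branches above, which defensively copy their argument; Increment has no such copy constructor in this era, which is why the TimeRange handling was unclear.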

Re: Keytabs and secure hadoop

2011-06-21 Thread Francis Christopher Liu
Hi Gary, Thanks for the comprehensive reply. It cleared up my doubts about hadoop security and hbase+20s setup. I've set up the principals for each server as you described. Thanks for the warning, we'd like to stick with the ASF releases of hadoop. The project I'm working on is still in its

Re: Keytabs and secure hadoop

2011-06-21 Thread Andrew Purtell
From: Francis Christopher Liu fc...@yahoo-inc.com Thanks for the warning, we'd like to stick with the ASF releases of hadoop. That's not really advisable with HBase. It's a touchy subject, the 0.20-ish support for append in HDFS exists in production at some large places but isn't in any ASF

TableOutputFormat not efficient than direct HBase API calls?

2011-06-21 Thread edward choi
Hi, I am writing a Hadoop application that uses HBase as both source and sink. There is no reducer job in my application. I am using TableOutputFormat as the OutputFormatClass. I read it on the Internet that it is experimentally faster to directly instantiate HTable and use HTable.batch() in

RE: TableOutputFormat not efficient than direct HBase API calls?

2011-06-21 Thread Doug Meil
TableOutputFormat also does this... table.setAutoFlush(false); Check out the HBase book for how the writebuffer works with the HBase client. http://hbase.apache.org/book.html#client -Original Message- From: edward choi [mailto:mp2...@gmail.com] Sent: Tuesday, June 21, 2011 10:23
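What Doug is pointing at: TableOutputFormat disables autoflush, so Puts accumulate in the client-side write buffer and go out in batched RPCs rather than one round trip per Put. A sketch of the equivalent behavior when using HTable directly (table and column names hypothetical; requires a running cluster):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // hypothetical table
        table.setAutoFlush(false);                 // buffer Puts client-side, not one RPC per Put
        table.setWriteBufferSize(2 * 1024 * 1024); // flush when ~2 MB accumulate (the era's default)
        for (int i = 0; i < 10000; i++) {
            Put put = new Put(Bytes.toBytes("row-" + i));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
            table.put(put); // lands in the write buffer; flushed automatically when full
        }
        table.flushCommits(); // push any remaining buffered Puts
        table.close();        // close() also flushes
    }
}
```

Since TableOutputFormat already does exactly this internally, there is little to gain from bypassing it with hand-rolled HTable calls in a map task, which is the conclusion the thread reaches.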

Re: TableOutputFormat not efficient than direct HBase API calls?

2011-06-21 Thread Stack
On Tue, Jun 21, 2011 at 7:22 PM, edward choi mp2...@gmail.com wrote: I read it on the Internet that it is experimentally faster to directly instantiate HTable and use HTable.batch() in the Map than to use TableOutputFormat as the Map's OutputClass. If you read it on the net, it must be true.

Re: hadoop / hbase /zookeeper architecture for best performance

2011-06-21 Thread Andre Reiter
Very strange... if I run my MR job, the site http://jobtracker:50030/jobtracker.jsp does NOT show any running job... the same for every tasktracker on http://tasktrackerXYZ:50060/tasktracker.jsp, I cannot find any entries about a running task... what is that about? :-) Maybe the whole job

Re: TableOutputFormat not efficient than direct HBase API calls?

2011-06-21 Thread edward choi
Yep, my mistake... I didn't trace the source code deep enough. Looks like HTable always writes to the writeBuffer for any kind of Put operation. Sorry for the rash inquiry ;-) Ed 2011/6/22 Stack st...@duboce.net On Tue, Jun 21, 2011 at 7:22 PM, edward choi mp2...@gmail.com wrote: I read it on

Re: TableOutputFormat not efficient than direct HBase API calls?

2011-06-21 Thread Stack
On Tue, Jun 21, 2011 at 9:37 PM, edward choi mp2...@gmail.com wrote: Looks like HTable always write to writeBuffer for any kind of Put operation. Sorry for the rash inquiry ;-) No worries. St.Ack