Re: using HBase to read from xml files in HDFS

2011-06-06 Thread James Ram
Hi, Forgive me, but I am not very experienced in HBase. Would you please elaborate on how to do this? Thanks, JR On Fri, Jun 3, 2011 at 5:31 PM, Michel Segel michael_se...@hotmail.com wrote: James, yes you can do it. You need to write your own input format to split each input on a specified

Hadoop is not working after adding hadoop-core-0.20-append-r1056497.jar

2011-06-06 Thread praveenesh kumar
Hello guys! I am currently working on HBase 0.90.3 and Hadoop 0.20.2. Since this Hadoop version does not support sync in HDFS, I copied the *hadoop-core-append jar* file from the *hbase/lib* folder into the *hadoop* folder, replacing *hadoop-0.20.2-core.jar*, which was suggested in the

Re: Are Hadoop 0.20.2 and HBase 0.90.3 compatible?

2011-06-06 Thread praveenesh kumar
Hello guys! I copied the hadoop-core-append jar file from the hbase/lib folder into the hadoop folder, replacing hadoop-0.20.2-core.jar, which was suggested in the following link: http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm I guess this is what

Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Hi, I'm not able to see my email in the mail archive, so I'm sending it again. Guys, I need your feedback! Thanks, Praveenesh -- Forwarded message -- From: praveenesh kumar praveen...@gmail.com Date: Mon, Jun 6, 2011 at 12:09 PM Subject: Hadoop is not working after adding

full table scan

2011-06-06 Thread Andreas Reiter
Hello everybody, I'm trying to scan my HBase table for reporting purposes. The cluster has 4 servers: - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1 - server2: datanode, tasktracker, hbase regionserver, zookeeper2 - server3: datanode, tasktracker, hbase

Re: full table scan

2011-06-06 Thread Joey Echeverria
How many regions does your table have? On Mon, Jun 6, 2011 at 4:48 AM, Andreas Reiter a.rei...@web.de wrote: hello everybody i'm trying to scan my hbase table for reporting purposes the cluster has 4 servers:  - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1  -

HBase Web UI showing an exception every time I run it

2011-06-06 Thread praveenesh kumar
Hello guys, I am not able to run my HBase 0.90.3 cluster on top of my Hadoop 0.20.2 cluster. I don't know why it's happening; it ran only once, and after that it doesn't. The HBase web UI is showing the following exception. Why is it happening? Please help! Thanks, Praveenesh HTTP ERROR

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Hello guys, changing the name of the hadoop-append-core.jar file to hadoop-0.20.2-core.jar did the trick. It's working now. But is this the right solution to this problem? Thanks, Praveenesh On Mon, Jun 6, 2011 at 2:18 PM, praveenesh kumar praveen...@gmail.com wrote: Hi, Not able to

Re: How to split a specified number of rows per Map

2011-06-06 Thread edward choi
I guess there are no other options. Thanks for the info Ted. Ed. 2011/6/5 Ted Yu yuzhih...@gmail.com You need to modify getSplits(). On Sun, Jun 5, 2011 at 4:04 AM, edward choi mp2...@gmail.com wrote: Hi, I am using HBase as a source of my MapReduce jobs. I recently found out

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Mike Spreitzer
My latest information (not from me, from actual experts) says it is NOT the right approach. Look further into that discussion thread. I do not understand why (http://hbase.apache.org/notsoquick.html#hadoop) still points at that misleading message. Regards, Mike Spreitzer From:

Re: full table scan

2011-06-06 Thread Christopher Tarnas
How many regions does your table have? If all of the data is still in one region then you will be rate limited by how fast that single region can be read. 3 nodes is also pretty small; the more nodes you have the better (at least 5 for dev and test, and 10+ for production, has been my experience).

Question regarding hbase.hregion.max.filesize and dfs.block.size

2011-06-06 Thread Mat Hofschen
Hello, the HBase book (http://hbase.apache.org/book.html) suggests increasing hbase.hregion.max.filesize to a large value (> 1G). Then there are many suggestions on the mailing list to keep dfs.block.size set at 64M. What is the relationship between the two values? How does HBase prevent lots of
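For context on the question above: the two settings live in different configuration files, hbase.hregion.max.filesize in hbase-site.xml and dfs.block.size in hdfs-site.xml. A hedged sketch with illustrative values (these are examples, not recommendations from this thread):

```xml
<!-- hbase-site.xml: maximum size a region may grow to before it is split -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- 1 GB, in bytes -->
</property>

<!-- hdfs-site.xml: HDFS block size; a large store file simply spans many blocks -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MB, in bytes -->
</property>
```

Note that a region's store files are ordinary HDFS files, so a 1 GB region is stored as many 64 MB blocks; the two values do not need to match.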

Reading a Hdfs file using HBase

2011-06-06 Thread Karthik Kumar
Hi, Is it possible to read a file inside HDFS using HBase? If yes, please help me with the way to do it. What are all the classes required? Do I need to use MapReduce? -- With Regards, Karthik

Re: Reading a Hdfs file using HBase

2011-06-06 Thread Joey Echeverria
I'm sorry, but I don't understand your question. Why do you need or want an extra layer of indirection (HBase) in order to read your files from HDFS? -Joey On Mon, Jun 6, 2011 at 6:04 AM, Karthik Kumar karthik84ku...@gmail.com wrote: Hi, Is it possible to read a file inside HDFS using HBase.

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Stack
Praveenesh: Please stop mailing hadoop common-user AND hbase user lists. Mail one or the other. On Mon, Jun 6, 2011 at 1:48 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, Not able to see my email in the mail archive..So sending it again...!!! What brings on the exclamation marks?

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Stack, sorry for the confusion. I was not able to see my email in the archive and thought it hadn't gone through properly, so that's why I sent it again. Anyway, I will take care in future. Regards, Praveenesh On Mon, Jun 6, 2011 at 8:58 PM, Stack st...@duboce.net wrote: Praveenesh:

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 7:18 AM, Mike Spreitzer mspre...@us.ibm.com wrote: My latest information (not from me, from actual experts) says it is NOT the right approach.  Look further into that discussion thread.  I do not understand why (http://hbase.apache.org/notsoquick.html#hadoop) still

RE: What's the best approach to search in HBase?

2011-06-06 Thread Buttler, David
I store over 500M documents in HBase, and index using Solr with dynamic fields. This gives you tremendous flexibility to do the type of queries you are looking for -- and to make them simple and intuitive via a faceted interface. However, there was quite a bit of software that we had to write

RE: connection loss for /hbase

2011-06-06 Thread Buttler, David
To expand on Cosmin's answer... I saw elsewhere that Stack suggested upping the number of zookeeper connections to 1000. This can be set in your hbase-site.xml file with the parameter hbase.zookeeper.property.maxClientCnxns, assuming you are using hbase to manage zookeeper. Remember that
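A sketch of the setting David describes, using the value suggested in the thread (it only takes effect when HBase manages ZooKeeper):

```xml
<!-- hbase-site.xml -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```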

Re: HBase Web UI showing an exception every time I run it

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 6:21 AM, praveenesh kumar praveen...@gmail.com wrote: I am not able to run my hbase 0.90.3 cluster on top of hadoop 0.20.2 cluster You saw my previous mail warning that the above combo is not a good one? HBASE WEB URL is showing the following exception Problem

Re: mslab enabled jvm crash

2011-06-06 Thread Wayne
I had a 25 sec CMF failure this morning... it looks like bulk inserts are required, along with possibly weekly/daily scheduled rolling restarts. Do most production clusters run rolling restarts on a regular basis to give the JVM a fresh start? Thanks. On Thu, Jun 2, 2011 at 1:56 PM, Wayne

Re: Question regarding hbase.hregion.max.filesize and dfs.block.size

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 7:57 AM, Mat Hofschen hofsc...@gmail.com wrote: Hello, the hbase book (http://hbase.apache.org/book.html) suggests to increase hbase.hregion.max.filesize to a large value. (> 1G) Yes. If you want to cut down on the number of regions in a table, this is a good

Re: mslab enabled jvm crash

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 10:06 AM, Wayne wav...@gmail.com wrote: I had 25 sec CMF failure this morning...looks like bulk inserts are required along with possibly weekly/daily scheduled rolling restarts. Do most production clusters run rolling restarts on a regular basis to give the JVM a fresh

Re: follow up question on row key schema design

2011-06-06 Thread Arvind Jayaprakash
On Jun 02, Sam Seigal wrote: eventid - yyyy-mm-dd My eventId can be one of 12 distinct values (let us say from A-L), and I have a 4 node cluster running HBase right now. After doing some research in our OLTP database, I found that the majority (about 45% of the data) from the last 6 months

Re: Question from HBase book: HBase currently does not do well with anything above two or three column families

2011-06-06 Thread Jeff Whiting
If they have divergent read and write patterns why not put them in separate tables? ~Jeff On 6/2/2011 4:06 PM, Doug Meil wrote: Re: Is that still considered current? Do folks on the list generally agree with that guideline? Yes and yes. HBase runs better with fewer CFs. -Original

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Joe Pallas
On Jun 6, 2011, at 8:45 AM, Stack wrote: On Mon, Jun 6, 2011 at 7:18 AM, Mike Spreitzer mspre...@us.ibm.com wrote: My latest information (not from me, from actual experts) says it is NOT the right approach. Look further into that discussion thread. I do not understand why

Node Monitoring

2011-06-06 Thread Wayne
Are there any recommended methods/scripts to monitor nodes via nagios? It would be best to have a simple nagios call to check hadoop, hbase, and thrift separately and alarm if one of them is AWOL (and not have the script cause damage, as I have read can happen with thrift). For example our friendly CMF issues

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 11:24 AM, Joe Pallas joseph.pal...@oracle.com wrote: Hi St.Ack.  Here is the sense in which the book leads a new user to the route that Mike (and I) took.  It seems to say this: paraphrase You have a choice.  You can download the source for the append branch of

Re: full table scan

2011-06-06 Thread Himanshu Vashishtha
Also, how big is each row? Are you using scanner caching? You're just fetching all the rows to the client, and then? 300k is not big (it seems you have ~1 region, which could explain the similar timing). Add more data and mapreduce will pick up! Thanks, Himanshu On Mon, Jun 6, 2011 at 8:59 AM, Christopher
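Scanner caching, which Himanshu asks about, controls how many rows each RPC to a region server returns during a scan; the default in this era was 1 row per round trip, which makes full scans very slow. A sketch of raising the client-side default (the value 100 is illustrative; it can also be set per-scan with Scan.setCaching in the Java client API):

```xml
<!-- hbase-site.xml, client side -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
</property>
```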

Connection Refused on RS startup to Master

2011-06-06 Thread Young
Hi sirs, I've been playing around with the DNS configuration on one of my region servers (trying to get all communication to go over another private interface). Unfortunately I was doing this while the master was still using the server. Now it seems like I cannot telnet from this slave

RE: feature request (count)

2011-06-06 Thread Buttler, David
I will second the idea of having just a count of key-value entries. However, I am not sure about Matt's idea of knowing the number of rows/ KV entries based on numPuts / numDeletes. If I have maxVersions=1, and I put 1000 KV entries with the same key, wouldn't that change my count by a

RE: How to efficiently join HBase tables?

2011-06-06 Thread Buttler, David
So, you all realize that joins have been discussed in the database community for 40 years? There are two main types of joins: nested loops and hash joins. Mike, in his various emails, seems to be trying to re-imagine how to implement both types of joins in HBase (which seems like a reasonable

RE: How to efficiently join HBase tables?

2011-06-06 Thread Doug Meil
Re: So, you all realize the joins have been talked about in the database community for 40 years? Great point. What's old is new! :-) My suggestion from earlier in the thread was a variant of nested loops using multi-get in HTable, which would reduce the number of RPC calls. So it's a

Re: full table scan

2011-06-06 Thread Andre Reiter
Good question... I have no idea... I did not explicitly define the number of regions for the table. How can I find out how many regions my table has? How many regions should the table have? How do I change the number of regions? best regards andre - Original Message -

RE: full table scan

2011-06-06 Thread Doug Meil
Check the web console. -Original Message- From: Andre Reiter [mailto:a.rei...@web.de] Sent: Monday, June 06, 2011 5:27 PM To: user@hbase.apache.org Subject: Re: full table scan good question... i have no idea... i did not define explicitly the number of regions for the table, how can

Re: full table scan

2011-06-06 Thread Andre Reiter
Check the web console. Ah, ok, thanks! On port 60010 on the hbase master I actually found a web interface. There was only one region; I played a bit with it and executed the Split function twice. Now I have three regions, one on each hbase region server, but still the processing time did

Re: full table scan

2011-06-06 Thread Ted Yu
I think row counter would help you figure out the number of rows in each region. Refer to the following email thread, especially Stack's answer on Apr 1: row_counter map reduce job 0.90.1 On Mon, Jun 6, 2011 at 3:07 PM, Andre Reiter a.rei...@web.de wrote: Check the web console. ah, ok
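Ted's pointer refers to the RowCounter MapReduce job that ships with HBase. A hedged sketch of the invocation (the table name is a placeholder, and this needs a running cluster with MapReduce configured):

```shell
# Count the rows in a table with the bundled RowCounter job (HBase 0.90.x).
# 'mytable' is a placeholder table name.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable
```

The row count appears as a job counter in the MapReduce job's output.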

exporting from hbase as text (tsv)

2011-06-06 Thread Jack Levin
Hello, does anyone have any tools you could share that would take a table and dump the contents in TSV text format? We want it in tsv for quick HIVE processing that we have in another datamining cluster. We do not want to write custom map-reduce jobs for hbase because we already have an

Re: exporting from hbase as text (tsv)

2011-06-06 Thread Bill Graham
You can do this in a few lines of Pig; check out the HBaseStorage class. You'll need to know the names of your column families, but besides that it could be done fairly generically. On Mon, Jun 6, 2011 at 3:57 PM, Jack Levin magn...@gmail.com wrote: Hello, does anyone have any tools you could

Re: exporting from hbase as text (tsv)

2011-06-06 Thread Jack Levin
There is an export tool that exports tables into sequence files; the question is, what do I do with those seq. files to convert them to text? -Jack On Mon, Jun 6, 2011 at 4:56 PM, Bill Graham billgra...@gmail.com wrote: You can do this in a few lines of Pig, check out the HBaseStorage class. You'll

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Mike Spreitzer
Where is that citation of Michael Noll's nicely detailed instruction on how to build the append branch? Why does hbase include a hadoop-core.jar? The instructions say I should replace it, so why am I given it in the first place? Thanks, Mike Spreitzer From: Stack st...@duboce.net To:

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread stack
On Mon, Jun 6, 2011 at 6:49 PM, Mike Spreitzer mspre...@us.ibm.com wrote: Where is that citation of Michael Noll's nicely detailed instruction on how to build the append branch? See Section 1.3.1.2 here http://hbase.apache.org/book/notsoquick.html#requirements. Look for Michael Noll has

hbase region server/region information

2011-06-06 Thread Saurabh Sehgal
Hi, I just loaded a bunch of data into Hbase , and want to know the number of regions created, the region servers they are assigned to, and the region size being managed by each region server. Are there any tools/utilities I can use to quickly obtain this data without setting up hadoop metrics ?

Re: exporting from hbase as text (tsv)

2011-06-06 Thread Stack
You could hook up http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.html to a map that emits tsv lines (use the tsv escaping lib du jour to make sure tabs are properly escaped, or just search and replace tabs in the source yourself if not
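The escaping Stack mentions is easy to get wrong. A minimal, self-contained sketch of TSV field escaping (the function names are mine, not from any HBase or Hadoop library; the escape order matters, backslash first, so the encoding stays reversible):

```python
def escape_tsv_field(value: str) -> str:
    """Escape characters that would corrupt a TSV record.

    Backslash is escaped first, then tab and newline, so that the
    transformation is unambiguous and can be reversed later.
    """
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))


def to_tsv_line(fields) -> str:
    """Escape each field and join them into one TSV record."""
    return "\t".join(escape_tsv_field(f) for f in fields) + "\n"
```

A mapper emitting lines built this way can be paired with TextOutputFormat to produce TSV safe for downstream Hive loading.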

Re: hbase region server/region information

2011-06-06 Thread Stack
HBase runs a UI by default on port 60010 on the master. It lists the regions. Or fire up the shell and get a detailed status listing. St.Ack On Mon, Jun 6, 2011 at 8:11 PM, Saurabh Sehgal saurabh@gmail.com wrote: Hi, I just loaded a bunch of data into Hbase , and want to know the number of

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Joe Pallas
On Jun 6, 2011, at 12:36 PM, Stack wrote: In particular, the ...which will take who knows how long and require additional tools and may not work on your preferred development platform bit. Michael Noll has written up a nicely detailed instruction on how to build the append branch. Its

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread praveenesh kumar
Guys, just adding to the conversation: I also think this hadoop-append version thing should be mentioned on the Hadoop website as well: http://hadoop.apache.org/common/releases.html There is no mention of a hadoop-append release or anything about an HDFS that has a durable

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 4:36 PM, Joe Pallas joseph.pal...@oracle.com wrote: Good to know.  I had somehow gotten the idea that there was a compatibility issue with 0.22 that might not get resolved, but I must have been confused.   Or maybe there was a question about whether Hadoop 0.22 would

Re: exporting from hbase as text (tsv)

2011-06-06 Thread Jack Levin
Can you hook hive to hbase? Yes, we have used hbase to hive and back before, but it's not really flexible, especially going the hbase-to-hive route. We would much prefer a bulk uploader tool for modified tables via hive map-reduce of tsv or csv. -Jack

Re: Reading a Hdfs file using HBase

2011-06-06 Thread James Ram
Hi, I too have the same situation. The data in HDFS should be mapped to columns in HBase. We will be putting data in bulk into HDFS and we want HBase to read from HDFS and put the values in the respective columns. Is that possible? How can I map the data from HDFS to HBase columns? Is there any

Re: Reading a Hdfs file using HBase

2011-06-06 Thread Bill Graham
You can load the HDFS files into HBase. Check out importtsv to generate HFiles and completebulkload to load them into a table: http://hbase.apache.org/bulk-loads.html On Mon, Jun 6, 2011 at 9:38 PM, James Ram hbas...@gmail.com wrote: Hi, I too have the same situation. The data in HDFS
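A hedged sketch of the two-step bulk load Bill describes (table name, column mapping, and paths are placeholders; this needs a running cluster, and the exact invocation may differ slightly across 0.90.x releases — see the bulk-loads page he links):

```shell
# 1) Parse TSV files already in HDFS into HFiles.
#    'cf:col1,cf:col2' and the paths are illustrative, not from the thread.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /user/hadoop/input-tsv

# 2) Move the generated HFiles into the live table (completebulkload).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /tmp/hfiles mytable
```

Without -Dimporttsv.bulk.output, ImportTsv writes directly to the table via Puts instead of producing HFiles.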

Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Mike Spreitzer
I have been looking at http://hbase.apache.org/notsoquick.html#hadoop which does NOT have that citation. So I never saw that before now. It is indeed helpful. But: must we really spend hours on flaky tests while building? Also, it would comfort noobs like me if there were a bit of