Re: Additional disk space required for HBase compactions.

2010-05-17 Thread Edward Capriolo
On Mon, May 17, 2010 at 3:03 PM, Jonathan Gray jg...@facebook.com wrote: I'm not sure I understand why you distinguish small HFiles and a single behemoth HFile? Are you trying to understand more about disk space or I/O patterns? It looks like your understanding is correct. At the worst

Re: Using HBase on other file systems

2010-05-13 Thread Edward Capriolo
On Thu, May 13, 2010 at 12:26 AM, Jeff Hammerbacher ham...@cloudera.comwrote: Some projects sacrifice stability and manageability for performance (see, e.g., http://gluster.org/pipermail/gluster-users/2009-October/003193.html ). On Wed, May 12, 2010 at 11:15 AM, Edward Capriolo edlinuxg

Re: Using HBase on other file systems

2010-05-12 Thread Edward Capriolo
On Tuesday, May 11, 2010, Jeff Hammerbacher ham...@cloudera.com wrote: Hey Edward, I do think that if you compare GoogleFS to HDFS, GFS looks more full featured. What features are you missing? Multi-writer append was explicitly called out by Sean Quinlan as a bad idea, and rolled back.

Re: Using HBase on other file systems

2010-05-12 Thread Edward Capriolo
there is no way for Gluster to export stripe locations back to Hadoop. It seems a poor choice. - Andy From: Edward Capriolo Subject: Re: Using HBase on other file systems To: hbase-user@hadoop.apache.org hbase-user@hadoop.apache.org Date: Wednesday, May 12, 2010, 6:38 AM On Tuesday, May

Re: Using HBase on other file systems

2010-05-11 Thread Edward Capriolo
On Tue, May 11, 2010 at 3:51 PM, Jeff Hammerbacher ham...@cloudera.comwrote: Hey, Thanks for the evaluation, Andrew. Ceph certainly is elegant in design; HDFS, similar to GFS [1], was purpose-built to get into production quickly, so its current incarnation lacks some of the same elegance. On

Re: HBase Design Considerations

2010-05-03 Thread Edward Capriolo
On Mon, May 3, 2010 at 4:04 AM, Steven Noels stev...@outerthought.orgwrote: On Mon, May 3, 2010 at 8:42 AM, Saajan ssangra...@veriskhealth.com wrote: Would highly appreciate comments on how HBase is used to support search applications and how we can support search / filter across multiple

Re: Theoretical question...

2010-04-29 Thread Edward Capriolo
On Thu, Apr 29, 2010 at 4:31 PM, Michael Segel michael_se...@hotmail.comwrote: Imagine you have a cloud of 100 hadoop nodes. In theory you could create multiple instances of HBase on the cloud. Obviously I don't think you could have multiple region servers running on the same node. The use

Re: Very long time between node failure and reassigning of regions.

2010-04-26 Thread Edward Capriolo
2010/4/26 Michał Podsiadłowski podsiadlow...@gmail.com Hi hbase users, during our tests on a production environment we found a few really big problems that stop us from using hbase. The first major problem is availability: we have now 6 region servers + 2 masters + 3 zk. When we shutdown normally

Re: Temporal database - infinite timestamp numbers

2010-04-15 Thread Edward Capriolo
On Thu, Apr 15, 2010 at 4:36 PM, Ryan Rawson ryano...@gmail.com wrote: From an implementation point of view, extremely large rows can become a problem. Since region splits are on the row, if a a single row becomes larger than a region we become unable to split that to spread the load out.
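Ryan's point above is that regions split only at row boundaries, so a single row that grows larger than a region cannot be split to spread the load. A minimal illustrative sketch of that constraint (this is not the actual HBase split logic, just the idea):

```java
import java.util.List;

// Illustrative only: HBase splits a region at a row boundary, so a
// region whose keys all belong to one row has no valid split point.
public class SplitSketch {
    // Given the sorted row keys stored in a region, pick a middle row
    // to split at, or return null when every key is the same row.
    static String chooseSplitRow(List<String> sortedRowKeys) {
        String first = sortedRowKeys.get(0);
        String last = sortedRowKeys.get(sortedRowKeys.size() - 1);
        if (first.equals(last)) {
            return null; // one giant row: unsplittable
        }
        return sortedRowKeys.get(sortedRowKeys.size() / 2);
    }

    public static void main(String[] args) {
        System.out.println(chooseSplitRow(List.of("a", "b", "c", "d"))); // c
        System.out.println(chooseSplitRow(List.of("huge", "huge", "huge"))); // null
    }
}
```

This is why the thread recommends modeling "wide" data as many rows (or many versions/qualifiers) rather than one ever-growing row.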

Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE

2010-04-10 Thread Edward Capriolo
On Sat, Apr 10, 2010 at 1:31 PM, George Stathis gstat...@gmail.com wrote: Ted, HADOOP-6695 is an improvement request and a different issue from what I am encountering. What I am referring to is not a dynamic classloading issue. It happens even after the servers are being restarted. You are

Re: Is NotServingRegionException really an Exception?

2010-03-31 Thread Edward Capriolo
On Wed, Mar 31, 2010 at 10:51 AM, Al Lias al.l...@gmx.de wrote: Am 31.03.2010 16:47, schrieb Gary Helmling: NotServingRegionException is a normal part of operations when regions transition (ie due to splits). It's how the region server signals back to the client that it needs to

Re: Is NotServingRegionException really an Exception?

2010-03-31 Thread Edward Capriolo
On Wed, Mar 31, 2010 at 11:02 AM, Gary Helmling ghelml...@gmail.com wrote: Well I would still view it as an exceptional condition. The client asked for data back from a server that does not own that data. Sending back an exception seems like the appropriate response, to me at least. It's

Re: Deployment question

2010-03-23 Thread Edward Capriolo
On Tue, Mar 23, 2010 at 5:15 PM, Ryan Rawson ryano...@gmail.com wrote: The instructions for setting up HBase to work with Mapreduce are here: http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html -ryan On Tue, Mar 23, 2010 at 2:12 PM,

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

2010-03-13 Thread Edward Capriolo
On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee prasen@gmail.com wrote: I agree that running 24/7 hbase servers on ec2 is not advisable. But I need some suggestions for running mapred-jobs ( in batches ) followed by updating the results on an existing hbase server. Is it advisable to

Re: Why windows support is critical

2010-03-01 Thread Edward Capriolo
On Mon, Mar 1, 2010 at 11:47 AM, nocones77-gro...@yahoo.com wrote: This is my first post to the group, so I'm not sure I have a lot to add to the conversation yet. But I've been lurking/searching for a week now, and wanted to add a me too to Ravi's comments. The quick-start would be

HBase Rest formatting

2010-02-24 Thread Edward Capriolo
Hey guys, I am running hbase-0.20-0.20.0~1-1.cloudera.noarch.rpm (I know I should upgrade). Is this a known bug? From the shell: hbase(main):002:0> put 'webdata', 'test', 'anchor:stuff', 'a' hbase(main):006:0> get 'webdata', 'test' COLUMN CELL anchor:stuff

Re: WrongRegionException

2010-02-17 Thread Edward Capriolo
On Wed, Feb 17, 2010 at 1:52 PM, Ted Yu yuzhih...@gmail.com wrote: In ASCII, 5 is ahead of a So the rowkey is outside the region. On Wed, Feb 17, 2010 at 8:33 AM, Zhenyu Zhong zhongresea...@gmail.comwrote: Hi, I came across this problem recently. I tried to query a table with rowkey
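Ted's observation that "5 is ahead of a" follows from HBase comparing row keys as unsigned bytes: ASCII '5' is 0x35 and 'a' is 0x61, so a row key starting with '5' sorts before a region whose start key begins with 'a', hence the WrongRegionException. A minimal stand-in for the byte comparison (hypothetical helper, not the actual org.apache.hadoop.hbase.util.Bytes class):

```java
import java.nio.charset.StandardCharsets;

// Sketch of HBase's lexicographic row ordering: rows are compared as
// unsigned bytes, so ASCII '5' (0x35) sorts before 'a' (0x61).
public class RowOrder {
    static int compareRows(byte[] left, byte[] right) {
        int len = Math.min(left.length, right.length);
        for (int i = 0; i < len; i++) {
            int a = left[i] & 0xff, b = right[i] & 0xff; // unsigned compare
            if (a != b) return a - b;
        }
        return left.length - right.length;
    }

    public static void main(String[] args) {
        byte[] row = "5abc".getBytes(StandardCharsets.US_ASCII);
        byte[] regionStart = "aaa".getBytes(StandardCharsets.US_ASCII);
        // Negative result: the row sorts before the region's start key,
        // so this region does not own it.
        System.out.println(compareRows(row, regionStart) < 0); // true
    }
}
```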

Re: Optimizations for random read performance

2010-02-17 Thread Edward Capriolo
On Wed, Feb 17, 2010 at 6:17 PM, Ryan Rawson ryano...@gmail.com wrote: Why is LZO's license a problem?  Sure it's GPL, but so is a lot of the software on a linux system... The way the GPL is done, there is no compile time dependency between HBase and the LZO libraries.  Thus there is no GPL

Re: Data processing/filtering on the server

2010-01-14 Thread Edward Capriolo
server exception. Based on your response if the filtering is applied on the server side obviously my local custom filter class cannot be used.  Am I guessing it right ?? -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Wednesday, January 13, 2010

Re: Data processing/filtering on the server

2010-01-14 Thread Edward Capriolo
://www.zeroturnaround.com/jrebel/) or alternately their new LiveRebel production version (http://www.zeroturnaround.com/liverebel/) for this. It does hot loading of updated classes. I have not used it with HBase myself though, so no promises. --gh On Thu, Jan 14, 2010 at 11:14 AM, Edward Capriolo

Re: Updated HBASE RPMS

2009-12-24 Thread Edward Capriolo
accessible 6) grant read directory access to the above including scripts etc In this way, when Andrew releases a new RPM he gives me a nudge and I update the script for step 1? Edward On Tue, Dec 22, 2009 at 3:45 PM, Andrew Purtell apurt...@apache.org wrote: From: Edward Capriolo

Re: Updated HBASE RPMS

2009-12-24 Thread Edward Capriolo
://www.jointhegrid.com/hbase-repo-home/redhat/hbase-jtg.repo in your yum.repos.d you should be able to use the repo. Edward On Thu, Dec 24, 2009 at 12:42 PM, Edward Capriolo edlinuxg...@gmail.com wrote: How about we handle it like this, (on my system) 1) write a script to download your source RPM 2

Updated HBASE RPMS

2009-12-22 Thread Edward Capriolo
All, I got my HBase jumpstart with the Cloudera RPMs. Cloudera told me the HBase guys created them (I'm assuming those guys are on-list). I have not been able to find the RPMs anywhere besides Cloudera. Cloudera did provide me the source RPMs; I have noticed, however, that CE was still at v 0.20.0,

Re: Updated HBASE RPMS

2009-12-22 Thread Edward Capriolo
So should I stop rolling these? I do not think so; Cloudera is managing their own patch level, and they are more likely to choose stability over the latest and greatest. I liked the layout and the init scripts, but I needed the more bleeding-edge features. I did some searching but was not able to

Re: Updated HBASE RPMS

2009-12-22 Thread Edward Capriolo
Yes, Those source rpms would work for me, but is there a repository that I can configure in my yum.repos.d? That is what I was looking to set up.

Re: LZO Link problem

2009-12-21 Thread Edward Capriolo
The entire /usr/lib rather than /usr/lib64 issue is mostly to blame on distributions: Red Hat took to the idea of putting 64-bit libraries in /usr/lib64. I think the rationale was that a 32-bit RPM and a 64-bit RPM could be installed on the same machine. Others argue that /usr/lib should hold the libs for YOUR os.

Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Edward Capriolo
On Tue, Dec 15, 2009 at 1:03 AM, stack st...@duboce.net wrote: HBase requires java 6 (1.6) or above. St.Ack On Mon, Dec 14, 2009 at 7:41 PM, Paul Smith psm...@aconex.com wrote: Just wondering if anyone knows of an existing Hbase utility library that is open sourced that can assist those

Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Edward Capriolo
are doubtless more mature, so they may meet your needs as well.  If none of them are quite what you're looking for, then there's always room for another! --gh On Tue, Dec 15, 2009 at 10:39 AM, Edward Capriolo edlinuxg...@gmail.comwrote: On Tue, Dec 15, 2009 at 1:03 AM, stack st...@duboce.net wrote

Re: HBase graphs for cacti

2009-12-14 Thread Edward Capriolo
2009/12/14 Michał Podsiadłowski podsiadlow...@gmail.com: Hi Edward, do you have maybe something ready for integration with Nagios? You mentioned it in your presentation. 2009/12/13 Edward Capriolo edlinuxg...@gmail.com All, I have created cacti graphs for all the JMX exported variables

Re: HBase graphs for cacti

2009-12-14 Thread Edward Capriolo
://hadoop.apache.org/hbase/docs/r0.20.2/metrics.html)? Thanks, St.Ack On Sat, Dec 12, 2009 at 7:10 PM, Edward Capriolo edlinuxg...@gmail.com wrote: All, I have created cacti graphs for all the JMX exported variables with HBase. HBase 20.2 - RegionServer - Atomic Incr HBase 20.2

Re: HBase graphs for cacti

2009-12-14 Thread Edward Capriolo
On Mon, Dec 14, 2009 at 2:34 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Mon, Dec 14, 2009 at 2:18 PM, Lars George lars.geo...@gmail.com wrote: Stack +1 Thought about the same. On Mon, Dec 14, 2009 at 7:24 PM, stack st...@duboce.net wrote: Edward: That looks sweet.  Should we add

HBase graphs for cacti

2009-12-12 Thread Edward Capriolo
All, I have created cacti graphs for all the JMX exported variables with HBase. HBase 20.2 - RegionServer - Atomic Incr HBase 20.2 - RegionServer - BlockCache HBase 20.2 - RegionServer - BlockCache Count HBase 20.2 - RegionServer - BlockCache HitRatio HBase 20.2 -

Re: Is TaskTracker required for hbase?

2009-12-11 Thread Edward Capriolo
If you do not have much time to experiment, you probably do not have time for this, because you are going to have to build your own code to run all the components in one JVM. If you are unfamiliar with the codebases this takes getting used to; also, running the ant test targets takes some time. so I

Re: PrefixFilter performance question.

2009-12-10 Thread Edward Capriolo
On Tue, Dec 8, 2009 at 11:43 PM, stack st...@duboce.net wrote: Try using this filter instead:      scan.setFilter(FirstKeyOnlyFilter.new()) Will only return row keys, if thats the effect you are looking for. St.Ack On Tue, Dec 8, 2009 at 3:30 PM, Edward Capriolo edlinuxg

Re: Combining hadoop dataNode process with hbase into single JVM process

2009-12-10 Thread Edward Capriolo
2009/12/10 Michał Podsiadłowski podsiadlow...@gmail.com: Hi all! Sorry for duplicating message from hadoop list but I think not all of you are reading that one and I really need to know your opinion. I have been recently experimenting with hadoop and hbase for my company. And after some

Re: How can I just scan the row key ?

2009-12-09 Thread Edward Capriolo
On Wed, Dec 9, 2009 at 9:45 AM, Jeff Zhang zjf...@gmail.com wrote: Peter, Your method will scan one column family, but I just want to only scan the row keys. Is it possible ? Jeff Zhang On Wed, Dec 9, 2009 at 6:42 AM, Peter Rietzler peter.rietz...@smarter-ecommerce.com wrote: Sorry

JMX Metrics with HBASE

2009-12-08 Thread Edward Capriolo
List, I am interested in pulling the HBase metrics with JMX. I notice one quick thing: like hadoop-env, hbase-env should have environment variables that allow setting options per daemon. Also, I have noticed that even though I start a regionserver with JMX enabled, the information that appears in

Re: JMX Metrics with HBASE

2009-12-08 Thread Edward Capriolo
see the JMX setup instructions: http://hadoop.apache.org/hbase/docs/r0.20.2/metrics.html And if you've already read through that and are running into difficulties, let us know what kinds of problems/errors you're seeing. Thanks, Gary On Tue, Dec 8, 2009 at 9:52 AM, Edward Capriolo
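The metrics page linked above covers enabling JMX per daemon through hbase-env.sh. A typical fragment looks roughly like the following; the port numbers and password/access file paths here are examples only, to be adjusted for your installation:

```shell
# conf/hbase-env.sh -- example JMX settings (ports/paths are placeholders)
export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.password.file=$HBASE_HOME/conf/jmxremote.passwd \
  -Dcom.sun.management.jmxremote.access.file=$HBASE_HOME/conf/jmxremote.access"
# Give each daemon its own port so master and regionserver can coexist on a node.
export HBASE_MASTER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
```

With this in place, a JMX client (jconsole, or a cacti/nagios poller) can connect to each daemon's port and read the exported metrics beans.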

PrefixFilter performance question.

2009-12-08 Thread Edward Capriolo
Hey all, I have been doing some performance evaluation with MySQL vs HBase. I have a table webtable {NAME => 'webdata', FAMILIES => [{NAME => 'anchor', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'image',

Re: PrefixFilter performance question.

2009-12-08 Thread Edward Capriolo
, a scanner can time out, which causes unhappy jobs/people/emails. BTW I can read small rows out of a 19 node cluster at 7 million rows/sec using a map-reduce program.  Any individual process is doing 40k+ rows/sec or so -ryan On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo edlinuxg...@gmail.com

Scanner API Question

2009-12-07 Thread Edward Capriolo
I have spent some time writing an app to load random data into HBase and record the performance, proof-of-concept type work. My table definition: // create 'webdata', {NAME => 'image'}, {NAME => 'anchor'}, {NAME => 'raw_data'} hbase(main):003:0> scan 'webdata', { LIMIT => 1 } ROW

Re: Scanner API Question

2009-12-07 Thread Edward Capriolo
On Mon, Dec 7, 2009 at 6:18 PM, Erik Holstad erikhols...@gmail.com wrote: Hey Edward! s.addColumn( Bytes.toBytes(anchor), Bytes.toBytes(anchor)  ); this looks for anchor:anchor, which I don't see s.addColumn( Bytes.toBytes(anchor), Bytes.toBytes(anchor:Alverta Angstrom cathodegraph)  );
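Erik's correction concerns how an old-style "family:qualifier" column name maps onto the two arguments of Scan.addColumn(family, qualifier): the name splits at the first colon, so "anchor:Alverta Angstrom cathodegraph" is family "anchor" and qualifier "Alverta Angstrom cathodegraph", not family "anchor" with qualifier "anchor". A hypothetical helper showing the split (ColumnSpec is not an HBase class):

```java
// Illustrative helper (not part of the HBase API): split an old-style
// "family:qualifier" column name at the first colon, the way
// Scan.addColumn(family, qualifier) expects the two halves separately.
public class ColumnSpec {
    static String[] split(String column) {
        int idx = column.indexOf(':');
        if (idx < 0) {
            return new String[] { column, "" }; // family only, empty qualifier
        }
        return new String[] { column.substring(0, idx), column.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("anchor:Alverta Angstrom cathodegraph");
        System.out.println(parts[0]); // anchor
        System.out.println(parts[1]); // Alverta Angstrom cathodegraph
    }
}
```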

Re: Expired Scanner Lease == RegionServer death ?

2009-12-05 Thread Edward Capriolo
I am looking to add the hbase JMX to my cacti JMX package. http://www.jointhegrid.com/hadoop/ Ed On Sat, Dec 5, 2009 at 12:34 PM, Andrew Purtell apurt...@apache.org wrote: This sounds interesting. Is there a JK JIRA up about it? Assign it to me?   - Andy