A possible bug in the scanner.

2011-04-13 Thread Vidhyashankar Venkataraman
(This could be a known issue. Please let me know if it is). We had a set of uncompacted store files in a region. One of the column families had a store file of 5 Gigs. The other column families were pretty small (a few megabytes at most). It so turned out that all these files had rows whose

Re: A possible bug in the scanner.

2011-04-13 Thread Ted Yu
Have you read the following thread? "ScannerTimeoutException when a scan enables caching, no exception when it doesn't". Did you enable caching? If not, it is a different issue. On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman vidhy...@yahoo-inc.com wrote: (This could be a known issue.
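
For reference, scanner caching is a client-side, per-Scan setting; enabling it makes each next() RPC return a batch of rows, and that batch must be assembled within the scanner lease period. A minimal sketch (the table name and caching value are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachedScanExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "myTable");

            Scan scan = new Scan();
            // Fetch 100 rows per next() RPC instead of 1. Fewer round
            // trips, but each batch must be assembled within the scanner
            // lease period or the client sees ScannerTimeoutException.
            scan.setCaching(100);

            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process r
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }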

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Venkatesh
Thanks J-D. I made sure to pass conf objects around in places where I wasn't... will give it a try. -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Apr 12, 2011 6:40 pm Subject: Re: hbase -0.90.x upgrade - zookeeper exception
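
In that era the fix amounts to creating one Configuration up front and handing the same instance to every HTable, so they share one HConnection and therefore one ZooKeeper connection. A minimal sketch of the pattern:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class SharedConfExample {
        // One Configuration for the whole application: HTables created
        // from the same conf instance share one HConnection (and one
        // ZooKeeper connection) instead of opening a new one each time.
        private static final Configuration CONF = HBaseConfiguration.create();

        public static HTable openTable(String name) throws Exception {
            return new HTable(CONF, name);
        }
    }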

RE: HBase is not ready for Primetime

2011-04-13 Thread Doug Meil
Hi there- For what it's worth, although we haven't had this particular issue we've certainly had other bumps and bruises (GC of death, and other metadata issues caused when a split dies during a GC of death, etc.). But there are a few general items that helped in stability and performance I

One MapReduce and two HBase 0.20.6 clusters

2011-04-13 Thread Manuel de Ferran
Greetings, I'm trying to backport CopyTable to HBase 0.20.6. In other words, the challenge is to write a job that would copy data from one HTable on cluster A to another HTable on cluster B. I'm able to copy an HTable to another HTable on the same cluster, but I cannot find a way to point to the

Re: A possible bug in the scanner.

2011-04-13 Thread Vidhyashankar Venkataraman
Hi We had enabled scanner caching, but I don't think it is the same issue, because scanner.next in this case is blocking: the scanner is busy in the region server but hasn't returned anything yet, since no row to return has been found (all rows have expired but are still there since

Re: data locality for reducer writes?

2011-04-13 Thread Biedermann,S.,Fa. Post Direkt
Hi Jean-Daniel, thx for your reply. What I assume is that the total network load during reduce is O(n) with n the number of nodes in the cluster. We saw a major performance loss in the reduce step when our network degraded to 100Mbit by accident (1h vs. 13 minutes). With more nodes I see 2

Re: A possible bug in the scanner.

2011-04-13 Thread Gary Helmling
Hi Vidhya, So it sounds like the timeout thread is timing out the scanner when it takes more than 60 seconds reading through the large column family store file without returning anything to the client? Even without the TTL expiration being applied, I think I've heard of this in other cases where
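
For context, the 60 seconds is the default scanner lease period. If long gaps between returned rows are expected behavior (rather than a bug to fix), the lease can be raised on the region servers in hbase-site.xml; the value below is illustrative, and a region server restart is required:

    <property>
      <name>hbase.regionserver.lease.period</name>
      <!-- default is 60000 (60s); raise if scans legitimately go long
           between rows returned to the client -->
      <value>300000</value>
    </property>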

Re: A possible bug in the scanner.

2011-04-13 Thread Jean-Daniel Cryans
This could be HBASE-2077. J-D On Wed, Apr 13, 2011 at 9:15 AM, Gary Helmling ghelml...@gmail.com wrote: Hi Vidhya, So it sounds like the timeout thread is timing out the scanner when it takes more than 60 seconds reading through the large column family store file without returning anything

Re: A possible bug in the scanner.

2011-04-13 Thread Gary Helmling
Looks like the most recent patch for HBASE-2077 does try to address this with the usage counter. That may be the more correct approach, but I was wondering if we could do something simpler by periodically renewing the lease down in the RegionScanner iteration? Sort of like calling progress()
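
To make the idea concrete, here is a self-contained sketch of the renew-as-you-go pattern Gary describes. All names are hypothetical; this is not HBase's actual Leases API:

    import java.util.concurrent.ConcurrentHashMap;

    public class LeaseRenewalSketch {
        static final long LEASE_PERIOD_MS = 60000;
        static final int RENEW_EVERY_N_ROWS = 10000;

        // leaseName -> expiration timestamp (stand-in for a lease monitor)
        static final ConcurrentHashMap<String, Long> leases =
            new ConcurrentHashMap<String, Long>();

        static void renewLease(String name) {
            leases.put(name, System.currentTimeMillis() + LEASE_PERIOD_MS);
        }

        static void scanRegion(String scannerLease, Iterable<byte[]> rows) {
            long scanned = 0;
            for (byte[] row : rows) {
                // ... TTL / filter checks; most rows may be skipped
                // without returning anything to the client ...
                if (++scanned % RENEW_EVERY_N_ROWS == 0) {
                    // Touch the lease so the timeout thread doesn't expire
                    // the scanner mid-scan; analogous to a MapReduce task
                    // calling progress().
                    renewLease(scannerLease);
                }
            }
        }
    }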

Re: A possible bug in the scanner.

2011-04-13 Thread Gary Helmling
On Wed, Apr 13, 2011 at 10:03 AM, Vidhyashankar Venkataraman vidhy...@yahoo-inc.com wrote: Even without the TTL expiration being applied, I think I've heard of this in other cases where a very restrictive filter was used on a large table scan. Thanks, I was about to say that in a follow-up

Re: A possible bug in the scanner.

2011-04-13 Thread Jean-Daniel Cryans
Vidhya, the patch in that jira is stale, needs some love. Gary, the AtomicInteger is just there to permit multiple users of a single Lease, not very common so it can be changed. The issue with setting some sort of progress is that the Lease is sleeping, so you cannot change its sleeping time. You

Re: A possible bug in the scanner.

2011-04-13 Thread Himanshu Vashishtha
Vidhya, Did you try setting the scanner time range? It takes min and max timestamps, and when instantiating the scanner at the RS, time-based filtering is done to include only the relevant store files. Have a look at StoreFile.shouldSeek(Scan, SortedSet<byte[]>). I think it should improve the response time.
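
A minimal sketch of that suggestion; the 7-day TTL is an assumed example value:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Scan;

    public class TimeRangeScanExample {
        static final long TTL_MS = 7L * 24 * 60 * 60 * 1000; // assumption

        static Scan buildScan() throws IOException {
            Scan scan = new Scan();
            long now = System.currentTimeMillis();
            // Only rows written inside the TTL window. At the region
            // server, store files whose timestamps fall entirely outside
            // this range can be skipped (see StoreFile.shouldSeek), so a
            // large file of expired rows need not be read at all.
            scan.setTimeRange(now - TTL_MS, now);
            return scan;
        }
    }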

Re: A possible bug in the scanner.

2011-04-13 Thread Vidhyashankar Venkataraman
Himanshu, Thanks, this will resolve the particular case we ran into. But what if the files are huge and have a wide range of timestamps, and only some of the records in the file are valid? And for the other application that we have, scans with filters that return a sparse set, the solution

RE: HBase is not ready for Primetime

2011-04-13 Thread Andrew Purtell
Hi Doug, "3) Cluster restart: We schedule a full shutdown and restart of our cluster each week. It's pretty quick, and HBase just seems happier when we do this." Can you say a bit more about how HBase is happier versus not? I can speculate on a number of reasons why this may be the case,

Re: rpc call logging

2011-04-13 Thread Andrew Purtell
This sounds like HBASE-2014: https://issues.apache.org/jira/browse/HBASE-2014 BTW apologies for the weird English in that issue, it appears I cut and pasted a request from our China development center without sufficient editing. - Andy

Re: A possible bug in the scanner.

2011-04-13 Thread Himanshu Vashishtha
Vidhya, so yes, in the case of huge files with valid rows the time-range approach will not be effective, nor in the case of a scanner hanging in its next() calls because of a GC pause or some exhaustive computation. I voted for this answer after reading your initial mail (but it got posted after a

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Ruben Quintero
The problem I'm having is in getting the conf that is used to init the table within TableInputFormat. That's the one that's leaving open ZK connections for me. Following the code through, TableInputFormat initializes its HTable with new Configuration(new JobConf(conf)), where conf is the

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Venkatesh
Ruben: Yes, I have the exact same issue now. I'm also kicking off from another JVM that runs forever. I don't have an alternate solution: either modify the HBase code, modify my code to kick off as a standalone JVM, or hopefully 0.90.3 releases soon :) J-D/St.Ack may have some suggestions V

Re: One MapReduce and two HBase 0.20.6 clusters

2011-04-13 Thread Jean-Daniel Cryans
HConnectionManager needed some modifications to make it work; it's not just about backporting that job. J-D On Wed, Apr 13, 2011 at 7:27 AM, Manuel de Ferran manuel.defer...@gmail.com wrote: Greetings, I'm trying to backport CopyTable to HBase 0.20.6. In other words, the challenge
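
For illustration of the intent only: on 0.90, where CopyTable lives, pointing at the second cluster is a matter of a second Configuration carrying the other quorum (host and table names below are placeholders). Per J-D, making the same approach work on 0.20.6 is exactly what required the HConnectionManager changes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class TwoClusterSketch {
        public static void main(String[] args) throws Exception {
            // Cluster A (source): picked up from the classpath.
            Configuration confA = HBaseConfiguration.create();
            HTable source = new HTable(confA, "sourceTable");

            // Cluster B (target): same client code, different ZK quorum.
            Configuration confB = HBaseConfiguration.create();
            confB.set("hbase.zookeeper.quorum", "zk1.clusterB,zk2.clusterB");
            HTable target = new HTable(confB, "targetTable");

            // ... scan from source, put to target ...
            source.close();
            target.close();
        }
    }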

RE: HBase is not ready for Primetime

2011-04-13 Thread Doug Meil
Context: we're still on 0.89, so we can't take advantage of the MemStore allocation buffers yet. One of the most important metrics for us was GC-stuck region servers, and more nodes + more memory + scheduling periodic cluster restarts helped in our situation. I wholeheartedly agree with the

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Jean-Daniel Cryans
Like I said, it's a zookeeper configuration that you can change. If hbase is managing your zookeeper then set hbase.zookeeper.property.maxClientCnxns to something higher than 30 and restart the zk server (can be done while hbase is running). J-D On Wed, Apr 13, 2011 at 12:04 PM, Venkatesh
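
The corresponding hbase-site.xml entry would look like the following; 2000 is the value the JIRA mentioned below suggests:

    <property>
      <name>hbase.zookeeper.property.maxClientCnxns</name>
      <!-- max concurrent connections from one client host; raise from
           the 30 mentioned above when many long-lived clients share a
           machine -->
      <value>2000</value>
    </property>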

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Venkatesh
Will do. I'll set it to 2000 as per the JIRA. Do we need a periodic bounce? Because if this error comes up, the only way I get the MapReduce job to work is a bounce. -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Wed, Apr 13, 2011 3:22

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Jean-Daniel Cryans
Periodic bounce of what? Your client program or the ZK server? J-D On Wed, Apr 13, 2011 at 12:31 PM, Venkatesh vramanatha...@aol.com wrote: Will do..I'll set it to 2000 as per JIRA.. Do we need a periodic bounce? ..because if this error comes up..only way I get the mapreduce to work is

Re: HBase is not ready for Primetime

2011-04-13 Thread Ryan Rawson
To bring it back to the original point and a high-level view, the fact is that HBase is not Oracle, nor MySQL. It doesn't have multiple decades behind it, and furthermore distributed systems are inherently more difficult (more failure cases) than single-node DBs. Having said that, the grass is certainly not

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Ruben Quintero
The problem is the connections are never closed... so they just keep piling up until it hits the max. My max is at 400 right now, so after 14-15 hours of running, it gets stuck in an endless connection retry. I saw that the HConnectionManager will kick older HConnections out, but the problem

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Jean-Daniel Cryans
Yeah, for a JVM running forever it won't work. If you know for a fact that the configuration passed to TIF won't be changed, then you can subclass it and override setConf to not clone the conf. J-D On Wed, Apr 13, 2011 at 12:45 PM, Ruben Quintero rfq_...@yahoo.com wrote: The problem is the
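
A sketch of that suggestion. It assumes the conf passed in won't be mutated afterwards; the real setConf also builds the Scan from attributes in the conf, which is elided here:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

    public class NonCloningTableInputFormat extends TableInputFormat {
        private Configuration conf;

        @Override
        public Configuration getConf() {
            return conf;
        }

        @Override
        public void setConf(Configuration configuration) {
            // Keep the caller's conf instead of cloning it, so the HTable
            // shares the existing HConnection/ZooKeeper connection.
            this.conf = configuration;
            try {
                setHTable(new HTable(conf, conf.get(INPUT_TABLE)));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            setScan(new Scan()); // scan-from-conf setup elided
        }
    }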

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Ruben Quintero
Venkatesh, I guess the two quick and dirty solutions are: - Call deleteAllConnections(bool) at the end of your MapReduce jobs, or periodically. If you have no other tables or pools, etc. open, then no problem. If you do, they'll start throwing IOExceptions, but you can re-instantiate them with
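
The first option, sketched; passing true to deleteAllConnections also stops the underlying RPC proxies:

    import org.apache.hadoop.hbase.client.HConnectionManager;

    public class ConnectionCleanup {
        // Call at the end of each MapReduce job (or periodically). Any
        // HTable or pool still open afterwards will start throwing
        // IOExceptions and must be re-instantiated.
        public static void afterJob() {
            HConnectionManager.deleteAllConnections(true);
        }
    }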

Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

2011-04-13 Thread Venkatesh
deleteAllConnections works well for my case. I can live with this, but not with connection leaks. Thanks for the idea. Venkatesh -Original Message- From: Ruben Quintero rfq_...@yahoo.com To: user@hbase.apache.org Sent: Wed, Apr 13, 2011 4:25 pm Subject: Re: hbase -0.90.x upgrade

just open sourced Orderly -- a row key schema system (composite keys, etc) for use with HBase

2011-04-13 Thread Michael Dalton
Hi all, I'm with a startup, GotoMetrics, doing things with Hadoop and I've gotten permission to open source Orderly -- our row key schema system for use in projects like HBase. Orderly allows you to serialize common data types (long, double, BigDecimal, etc.) or structs/records of these types to
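
The post doesn't show Orderly's API, so here is a self-contained sketch of the core trick behind order-preserving serialization of signed longs (not Orderly's actual classes): flip the sign bit and emit big-endian bytes, so unsigned lexicographic comparison of the encoded keys matches numeric order:

    public class SortableLongKey {
        // Encode a long so that byte-wise unsigned comparison of the
        // output matches numeric comparison of the input.
        public static byte[] encode(long v) {
            long flipped = v ^ Long.MIN_VALUE; // sign-bit flip
            byte[] out = new byte[8];
            for (int i = 7; i >= 0; i--) {     // big-endian
                out[i] = (byte) flipped;
                flipped >>>= 8;
            }
            return out;
        }

        static int compareUnsigned(byte[] a, byte[] b) {
            for (int i = 0; i < a.length && i < b.length; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) return d;
            }
            return a.length - b.length;
        }

        public static void main(String[] args) {
            // encode(-5) sorts before encode(7), as it should.
            System.out.println(compareUnsigned(encode(-5L), encode(7L)) < 0);
        }
    }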

Re: just open sourced Orderly -- a row key schema system (composite keys, etc) for use with HBase

2011-04-13 Thread Andrew Purtell
Michael (and GotoMetrics), Thank you for opening this up! Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Wed, 4/13/11, Michael Dalton mwdal...@gmail.com wrote: Hi all, I'm with a startup, GotoMetrics, doing

java.io.IOException: Filesystem closed

2011-04-13 Thread 陈加俊
2011-04-13 20:27:08,620 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error closing cjjHTML, http://www.csh.gov.cn/article_346937.html,1299079217805 java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:234) at

Lots of errors about Region has been OPEN for too long

2011-04-13 Thread Gaojinchao
In HBase version 0.90.1. Has anyone seen this? HMaster logs: 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything 2011-04-08 16:33:09,384 ERROR

Re: just open sourced Orderly -- a row key schema system (composite keys, etc) for use with HBase

2011-04-13 Thread Ted Dunning
Michael, Interesting contribution to the open source community. Sounds like nice work. Can you say how this relates to Avro with regard to collating of binary data? See, for instance, here: http://avro.apache.org/docs/current/spec.html#order On Wed, Apr 13, 2011 at 5:55 PM, Michael Dalton

Connecting JPA with HBase

2011-04-13 Thread James Ram
How do I persist data from my Spring/Java application to HBase? Currently I am trying to use a DataNucleus plugin to connect JPA with HBase. Is this the best way, or is there some other method I could use? -- With Regards, Jr.
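
A minimal JPA sketch of that approach. The persistence unit name "hbasePU", the entity, and its fields are hypothetical; persistence.xml would wire the unit to the DataNucleus HBase store:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Id;
    import javax.persistence.Persistence;

    public class JpaHBaseExample {
        @Entity
        public static class Customer {
            @Id
            String id;
            String name;
        }

        public static void main(String[] args) {
            EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("hbasePU");
            EntityManager em = emf.createEntityManager();
            em.getTransaction().begin();
            Customer c = new Customer();
            c.id = "42";
            c.name = "example";
            em.persist(c); // mapped to an HBase put by DataNucleus
            em.getTransaction().commit();
            em.close();
            emf.close();
        }
    }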