Hi. Very strange, I see in limits.conf that it's upped. I attached the limits.conf, please have a look, maybe I did it wrong.
Best Regards.

On Thu, Oct 30, 2008 at 7:52 PM, stack <[EMAIL PROTECTED]> wrote:

> Thanks for the logs Slava. I notice that you have not upped the ulimit on
> your cluster. See the head of your logs where we print out the ulimit. It's
> 1024. This could be one cause of your grief, especially when you seemingly
> have many regions (>1000). Please try upping it.
> St.Ack
>
> Slava Gorelik wrote:
>
>> Hi.
>> I enabled the DEBUG log level and now I'm sending all logs (archived),
>> including the fsck run result.
>> Today my program started to fail a couple of minutes from the start; it's
>> very easy to reproduce the problem, and the cluster became very unstable.
>>
>> Best Regards.
>>
>> On Tue, Oct 28, 2008 at 11:05 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>> See http://wiki.apache.org/hadoop/Hbase/FAQ#5
>>
>> St.Ack
>>
>> Slava Gorelik wrote:
>>
>> Hi. First of all I want to say thank you for your assistance!!!
>>
>> DEBUG on hadoop or hbase? And how can I enable it?
>> fsck said that HDFS is healthy.
>>
>> Best Regards and Thank You
>>
>> On Tue, Oct 28, 2008 at 8:45 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>> Slava Gorelik wrote:
>>
>> Hi. HDFS capacity is about 800GB (8 datanodes) and the current usage is
>> about 30GB. This is after a total re-format of the HDFS that was made an
>> hour before.
>>
>> BTW, the logs I sent are from the first exception that I found in them.
>> Best Regards.
>>
>> Please enable DEBUG and retry. Send me all logs. What does the fsck on
>> HDFS say? There is something seriously wrong with your cluster that you
>> are having so much trouble getting it running. Let's try and figure it
>> out.
>>
>> St.Ack
>>
>> On Tue, Oct 28, 2008 at 7:12 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>> I took a quick look Slava (Thanks for sending the files).
>> Here's a few notes:
>>
>> + The logs are from after the damage is done; the transition from good to
>> bad is missing. If I could see that, that would help.
>> + But what seems to be plain is that your HDFS is very sick. See this
>> from the head of one of the regionserver logs:
>>
>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
>> Exception: java.io.IOException: Unable to create new block.
>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
>>
>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: Error
>> Recovery for block blk_-5188192041705782716_60000 bad datanode[0]
>> 2008-10-27 23:41:12,685 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>> Compaction/Split failed for region
>> BizDB,1.1.PerfBO1.f2188a42-5eb7-4a6a-82ef-2da0d0ea4ce0,1225136351518
>> java.io.IOException: Could not get block locations. Aborting...
>>
>> If HDFS is ailing, hbase is too. In fact, the regionservers will shut
>> themselves down to protect themselves against damaging or losing data:
>>
>> 2008-10-27 23:41:12,688 FATAL org.apache.hadoop.hbase.regionserver.Flusher:
>> Replay of hlog required. Forcing server restart
>>
>> So, what's up with your HDFS? Not enough space allotted? What happens if
>> you run "./bin/hadoop fsck /"? Does that give you a clue as to what
>> happened? Dig in the datanode and namenode logs. Look for where the
>> exceptions start. It might give you a clue.
>>
>> + The suse regionserver log had garbage in it.
>>
>> St.Ack
>>
>> Slava Gorelik wrote:
>>
>> Hi.
>> My happiness was very short :-( After I successfully added 1M rows (50k
>> each row) I tried to add 10M rows.
>> And after 3-4 working hours it started dying. First one region server
>> died, then another one, and eventually the whole cluster was dead.
>>
>> I attached log files (relevant part, archived) from the region servers
>> and from the master.
>>
>> Best Regards.
>>
>> On Mon, Oct 27, 2008 at 11:19 AM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
>>
>> Hi.
>> So far so good: after changing the file descriptors and
>> dfs.datanode.socket.write.timeout, dfs.datanode.max.xcievers, my
>> cluster works stable.
>> Thank You and Best Regards.
>>
>> P.S. Regarding the missing delete-multiple-columns functionality, I
>> filed a jira: https://issues.apache.org/jira/browse/HBASE-961
>>
>> On Sun, Oct 26, 2008 at 12:58 AM, Michael Stack <[EMAIL PROTECTED]> wrote:
>>
>> Slava Gorelik wrote:
>>
>> Hi. Haven't tried them yet, I'll try tomorrow morning. In general the
>> cluster is working well; the problems begin if I'm trying to add 10M
>> rows. After 1.2M it happened.
>>
>> Anything else running beside the regionserver or datanodes that would
>> suck resources? When datanodes begin to slow, we begin to see the
>> issue Jean-Adrien's configurations address. Are you uploading using
>> MapReduce? Are TTs running on the same nodes as the datanode and
>> regionserver? How are you doing the upload? Describe what your
>> uploader looks like (Sorry if you've already done this).
>>
>> I already changed the limit of file descriptors.
>>
>> Good.
>>
>> I'll try to change the properties:
>>
>> <property>
>>   <name>dfs.datanode.socket.write.timeout</name>
>>   <value>0</value>
>> </property>
>>
>> <property>
>>   <name>dfs.datanode.max.xcievers</name>
>>   <value>1023</value>
>> </property>
>>
>> Yeah, try it.
>> And let you know. Are there any other prescriptions? Did I miss
>> something?
>>
>> BTW, off topic, but I sent an e-mail recently to the list and I can't
>> see it: is it possible to delete multiple columns in any way by regex,
>> for example colum_name_*?
>>
>> Not that I know of. If it's not in the API, it should be. Mind filing
>> a JIRA?
>>
>> Thanks Slava.
>> St.Ack
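The ulimit advice in the thread comes down to two things: raising `nofile` in /etc/security/limits.conf, and verifying the new limit is actually in effect for the session that launches the daemons (HBase prints the value at the head of its logs, which is how stack spotted the 1024). A minimal check, sketched below; the user name "hadoop" and the value 32768 are assumptions, not from the thread:

```shell
# Show the open-files limit the current shell will pass to anything it
# launches; HBase logs this same number at startup.
ulimit -n

# Sketch of the /etc/security/limits.conf entries typically added for the
# daemon user ("hadoop" and 32768 are assumed values):
#
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768
#
# limits.conf is applied by PAM at login, so the daemons must be restarted
# from a fresh login session, or the old 1024 limit will still show up in
# the logs.
```

If `ulimit -n` still prints 1024 after editing limits.conf, the change was not picked up, which matches what stack saw in Slava's attached logs.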

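St.Ack's triage steps above (run fsck, then dig through the datanode and namenode logs for where the exceptions start) can be sketched as a small script. The HADOOP_HOME default and the log-file name patterns are assumptions for a tarball install of that era; adjust them for your layout:

```shell
#!/bin/sh
# HDFS triage sketch: health check first, then locate where trouble started.
hdfs_triage() {
  hh=${HADOOP_HOME:-/usr/local/hadoop}  # assumed default install path
  if [ -x "$hh/bin/hadoop" ]; then
    # Overall filesystem health; the summary line says HEALTHY or CORRUPT.
    "$hh/bin/hadoop" fsck / | tail -n 20

    # First exception in each daemon log. The earliest one is usually the
    # real cause; everything after it tends to be fallout.
    for log in "$hh"/logs/*-datanode-*.log "$hh"/logs/*-namenode-*.log; do
      [ -f "$log" ] && grep -n -m 1 "Exception" "$log" /dev/null
    done
  else
    echo "hadoop not found under $hh"
  fi
  return 0
}

hdfs_triage
```

The `/dev/null` second argument to grep forces it to prefix matches with the file name, so the output says which daemon's log the first exception came from.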