Created HBASE-3889 for this.
On Mon, May 16, 2011 at 8:42 PM, Stack <[email protected]> wrote: > On Mon, May 16, 2011 at 2:07 AM, Lars George <[email protected]> wrote: >> I am still stuck with this cluster not starting again; I know it is >> all local and such, therefore not really representative, but this >> ought to work, no? See this log I get at startup: >> > > Do you have replication on? Is this TRUNK or 0.90 branch? If TRUNK > then we are doing distributed splitting? > > Sounds like a bug in here, Lars, especially if it makes for this much confusion. > > St.Ack > >> 2011-05-16 11:00:36,834 INFO >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker >> 10.0.0.64,60020,1305536432387 starting >> 2011-05-16 11:00:36,838 INFO >> org.apache.hadoop.hbase.regionserver.StoreFile: Allocating >> LruBlockCache with maximum size 197.5m >> 2011-05-16 11:00:36,850 INFO >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: successfully >> transitioned task /hbase/splitlog/RESCAN0000234067 to final state done >> 2011-05-16 11:00:36,852 DEBUG >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or >> departed >> 2011-05-16 11:00:36,854 INFO >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker >> 10.0.0.64,60020,1305536432387 acquired task >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:00:36,871 DEBUG >> org.apache.hadoop.hbase.monitoring.MonitoredTask: setDescritption: >> Splitting log file >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389into >> a temporary staging area.
>> 2011-05-16 11:00:36,874 INFO >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389, >> length=16173236224 >> 2011-05-16 11:00:36,874 DEBUG >> org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Opening >> log file >> 2011-05-16 11:00:36,875 INFO org.apache.hadoop.hbase.util.FSUtils: >> Recovering file >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389 >> 2011-05-16 11:00:37,415 WARN >> org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error >> detected. Found 1 replicas but expecting 3 replicas. Requesting close >> of hlog. >> 2011-05-16 11:00:37,876 INFO org.apache.hadoop.hbase.util.FSUtils: >> Finished lease recover attempt for >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389 >> 2011-05-16 11:00:38,073 INFO >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's >> directory doesn't exist: >> hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0. >> It is very likely that it was already split so it's safe to discard >> those edits. 
>> 2011-05-16 11:00:38,074 INFO >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: processed 0 >> edits across 0 regions threw away edits for 1 regions log file = >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389 >> is corrupted = false >> 2011-05-16 11:00:38,074 DEBUG >> org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: processed >> 0 edits across 0 regions threw away edits for 1 regions log file = >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389 >> is corrupted = false >> 2011-05-16 11:00:38,074 DEBUG >> org.apache.hadoop.hbase.monitoring.MonitoredTask: markComplete: >> processed 0 edits across 0 regions threw away edits for 1 regions log >> file = >> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389 >> is corrupted = false >> 2011-05-16 11:00:38,074 INFO >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker >> 10.0.0.64,60020,1305536432387 done with task >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> in 1217ms >> 2011-05-16 11:00:38,825 INFO >> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: >> Moving 10.0.0.64,60020,1305535848569's hlogs to my queue >> >> ==> /var/lib/hbase/logs/hbase-larsgeorge-5-master-de1-app-mbp-2.log <== >> 2011-05-16 11:00:41,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:42,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:43,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:44,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:45,691 DEBUG >> 
org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:46,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:47,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:48,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:49,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:50,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:51,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:52,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:53,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:54,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:55,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:56,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:57,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:58,691 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:00:59,692 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:01:00,692 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:01:01,692 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:01:02,692 
DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:01:03,692 INFO >> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting task >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:03,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 0 >> 2011-05-16 11:01:03,693 INFO >> org.apache.hadoop.hbase.master.SplitLogManager: resubmitted 1 out of 1 >> tasks >> 2011-05-16 11:01:03,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> ver = 28 >> 2011-05-16 11:01:03,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >> /hbase/splitlog/RESCAN0000234069 ver = 0 >> 2011-05-16 11:01:04,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:04,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:05,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:05,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:06,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> 
/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:06,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:07,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:07,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:08,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:08,693 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:09,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:09,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:10,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:10,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:11,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> 
/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:11,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:12,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:12,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:13,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:13,694 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:14,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:14,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:15,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:15,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:16,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> 
/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:16,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:17,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:17,695 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:18,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:18,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:19,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:19,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:20,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:20,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:21,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> 
/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:21,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:22,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:22,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:23,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:23,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:24,697 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:24,697 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:25,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:25,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:26,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> 
/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:26,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:27,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:27,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:28,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:28,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:29,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> 2011-05-16 11:01:29,697 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >> unassigned = 1 >> 2011-05-16 11:01:30,696 DEBUG >> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task >> path -> >> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 >> >> I hacked the code to have the SplitLogManager delete all orphaned >> RESCAN znodes, as I ended up having hundreds of them, and there seems >> to be no way to "delete *" them, right? Is there a trick to be able to >> delete a non-empty node in zkCli? 
>> >> Anyhow, the split is supposedly done, or the task at least reports as >> complete; then the ReplicationSourceManager kicks in, and >> then the task gets relisted over and over again. After just a few >> minutes you see this in ZK's /hbase/splitlog: >> >> [RESCAN0000234200, RESCAN0000234209, RESCAN0000234207, >> RESCAN0000234208, RESCAN0000234205, RESCAN0000234206, >> RESCAN0000234203, RESCAN0000234204, RESCAN0000234201, >> RESCAN0000234202, RESCAN0000234237, RESCAN0000234236, >> RESCAN0000234235, RESCAN0000234234, RESCAN0000234239, >> RESCAN0000234238, RESCAN0000234232, RESCAN0000234233, >> RESCAN0000234230, RESCAN0000234231, RESCAN0000234219, >> RESCAN0000234218, RESCAN0000234217, RESCAN0000234216, >> RESCAN0000234215, RESCAN0000234214, RESCAN0000234213, >> RESCAN0000234212, RESCAN0000234210, RESCAN0000234211, >> RESCAN0000234228, RESCAN0000234227, RESCAN0000234229, >> RESCAN0000234224, RESCAN0000234223, RESCAN0000234226, >> RESCAN0000234225, RESCAN0000234220, RESCAN0000234221, >> RESCAN0000234222, RESCAN0000234100, RESCAN0000234101, >> RESCAN0000234107, RESCAN0000234106, RESCAN0000234109, >> RESCAN0000234108, RESCAN0000234103, RESCAN0000234102, >> RESCAN0000234105, RESCAN0000234104, RESCAN0000234111, >> RESCAN0000234112, RESCAN0000234110, RESCAN0000234116, >> RESCAN0000234115, RESCAN0000234114, RESCAN0000234113, >> RESCAN0000234119, RESCAN0000234118, RESCAN0000234117, >> RESCAN0000234120, RESCAN0000234121, RESCAN0000234122, >> RESCAN0000234123, RESCAN0000234125, RESCAN0000234124, >> RESCAN0000234127, RESCAN0000234126, RESCAN0000234129, >> RESCAN0000234128, RESCAN0000234134, RESCAN0000234133, >> RESCAN0000234132, RESCAN0000234131, RESCAN0000234130, >> RESCAN0000234139, RESCAN0000234137, RESCAN0000234138, >> RESCAN0000234135, RESCAN0000234136, RESCAN0000234143, >> RESCAN0000234142, RESCAN0000234145, RESCAN0000234144, >> RESCAN0000234141, RESCAN0000234140, RESCAN0000234146, >> RESCAN0000234147, RESCAN0000234148, RESCAN0000234149, >> 
RESCAN0000234152, RESCAN0000234151, RESCAN0000234150, >> RESCAN0000234156, RESCAN0000234155, RESCAN0000234154, >> RESCAN0000234153, RESCAN0000234159, RESCAN0000234157, >> RESCAN0000234158, RESCAN0000234161, RESCAN0000234160, >> RESCAN0000234163, RESCAN0000234162, RESCAN0000234165, >> RESCAN0000234164, RESCAN0000234167, RESCAN0000234166, >> RESCAN0000234168, RESCAN0000234169, RESCAN0000234179, >> RESCAN0000234175, RESCAN0000234176, RESCAN0000234177, >> RESCAN0000234178, RESCAN0000234171, RESCAN0000234172, >> RESCAN0000234173, RESCAN0000234174, RESCAN0000234170, >> RESCAN0000234188, RESCAN0000234189, RESCAN0000234186, >> RESCAN0000234187, RESCAN0000234184, RESCAN0000234185, >> RESCAN0000234182, RESCAN0000234183, RESCAN0000234180, >> RESCAN0000234181, RESCAN0000234193, RESCAN0000234194, >> RESCAN0000234195, RESCAN0000234196, RESCAN0000234197, >> RESCAN0000234198, RESCAN0000234199, RESCAN0000234190, >> RESCAN0000234191, RESCAN0000234192, RESCAN0000234070, >> RESCAN0000234071, RESCAN0000234072, RESCAN0000234073, >> RESCAN0000234074, RESCAN0000234075, RESCAN0000234076, >> RESCAN0000234077, RESCAN0000234078, RESCAN0000234079, >> RESCAN0000234081, RESCAN0000234082, RESCAN0000234080, >> RESCAN0000234085, RESCAN0000234086, RESCAN0000234083, >> RESCAN0000234084, RESCAN0000234089, RESCAN0000234087, >> RESCAN0000234088, RESCAN0000234069, RESCAN0000234099, >> RESCAN0000234098, RESCAN0000234095, RESCAN0000234094, >> RESCAN0000234097, RESCAN0000234096, RESCAN0000234091, >> RESCAN0000234090, RESCAN0000234093, RESCAN0000234092, >> hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389] >> >> After that, everything is stuck. Ideas? >> >> On Mon, May 16, 2011 at 7:03 AM, Lars George <[email protected]> wrote: >>> Hi, >>> >>> I am on trunk and testing in a pseudo-distributed setup.
I loaded the >>> machine with YCSB and got it to break at a few million inserts during >>> the load phase with the GC taking too long and the compaction queue >>> going through the roof subsequently. Since then I cannot recover the >>> local "cluster". It is stuck printing this: >>> >>> ... >>> 2011-05-16 06:59:05,389 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >>> /hbase/splitlog/RESCAN0000148501 ver = 0 >>> 2011-05-16 06:59:06,388 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >>> unassigned = 1 >>> 2011-05-16 06:59:06,389 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting >>> unassigned task(s) after timeout >>> 2011-05-16 06:59:06,390 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >>> /hbase/splitlog/RESCAN0000148502 ver = 0 >>> 2011-05-16 06:59:07,388 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >>> unassigned = 1 >>> 2011-05-16 06:59:07,388 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting >>> unassigned task(s) after timeout >>> 2011-05-16 06:59:07,389 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >>> /hbase/splitlog/RESCAN0000148503 ver = 0 >>> 2011-05-16 06:59:08,388 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >>> unassigned = 1 >>> 2011-05-16 06:59:08,388 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting >>> unassigned task(s) after timeout >>> 2011-05-16 06:59:08,389 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >>> /hbase/splitlog/RESCAN0000148504 ver = 0 >>> 2011-05-16 06:59:09,388 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 >>> unassigned = 1 >>> 2011-05-16 06:59:09,389 DEBUG >>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting >>> unassigned task(s) after timeout >>> 2011-05-16 06:59:09,390 DEBUG >>> 
org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired >>> /hbase/splitlog/RESCAN0000148505 ver = 0 >>> ... >>> >>> This keeps on going up and up. What is the right way to recover from >>> this? Delete something from ZK? Delete something from HDFS? What shell >>> commands would help? >>> >>> Thanks, >>> Lars >>> >> >
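The split-log task znode names in the logs above look like the WAL path URL-encoded once: decoding /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2F... gives exactly the path that HLogSplitter reports as "Splitting hlog". A minimal Python sketch of that mapping, assuming single-pass percent-encoding (inferred from these logs, not checked against the HBase source); the helper name task_znode_to_wal_path is hypothetical:

```python
from urllib.parse import unquote

SPLITLOG_PREFIX = "/hbase/splitlog/"


def task_znode_to_wal_path(znode: str) -> str:
    """Map a split-log task znode to the WAL file it names (hypothetical helper).

    Assumes the task name is the WAL path percent-encoded exactly once, so one
    unquote() pass recovers it. A '%252C' in the znode name decodes to '%2C',
    which is how the WAL file name itself is spelled in HDFS.
    """
    name = znode[len(SPLITLOG_PREFIX):] if znode.startswith(SPLITLOG_PREFIX) else znode
    return unquote(name)


def is_rescan_marker(name: str) -> bool:
    # RESCAN markers are sequential znodes such as RESCAN0000234067,
    # not real split tasks, so they do not decode to a WAL path.
    return name.startswith("RESCAN")
```

Decoding the stuck task from the master log yields the same path the SplitLogWorker reported splitting, which makes it easy to check whether a task that keeps getting resubmitted still points at a WAL file that exists in HDFS.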
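On the zkCli question: ZooKeeper refuses to delete a znode that still has children, so any recursive delete has to remove leaves before their parents. Later ZooKeeper releases added rmr (3.4) and then deleteall (3.5) to zkCli for this. The Python sketch below only illustrates the logic. Its helpers, rescan_markers_oldest_first and recursive_delete_order, are hypothetical and work on an in-memory snapshot of the tree, not a live ZooKeeper session. It picks out the RESCAN markers, oldest first by their zero-padded sequence suffix, and produces a post-order list of paths in which every child comes before its parent.

```python
from typing import Dict, List


def rescan_markers_oldest_first(children: List[str]) -> List[str]:
    """Select RESCAN marker names from a child listing, oldest first.

    Hypothetical helper: it only selects names and never talks to ZooKeeper.
    Real task znodes (the URL-encoded WAL paths) are left out.
    """
    rescans = [c for c in children if c.startswith("RESCAN")]
    # ZooKeeper appends a zero-padded sequence number to sequential znodes,
    # so sorting numerically on the suffix gives creation order.
    return sorted(rescans, key=lambda c: int(c[len("RESCAN"):]))


def recursive_delete_order(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Post-order walk: each child path is listed before its parent.

    `tree` maps a znode path to its child names, an in-memory stand-in
    for what a client's getChildren() call would return.
    """
    order: List[str] = []

    def visit(path: str) -> None:
        for child in tree.get(path, []):
            visit(path.rstrip("/") + "/" + child)
        order.append(path)

    visit(root)
    return order
```

Issuing deletes in the order returned by recursive_delete_order mirrors what a recursive delete does. Clearing the markers this way only removes the clutter; it does not explain why the task itself keeps being resubmitted.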
