Can you clear the logs, clean out the file system and run the test again?

The namenode logs should tell an interesting story.
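
Something along these lines should do it (a rough sketch, assuming a standard
$HADOOP_HOME layout and the /tmp/hadoop-kcd paths from your fsck output; the
exact scripts and log locations will depend on your install):

  bin/stop-all.sh                             # stop mapred and dfs (stop hbase first)
  rm -f logs/*                                # clear namenode/datanode/regionserver logs on every node
  bin/start-all.sh
  bin/hadoop dfs -rmr /tmp/hadoop-kcd/hbase   # clean out the old hbase data in dfs
  bin/hadoop fsck /                           # confirm HEALTHY before rerunning the PE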


On 11/24/07 6:12 PM, "Kareem Dana" <[EMAIL PROTECTED]> wrote:

> I ran hadoop fsck and sure enough the DFS was corrupted. It seems that
> the PerformanceEvaluation test is corrupting it. Before I run the
> test, I ran fsck and the DFS was reported as HEALTHY. Once the PE
> fails, the DFS is reported as corrupt. I tried to simplify my setup
> and run the PE again. My new config is as follows:
> 
> hadoop07 - DFS Master, Mapred master, hbase master
> hadoop09-10 - 2 hbase region servers
> hadoop11-12 - 2 datanodes, task trackers
> 
> mapred.map.tasks = 2
> mapred.reduce.tasks = 1
> dfs.replication = 1
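> 
> (For reference, those are set in hadoop-site.xml; roughly, the corresponding
> fragment looks like this, give or take the exact file layout:)
> 
>   <property>
>     <name>mapred.map.tasks</name>
>     <value>2</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>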
> 
> I ran the distributed PE in that configuration and it still failed
> with similar errors. The output of the hadoop fsck for this run was:
> 
> ..........
> 
> /tmp/hadoop-kcd/hbase/hregion_.META.,,1/info/mapfiles/6434881831082231493/data:
> MISSING 1 blocks of total size 0 B.
> ......................................
> /tmp/hadoop-kcd/hbase/hregion_TestTable,11566878,1227092681544002579/info/mapfiles/5263238643231358600/data:
> MISSING 1 blocks of total size 0 B.
> ....
> /tmp/hadoop-kcd/hbase/hregion_TestTable,12612310,1652062411016999689/info/mapfiles/2024298319068625138/data:
> MISSING 1 blocks of total size 0 B.
> ....
> /tmp/hadoop-kcd/hbase/hregion_TestTable,12612310,1652062411016999689/info/mapfiles/5071453667327337040/data:
> MISSING 1 blocks of total size 0 B.
> .........
> /tmp/hadoop-kcd/hbase/hregion_TestTable,13932,4738192747521322482/info/mapfiles/4400784113695734765/data:
> MISSING 1 blocks of total size 0 B.
> ...................................
> ........................................................................
> /tmp/hadoop-kcd/hbase/log_172.16.6.56_-1823376333333123807_60020/hlog.dat.027:
> MISSING 1 blocks of total size 0 B.
> .Status: CORRUPT
>  Total size:    1890454330 B
>  Total blocks:  180 (avg. block size 10502524 B)
>  Total dirs:    190
>  Total files:   173
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Target replication factor:     1
>  Real replication factor:       1.0
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> On Nov 24, 2007 6:21 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> 
>> I think that stack was suggesting an HDFS fsck, not a disk-level fsck.
>> 
>> Try [hadoop fsck /]
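>> 
>> If your version supports them, the extra fsck report options show exactly
>> which files and blocks are affected (a sketch; availability may vary by
>> release):
>> 
>>   bin/hadoop fsck / -files -blocks -locations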
>> 
>> 
>> 
>> 
>> 
>> On 11/24/07 4:09 PM, "Kareem Dana" <[EMAIL PROTECTED]> wrote:
>> 
>>> I do not have root access on the xen cluster I'm using. I will ask the
>>> admin to make sure the disk is working properly. Regarding the
>>> mismatched versions though, are you suggesting that different region
>>> servers might be running different versions of hbase/hadoop? They are
>>> all running the same code from the same shared storage. There isn't
>>> even another version of hadoop anywhere for the other nodes to run. I
>>> think I'll try dropping my cluster down to 2 nodes and working back
>>> up... maybe I can pinpoint a specific problem node. Thanks for taking
>>> a look at my logs.
>>> 
>>> On Nov 24, 2007 5:49 PM, stack <[EMAIL PROTECTED]> wrote:
>>>> I took a quick look, Kareem.  As with the last time, hbase keeps having
>>>> trouble w/ the hdfs.  Things start out fine around 16:00, then go bad
>>>> because hbase can't write reliably to the hdfs, for a variety of reasons.
>>>> You then seem to restart the cluster around 17:37 or so, and things go
>>>> along fine for a while until 19:05, when again all regionservers
>>>> report trouble writing to the hdfs.  Have you run an fsck?
>> 
>> 
