I took a look.

First, enable DEBUG.  See the hbase FAQ for how.
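For what it's worth, the usual knob is in conf/log4j.properties (a sketch; check the FAQ for your version's exact property name, and restart daemons to pick it up):

```
# conf/log4j.properties on each node
log4j.logger.org.apache.hadoop.hbase=DEBUG
```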

Looking, I see that all was running fine till:

2008-11-03 14:10:08,261 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.X.X.Y:60020. Already tried 0 time(s).

...in the middle of an attempt at scanning the .META. region.

Looking through the regionserver logs, they are all fine until around the above time, when I start to see variations on:

2008-11-03 14:08:46,440 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_1223341017118968735_305051 from any node: java.io.IOException: No live nodes contain current block

....and

2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.X.X.Y:50010
2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_6726606309673852040_314096

Your HDFS went bad for some reason around that time. I don't see any obvious explanation for why it went bad. You were running the balancer at the time, IIRC?

Could you run netstat on your datanodes and see how many concurrent connections they had open? Was 1024 enough? You had configured a max of 1024? I don't see the ulimit printout in these logs, so I presume it's > 1024.
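Something along these lines would do it, run on each datanode host (a sketch; 50010 is an assumption based on the stock dfs.datanode.address port, adjust if you changed it):

```shell
# Count open TCP connections on the datanode data port (50010 by default):
netstat -tn 2>/dev/null | grep -c ':50010'

# Compare against the file-descriptor limit the datanode runs under:
ulimit -n
```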

How many regions do you have in your table when it starts to go wonky? You have 6 datanodes running alongside your 6 regionservers?

St.Ack


Slava Gorelik wrote:
Hi Michael.
I'm sending the logs in 2 parts (2 messages).
Part 1


On Tue, Nov 4, 2008 at 11:44 PM, Slava Gorelik <[EMAIL PROTECTED]> wrote:

    Thank You. Now it's clear.


    On Tue, Nov 4, 2008 at 11:31 PM, stack <[EMAIL PROTECTED]> wrote:

        Slava Gorelik wrote:

            One more question regarding the blockCache: how are changes in
            store files (as I understand it, those are MapFiles) reflected
            in the client-side cache? What if more than one client is
            making changes? Does each client hold a different part of the
            MapFile? Or something else?

        The block cache is over in the server. It's a cache for store
        files, which never change once written.  Did I say client-side
        cache?  I should have been more clear.  The client in this case
        is the regionserver itself.  The cache is so the regionserver
        saves on its trips over the network visiting datanodes.
        St.Ack



            Best Regards.

            On Tue, Nov 4, 2008 at 11:10 PM, Slava Gorelik
            <[EMAIL PROTECTED]> wrote:

                I can try to reproduce it again, but before that I would
                like to send you the logs.
                Best Regards.


                On Tue, Nov 4, 2008 at 10:05 PM, stack
                <[EMAIL PROTECTED]> wrote:

                    Then we should try to figure out whether there is an
                    issue in the balancer, or whether something is missing
                    when a big upload is not being spread across HDFS in a
                    balanced manner?
                    St.Ack

                    Slava Gorelik wrote:

                        Sure, I'll arrange the logs tomorrow. About the
                        balancer: waiting until the massive work is
                        finished is fine in a testing environment, but in
                        production it's not practical :-)

                        Best Regards.

                        On Tue, Nov 4, 2008 at 9:48 PM, stack
                        <[EMAIL PROTECTED]> wrote:



                            Slava Gorelik wrote:



                                Hi. Regarding the failure of new block
                                creation: I failed to run HBase until I
                                reformatted HDFS again.




                            I'd be interested in the logs.

                                I was just wondering if Hadoop rebalancing
                                is necessary? Will it balance itself? As I
                                understand it, the Hadoop balancer moves
                                data between datanodes, but in my case this
                                happens during a massive upload (8 clients
                                just adding records - about 400 requests
                                across all 6 region servers). So, is it a
                                good idea to run the balancer during heavy
                                load?




                            I don't have sufficient experience running the
                            balancer.  Perhaps wait till the upload is
                            done, then run it?

                            St.Ack
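                            If you do run it post-upload, the invocation
                            is along these lines (a sketch; assumes the
                            stock Hadoop scripts and a live cluster):

```
# From the hadoop install dir; -threshold is the percent of
# disk-usage skew tolerated between datanodes
bin/hadoop balancer -threshold 10

# Or run it as a background daemon:
bin/start-balancer.sh   # stop later with bin/stop-balancer.sh
```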








