While I didn't have an HBase problem, I had a file limit problem when
loading Kudu tables via:

$ ./buildall.sh -noclean -notests -format -snapshot_file ... -metastore_snapshot_file ...

so a similar change fixed that problem for me.

On Fri, Aug 5, 2016 at 1:41 PM, Jim Apple <[email protected]> wrote:
> I added the following lines to my /etc/security/limits.conf, then restarted:
>
> * hard nofile 1048576
> * soft nofile 1048576
>
> That seems to have fixed the problem.
>
> On Fri, Aug 5, 2016 at 1:10 PM, Jim Apple <[email protected]> wrote:
>> I restarted and ran testdata/bin/run-all.sh, but running "list" in the
>> hbase shell still says:
>>
>> ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException:
>> Server is not running yet
>>     at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2296)
>>     at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:936)
>>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55654)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:134)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:109)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> And I'm getting a litany of warnings in the HBase master log:
>>
>> 16/08/05 13:02:43 WARN impl.MetricsConfig: Cannot locate
>> configuration: tried
>> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
>> ...
>> 16/08/05 13:02:54 WARN shortcircuit.ShortCircuitCache:
>> ShortCircuitCache(0x12439978): failed to load
>> 1073764575_BP-1490185442-127.0.0.1-1456935654337
>> java.lang.NullPointerException
>> ...
>> 16/08/05 13:02:54 WARN hdfs.BlockReaderFactory:
>> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>> error creating ShortCircuitReplica.
>> java.io.EOFException: unexpected EOF while reading metadata file header
>> ...
>> 16/08/05 13:02:54 WARN hdfs.BlockReaderFactory: I/O error constructing
>> remote block reader.
>> java.net.SocketException: Too many open files
>> ...
>> 16/08/05 13:02:55 WARN hdfs.BlockReaderFactory:
>> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>> error creating ShortCircuitReplica.
>> java.io.IOException: Illegal seek
>>
>> On Mon, Jul 25, 2016 at 6:31 AM, Jim Apple <[email protected]> wrote:
>>> The NN and DNs have 600-800 files open each, and my ulimit is 1024 per
>>> process. On the machine as a whole, lsof | wc -l is 1047067.
>>>
>>> proc_nodemanager and proc_regionserver have a ton of open files: tens
>>> of thousands each. For instance, the nodemanager has 1200 fds pointing
>>> to one of three different ZooKeeper jars.
>>>
>>> On Sun, Jul 24, 2016 at 9:49 PM, Martin Grund (Das Grundprinzip.de)
>>> <[email protected]> wrote:
>>>> One idea is to check your ulimit for file descriptors and run `lsof |
>>>> wc -l` to see if you have for some reason exceeded the limit. Otherwise,
>>>> a fresh reboot might help to figure out whether you have a spare process
>>>> somewhere hogging FDs.
>>>>
>>>> On Sun, Jul 24, 2016 at 8:09 PM Jim Apple <[email protected]> wrote:
>>>>
>>>>> The NN and DN logs are empty.
>>>>>
>>>>> I ran bin/kill-all.sh at the beginning of this, so I assume that
>>>>> nothing is taking them except for my little Impala work.
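For anyone following along, a minimal sketch of Martin's ulimit/lsof check
and the per-process FD counts Jim mentions. This assumes Linux /proc and
pgrep; the "nodemanager" pattern is illustrative, so substitute whatever
matches your daemons:

$ ulimit -Sn; ulimit -Hn                  # soft/hard nofile for this shell
$ pid=$(pgrep -f nodemanager | head -1)
$ grep 'open files' /proc/$pid/limits     # the limit the running process actually has
$ ls /proc/$pid/fd | wc -l                # how many FDs it holds right now
$ ls -l /proc/$pid/fd | awk '{print $NF}' | sort | uniq -c | sort -rn | head

The last command groups open FDs by target, which is how you would spot
something like 1200 descriptors all pointing at the same few ZooKeeper jars.
Note that /etc/security/limits.conf is applied by pam_limits at session
start, so already-running daemons keep their old nofile limit until
restarted, which is why the restart above mattered.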
>>>>>
>>>>> On Sun, Jul 24, 2016 at 8:03 PM, Bharath Vissapragada
>>>>> <[email protected]> wrote:
>>>>> > Based on
>>>>> >
>>>>> > 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
>>>>> > remote block reader.
>>>>> > java.net.SocketException: Too many open files
>>>>> >
>>>>> > 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
>>>>> > /127.0.0.1:31000 for block, add to deadNodes and continue.
>>>>> > java.net.SocketException: Too many open files
>>>>> >
>>>>> > I'm guessing your HDFS instance might be overloaded (check the NN/DN
>>>>> > logs). The HMaster is unable to connect to the NN while opening
>>>>> > regions and hence throws the error.
>>>>> >
>>>>> > On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple <[email protected]> wrote:
>>>>> >
>>>>> >> Several thousand lines of things like
>>>>> >>
>>>>> >> WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4):
>>>>> >> failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337
>>>>> >> java.lang.NullPointerException at
>>>>> >> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
>>>>> >> ...
>>>>> >>
>>>>> >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory:
>>>>> >> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>>>>> >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>>>>> >> error creating ShortCircuitReplica.
>>>>> >> java.io.EOFException: unexpected EOF while reading metadata file header
>>>>> >>
>>>>> >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
>>>>> >> remote block reader.
>>>>> >> java.net.SocketException: Too many open files
>>>>> >>
>>>>> >> 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
>>>>> >> /127.0.0.1:31000 for block, add to deadNodes and continue.
>>>>> >> java.net.SocketException: Too many open files
>>>>> >>
>>>>> >> 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain
>>>>> >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any
>>>>> >> node: java.io.IOException: No live nodes contain block
>>>>> >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 after
>>>>> >> checking nodes =
>>>>> >> [DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]],
>>>>> >> ignoredNodes = null No live nodes contain current block Block
>>>>> >> locations:
>>>>> >> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]
>>>>> >> Dead nodes:
>>>>> >> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK].
>>>>> >> Will get new block locations from namenode and retry...
>>>>> >> 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1
>>>>> >> IOException, will wait for 2772.7114628272548 msec.
>>>>> >> 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory:
>>>>> >> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>>>>> >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>>>>> >> error creating ShortCircuitReplica.
>>>>> >> java.io.IOException: Illegal seek
>>>>> >>     at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>>>>> >>
>>>>> >> On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada
>>>>> >> <[email protected]> wrote:
>>>>> >> > Do you see something in the HMaster log? From the error it looks
>>>>> >> > like the HBase master hasn't started properly for some reason.
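Two quick liveness checks that would confirm Bharath's suspicion. This
assumes ZooKeeper answers on localhost:2181 and that HBase uses its default
/hbase parent znode; adjust both to match your config:

$ echo stat | nc localhost 2181      # is ZooKeeper itself responding?
$ hbase zkcli get /hbase/master      # the master's registration znode

If the second command shows no data, the HMaster never registered with
ZooKeeper, which matches both the ServerNotRunningYetException above and
the "znode data == null" error further down the thread.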
>>>>> >> >
>>>>> >> > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple <[email protected]> wrote:
>>>>> >> >
>>>>> >> >> I tried reloading the data with
>>>>> >> >>
>>>>> >> >> ./bin/load-data.py --workloads functional-query
>>>>> >> >>
>>>>> >> >> but that gave errors like
>>>>> >> >>
>>>>> >> >> Executing HBase Command: hbase shell
>>>>> >> >> load-functional-query-core-hbase-generated.create
>>>>> >> >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib
>>>>> >> >> is deprecated. Instead, use io.native.lib.available
>>>>> >> >> SLF4J: Class path contains multiple SLF4J bindings.
>>>>> >> >> SLF4J: Found binding in
>>>>> >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> >> >> SLF4J: Found binding in
>>>>> >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> >> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>>> >> >> explanation.
>>>>> >> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>> >> >>
>>>>> >> >> ERROR: Can't get the locations
>>>>> >> >>
>>>>> >> >> Here is some help for this command:
>>>>> >> >> Start disable of named table:
>>>>> >> >>   hbase> disable 't1'
>>>>> >> >>   hbase> disable 'ns1:t1'
>>>>> >> >>
>>>>> >> >> ERROR: Can't get master address from ZooKeeper; znode data == null
>>>>> >> >>
>>>>> >> >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple <[email protected]> wrote:
>>>>> >> >> > I'm having trouble with my HBase environment, and it's preventing
>>>>> >> >> > me from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have
>>>>> >> >> > tried this with a clean build, I have tried unset LD_LIBRARY_PATH
>>>>> >> >> > && bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh
>>>>> >> >> >
>>>>> >> >> > Here is the error I get from compute stats
>>>>> >> >> > (./testdata/bin/compute-table-stats.sh):
>>>>> >> >> >
>>>>> >> >> > Executing: compute stats functional_hbase.alltypessmall
>>>>> >> >> >   -> Error: ImpalaBeeswaxException:
>>>>> >> >> > Query aborted:RuntimeException: couldn't retrieve HBase table
>>>>> >> >> > (functional_hbase.alltypessmall) info:
>>>>> >> >> > Unable to find region for  in functional_hbase.alltypessmall
>>>>> >> >> > after 35 tries.
>>>>> >> >> > CAUSED BY: NoServerForRegionException: Unable to find region for
>>>>> >> >> >  in functional_hbase.alltypessmall after 35 tries.
>>>>> >> >> >
>>>>> >> >> > Here is a snippet of the error in ./testdata/bin/split-hbase.sh:
>>>>> >> >> >
>>>>> >> >> > Sun Jul 24 15:24:52 PDT 2016,
>>>>> >> >> > RpcRetryingCaller{globalStartTime=1469399003900, pause=100,
>>>>> >> >> > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException:
>>>>> >> >> > com.google.protobuf.ServiceException:
>>>>> >> >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException):
>>>>> >> >> > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server
>>>>> >> >> > is not running yet
>>>>> >> >> >
>>>>> >> >> > I tried ./bin/create_testdata.sh, but that exited almost
>>>>> >> >> > immediately with no error.
>>>>> >> >> >
>>>>> >> >> > Has anyone else seen and solved this before?
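Once the master does come back, a quick sanity pass before re-running the
loaders might look like this. These are standard hbase shell commands; the
table name is the one from this thread:

$ echo "status 'simple'" | hbase shell    # live region servers and their load
$ echo "list" | hbase shell               # the functional_hbase tables should appear
$ echo "scan 'functional_hbase.alltypessmall', {LIMIT => 1}" | hbase shell

If list succeeds but the scan keeps retrying, the regions are likely still
being assigned; the NoServerForRegionException above is what that looks
like from the client side.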
