While I didn't have an HBase problem, I had a file limit problem when
loading Kudu tables via:

$ ./buildall.sh -noclean -notests -format -snapshot_file ... -metastore_snapshot_file ...

so a similar change fixed that problem for me.

On Fri, Aug 5, 2016 at 1:41 PM, Jim Apple <[email protected]> wrote:
> I added the following lines to my /etc/security/limits.conf, then restarted:
>
> * hard nofile 1048576
> * soft nofile 1048576
>
> That seems to have fixed the problem.
>
> On Fri, Aug 5, 2016 at 1:10 PM, Jim Apple <[email protected]> wrote:
>> I restarted and ran testdata/bin/run-all.sh, but running "list" in the
>> hbase shell still says:
>>
>> ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException:
>> Server is not running yet
>>     at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2296)
>>     at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:936)
>>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55654)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:134)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:109)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> And I'm getting a litany of warnings in the HBase master log:
>>
>> 16/08/05 13:02:43 WARN impl.MetricsConfig: Cannot locate
>> configuration: tried
>> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
>> ...
>> 16/08/05 13:02:54 WARN shortcircuit.ShortCircuitCache:
>> ShortCircuitCache(0x12439978): failed to load
>> 1073764575_BP-1490185442-127.0.0.1-1456935654337
>> java.lang.NullPointerException
>> ...
>> 16/08/05 13:02:54 WARN hdfs.BlockReaderFactory:
>> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>> error creating ShortCircuitReplica.
>> java.io.EOFException: unexpected EOF while reading metadata file header
>> ...
>> 16/08/05 13:02:54 WARN hdfs.BlockReaderFactory: I/O error constructing
>> remote block reader.
>> java.net.SocketException: Too many open files
>> ...
>> 16/08/05 13:02:55 WARN hdfs.BlockReaderFactory:
>> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>> error creating ShortCircuitReplica.
>> java.io.IOException: Illegal seek
>>
>> On Mon, Jul 25, 2016 at 6:31 AM, Jim Apple <[email protected]> wrote:
>>> The NN and DNs have 600-800 files open each, and my ulimit is 1024 per
>>> process. On the machine as a whole, lsof | wc -l is 1047067.
>>>
>>> proc_nodemanager and proc_regionserver have a ton of open files: tens
>>> of thousands each. For instance, the nodemanager has 1200 fds pointing
>>> to one of three different ZooKeeper jars.
>>>
>>> On Sun, Jul 24, 2016 at 9:49 PM, Martin Grund (Das Grundprinzip.de)
>>> <[email protected]> wrote:
>>>> One idea is to check your ulimit for file descriptors and run `lsof |
>>>> wc -l` to see if you have for some reason exceeded the limit. Otherwise,
>>>> a fresh reboot might help to figure out whether you have a spare process
>>>> somewhere hogging FDs.
>>>>
>>>> On Sun, Jul 24, 2016 at 8:09 PM Jim Apple <[email protected]> wrote:
>>>>
>>>>> The NN and DN logs are empty.
>>>>>
>>>>> I ran bin/kill-all.sh at the beginning of this, so I assume that
>>>>> nothing is taking them except for my little Impala work.
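For anyone following along, a minimal sketch of Martin's ulimit/lsof check
and the per-process FD counts Jim mentions. This assumes Linux /proc and
pgrep; the "nodemanager" pattern is illustrative, so substitute whatever
matches your daemons:

$ ulimit -Sn; ulimit -Hn                  # soft/hard nofile for this shell
$ pid=$(pgrep -f nodemanager | head -1)
$ grep 'open files' /proc/$pid/limits     # the limit the running process actually has
$ ls /proc/$pid/fd | wc -l                # how many FDs it holds right now
$ ls -l /proc/$pid/fd | awk '{print $NF}' | sort | uniq -c | sort -rn | head

The last command groups open FDs by target, which is how you would spot
something like 1200 descriptors all pointing at the same few ZooKeeper jars.
Note that /etc/security/limits.conf is applied by pam_limits at session
start, so already-running daemons keep their old nofile limit until
restarted, which is why the restart above mattered.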
>>>>>
>>>>> On Sun, Jul 24, 2016 at 8:03 PM, Bharath Vissapragada
>>>>> <[email protected]> wrote:
>>>>> > Based on
>>>>> >
>>>>> > 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
>>>>> > remote block reader.
>>>>> > java.net.SocketException: Too many open files
>>>>> >
>>>>> > 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
>>>>> > /127.0.0.1:31000 for block, add to deadNodes and continue.
>>>>> > java.net.SocketException: Too many open files
>>>>> >
>>>>> > I'm guessing your HDFS instance might be overloaded (check the NN/DN
>>>>> > logs). The HMaster is unable to connect to the NN while opening
>>>>> > regions and hence throws the error.
>>>>> >
>>>>> > On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple <[email protected]> wrote:
>>>>> >
>>>>> >> Several thousand lines of things like
>>>>> >>
>>>>> >> WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4):
>>>>> >> failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337
>>>>> >> java.lang.NullPointerException at
>>>>> >> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
>>>>> >> ...
>>>>> >>
>>>>> >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory:
>>>>> >> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>>>>> >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>>>>> >> error creating ShortCircuitReplica.
>>>>> >> java.io.EOFException: unexpected EOF while reading metadata file header
>>>>> >>
>>>>> >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
>>>>> >> remote block reader.
>>>>> >> java.net.SocketException: Too many open files
>>>>> >>
>>>>> >> 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
>>>>> >> /127.0.0.1:31000 for block, add to deadNodes and continue.
>>>>> >> java.net.SocketException: Too many open files
>>>>> >>
>>>>> >> 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain
>>>>> >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any
>>>>> >> node: java.io.IOException: No live nodes contain block
>>>>> >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 after
>>>>> >> checking nodes =
>>>>> >> [DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]],
>>>>> >> ignoredNodes = null No live nodes contain current block Block
>>>>> >> locations:
>>>>> >> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]
>>>>> >> Dead nodes:
>>>>> >> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK].
>>>>> >> Will get new block locations from namenode and retry...
>>>>> >> 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1
>>>>> >> IOException, will wait for 2772.7114628272548 msec.
>>>>> >> 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory:
>>>>> >> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
>>>>> >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
>>>>> >> error creating ShortCircuitReplica.
>>>>> >> java.io.IOException: Illegal seek
>>>>> >>     at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>>>>> >>
>>>>> >> On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada
>>>>> >> <[email protected]> wrote:
>>>>> >> > Do you see something in the HMaster log? From the error it looks
>>>>> >> > like the HBase master hasn't started properly for some reason.
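Two quick liveness checks that would confirm Bharath's suspicion. This
assumes ZooKeeper answers on localhost:2181 and that HBase uses its default
/hbase parent znode; adjust both to match your config:

$ echo stat | nc localhost 2181      # is ZooKeeper itself responding?
$ hbase zkcli get /hbase/master      # the master's registration znode

If the second command shows no data, the HMaster never registered with
ZooKeeper, which matches both the ServerNotRunningYetException above and
the "znode data == null" error further down the thread.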
>>>>> >> >
>>>>> >> > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple <[email protected]> wrote:
>>>>> >> >
>>>>> >> >> I tried reloading the data with
>>>>> >> >>
>>>>> >> >> ./bin/load-data.py --workloads functional-query
>>>>> >> >>
>>>>> >> >> but that gave errors like
>>>>> >> >>
>>>>> >> >> Executing HBase Command: hbase shell
>>>>> >> >> load-functional-query-core-hbase-generated.create
>>>>> >> >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib
>>>>> >> >> is deprecated. Instead, use io.native.lib.available
>>>>> >> >> SLF4J: Class path contains multiple SLF4J bindings.
>>>>> >> >> SLF4J: Found binding in
>>>>> >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> >> >> SLF4J: Found binding in
>>>>> >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> >> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>>> >> >> explanation.
>>>>> >> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>> >> >>
>>>>> >> >> ERROR: Can't get the locations
>>>>> >> >>
>>>>> >> >> Here is some help for this command:
>>>>> >> >> Start disable of named table:
>>>>> >> >>   hbase> disable 't1'
>>>>> >> >>   hbase> disable 'ns1:t1'
>>>>> >> >>
>>>>> >> >> ERROR: Can't get master address from ZooKeeper; znode data == null
>>>>> >> >>
>>>>> >> >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple <[email protected]> wrote:
>>>>> >> >> > I'm having trouble with my HBase environment, and it's preventing
>>>>> >> >> > me from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have
>>>>> >> >> > tried this with a clean build, I have tried unset LD_LIBRARY_PATH
>>>>> >> >> > && bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh
>>>>> >> >> >
>>>>> >> >> > Here is the error I get from compute stats
>>>>> >> >> > (./testdata/bin/compute-table-stats.sh):
>>>>> >> >> >
>>>>> >> >> > Executing: compute stats functional_hbase.alltypessmall
>>>>> >> >> >   -> Error: ImpalaBeeswaxException:
>>>>> >> >> > Query aborted:RuntimeException: couldn't retrieve HBase table
>>>>> >> >> > (functional_hbase.alltypessmall) info:
>>>>> >> >> > Unable to find region for  in functional_hbase.alltypessmall
>>>>> >> >> > after 35 tries.
>>>>> >> >> > CAUSED BY: NoServerForRegionException: Unable to find region for
>>>>> >> >> >  in functional_hbase.alltypessmall after 35 tries.
>>>>> >> >> >
>>>>> >> >> > Here is a snippet of the error in ./testdata/bin/split-hbase.sh:
>>>>> >> >> >
>>>>> >> >> > Sun Jul 24 15:24:52 PDT 2016,
>>>>> >> >> > RpcRetryingCaller{globalStartTime=1469399003900, pause=100,
>>>>> >> >> > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException:
>>>>> >> >> > com.google.protobuf.ServiceException:
>>>>> >> >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException):
>>>>> >> >> > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server
>>>>> >> >> > is not running yet
>>>>> >> >> >
>>>>> >> >> > I tried ./bin/create_testdata.sh, but that exited almost
>>>>> >> >> > immediately with no error.
>>>>> >> >> >
>>>>> >> >> > Has anyone else seen and solved this before?
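Once the master does come back, a quick sanity pass before re-running the
loaders might look like this. These are standard hbase shell commands; the
table name is the one from this thread:

$ echo "status 'simple'" | hbase shell    # live region servers and their load
$ echo "list" | hbase shell               # the functional_hbase tables should appear
$ echo "scan 'functional_hbase.alltypessmall', {LIMIT => 1}" | hbase shell

If list succeeds but the scan keeps retrying, the regions are likely still
being assigned; the NoServerForRegionException above is what that looks
like from the client side.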
