Based on

16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
remote block reader.
java.net.SocketException: Too many open files

16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
/127.0.0.1:31000 for block, add to deadNodes and continue.
java.net.SocketException: Too many open files

I'm guessing your HDFS instance might be overloaded (check the NN/DN
logs). The HMaster is unable to connect to the NN while opening regions
and hence throws the error.
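Since the proximate error is "Too many open files", it is also worth
checking whether the minicluster processes are hitting their file
descriptor limit. A rough sketch on Linux (the jps pattern is just an
illustration; adjust it to however your DataNode process shows up):

    ulimit -n                                   # limit for new shells
    DN_PID=$(jps | awk '/DataNode/ {print $1}')
    grep 'open files' /proc/$DN_PID/limits      # limit for the running DataNode
    ls /proc/$DN_PID/fd | wc -l                 # descriptors it currently holds

If the count is near the limit, raising it (e.g. ulimit -n 32768 in the
shell that starts the cluster, or via /etc/security/limits.conf) before
restarting may make these errors go away.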
On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple <[email protected]> wrote:
> Several thousand lines of things like
>
> WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4):
> failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
> ...
>
> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory:
> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
> error creating ShortCircuitReplica.
> java.io.EOFException: unexpected EOF while reading metadata file header
>
> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
> remote block reader.
> java.net.SocketException: Too many open files
>
> 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
> /127.0.0.1:31000 for block, add to deadNodes and continue.
> java.net.SocketException: Too many open files
>
> 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain
> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any
> node: java.io.IOException: No live nodes contain block
> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 after
> checking nodes =
> [DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]],
> ignoredNodes = null No live nodes contain current block Block
> locations:
> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]
> Dead nodes:
> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK].
> Will get new block locations from namenode and retry...
>
> 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1
> IOException, will wait for 2772.7114628272548 msec.
>
> 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory:
> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
> error creating ShortCircuitReplica.
> java.io.IOException: Illegal seek
>     at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>
> On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada
> <[email protected]> wrote:
> > Do you see something in the HMaster log? From the error it looks like
> > the HBase master hasn't started properly for some reason.
> >
> > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple <[email protected]> wrote:
> >> I tried reloading the data with
> >>
> >> ./bin/load-data.py --workloads functional-query
> >>
> >> but that gave errors like
> >>
> >> Executing HBase Command: hbase shell
> >> load-functional-query-core-hbase-generated.create
> >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib
> >> is deprecated. Instead, use io.native.lib.available
> >> SLF4J: Class path contains multiple SLF4J bindings.
> >> SLF4J: Found binding in
> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: Found binding in
> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> >> explanation.
> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >>
> >> ERROR: Can't get the locations
> >>
> >> Here is some help for this command:
> >> Start disable of named table:
> >>   hbase> disable 't1'
> >>   hbase> disable 'ns1:t1'
> >>
> >> ERROR: Can't get master address from ZooKeeper; znode data == null
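That last quoted error ("znode data == null") usually means the master
never registered itself in ZooKeeper. Assuming the default
zookeeper.znode.parent of /hbase, one way to confirm is:

    hbase zkcli
    # then, inside the ZooKeeper shell:
    ls /hbase
    get /hbase/master

If /hbase/master is missing or empty, the HMaster exited before
registering, and its log should say why.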
> >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple <[email protected]> wrote:
> >> > I'm having trouble with my HBase environment, and it's preventing
> >> > me from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have
> >> > tried this with a clean build, I have tried unset LD_LIBRARY_PATH
> >> > && bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh.
> >> >
> >> > Here is the error I get from compute stats
> >> > (./testdata/bin/compute-table-stats.sh):
> >> >
> >> > Executing: compute stats functional_hbase.alltypessmall
> >> >   -> Error: ImpalaBeeswaxException:
> >> >  Query aborted: RuntimeException: couldn't retrieve HBase table
> >> > (functional_hbase.alltypessmall) info:
> >> > Unable to find region for in functional_hbase.alltypessmall after
> >> > 35 tries.
> >> > CAUSED BY: NoServerForRegionException: Unable to find region for
> >> > in functional_hbase.alltypessmall after 35 tries.
> >> >
> >> > Here is a snippet of the error in ./testdata/bin/split-hbase.sh:
> >> >
> >> > Sun Jul 24 15:24:52 PDT 2016,
> >> > RpcRetryingCaller{globalStartTime=1469399003900, pause=100,
> >> > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException:
> >> > com.google.protobuf.ServiceException:
> >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException):
> >> > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is
> >> > not running yet
> >> >
> >> > I tried ./bin/create_testdata.sh, but that exited almost
> >> > immediately with no error.
> >> >
> >> > Has anyone else seen and solved this before?
> >
> > --
> > Thanks,
> > Bharath
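Since everything above points at the master never coming up, I'd start
with the HMaster log and look for the first fatal error. Something like
the following (the log path is a placeholder; use wherever your
minicluster writes its HBase logs):

    # the path below is a placeholder - adjust to your log directory
    grep -inE 'error|exception|abort' /path/to/hbase-master.log | head -50

That should show whether the master aborted on startup (for example,
because it could not reach HDFS) or is stuck retrying.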
--
Thanks,
Bharath