[ https://issues.apache.org/jira/browse/HBASE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053136#comment-14053136 ]

Amit Sela commented on HBASE-4030:
----------------------------------

Happened to me on 0.94.12 - I can't seem to reproduce it.
It happened "out of nowhere": no change had been made to the application or 
infrastructure in a while.
The HDFS audit log shows the same regionserver opening one of the HFiles, 
renaming it (moving it from the MR output dir into the hbase region 
directory), and then trying to open it again from the MR output dir 
(repeating 10 times).
Open-Rename-10xOpen appears in exactly that order in the audit log, 
milliseconds apart, all on the same regionserver.
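
To make the failure mode concrete, here is a minimal standalone sketch 
(hypothetical names, simulating the filesystem moves locally - this is not 
the actual HBase source). The point is that the server-side bulk load *moves* 
the HFile into the region directory, so the operation is not idempotent: if 
the client's RPC times out after the rename already happened, every retry 
re-sends the original path and hits FileNotFoundException, matching the 
Open-Rename-10xOpen pattern above.

import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BulkLoadRetrySketch {

    // Stand-in for the server-side bulk load: validate the HFile, then
    // *move* it into the region directory. The move is what makes the
    // operation non-idempotent.
    static void serverBulkLoad(Path hfile, Path regionDir) throws Exception {
        if (!Files.exists(hfile)) {
            throw new FileNotFoundException("File does not exist: " + hfile);
        }
        Files.move(hfile, regionDir.resolve(hfile.getFileName()));
    }

    public static void main(String[] args) throws Exception {
        Path hfile = Files.createTempFile("hfile-", ".tmp");   // the MR output file
        Path regionDir = Files.createTempDirectory("region-"); // the region directory

        // Attempt 1 succeeds on the server, but suppose the client's RPC
        // timed out before the acknowledgement arrived.
        serverBulkLoad(hfile, regionDir);

        // Attempt 2: the client retries with the original path and fails,
        // because the file was already renamed into the region directory.
        try {
            serverBulkLoad(hfile, regionDir);
        } catch (FileNotFoundException e) {
            System.out.println("Retry failed: " + e.getMessage());
        }
    }
}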

Logs:
On the client side, this is the exception:
java.net.SocketTimeoutException: Call to node.xxx.com/xxx.xxx.xxx.xxx:PORT 
failed on socket timeout exception: java.net.SocketTimeoutException: 60000 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/xxx.xxx.xxx.xxx:PORT 
remote=node.xxx.com/xxx.xxx.xxx.xxx:PORT]
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@29f2a6e3, 
org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.io.MultipleIOException: 6 exceptions 
[java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
        at 
org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

On the regionserver:

ERROR org.apache.hadoop.hbase.regionserver.HRegion: There were one or more IO 
errors when checking if the bulk load is ok.
org.apache.hadoop.io.MultipleIOException: 6 exceptions 
[java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
 java.io.FileNotFoundException: File does not exist: 
/data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
        at 
org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

followed by:

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: 
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call 
next(4522610431482097770, 250), rpc version=1, client version=29, 
methodsFingerPrint=-1368823753 from xxx.xxx.xxx.xxx after 12507 ms, since 
caller disconnected
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
2014-07-06 03:52:14,278 [IPC Server handler 28 on 8041] ERROR 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call 
next(7354511084312054096, 250), rpc version=1, client version=29, 
methodsFingerPrint=-1368823753 from xxx.xxx.xxx.xxx after 9476 ms, since caller 
disconnected
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426) 

> LoadIncrementalHFiles fails with FileNotFoundException
> ------------------------------------------------------
>
>                 Key: HBASE-4030
>                 URL: https://issues.apache.org/jira/browse/HBASE-4030
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>         Environment: CDH3bu on Ubuntu 4.4.3
>            Reporter: Adam Phelps
>
> -- We've been seeing intermittent failures of calls to LoadIncrementalHFiles. 
>  When this happens, the node that made the call sees a 
> FileNotFoundException such as this:
> 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to 
> s8.XXX/67.215.90.38:60020 failed on socket timeout exception: 
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020]
> 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: 
> java.io.FileNotFoundException: File does not exist: 
> /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> 2011-06-23 15:47:34.379573500   at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
> 2011-06-23 15:47:34.379573500   at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)
> -- Over on the regionserver that was loading this, we see that it attempted 
> the load and hit a 60-second timeout:
> 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Validating hfile at 
> hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
>  for inclusion in store handling region 
> domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
> ...
> 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to 
> connect to /67.215.90.38:50010, add to deadNodes and continue
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/67.215.90.38:42199 remote=/67.215.90.38:50010]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> at java.io.DataInputStream.readShort(DataInputStream.java:295)
> -- We suspect this particular problem is a resource contention issue on our 
> side.  However, the loading process proceeds to rename the file despite the 
> failure:
> 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Renaming bulk load file 
> hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
>  to 
> hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533
> -- And then LoadIncrementalHFiles tries to load the hfile again:
> 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Validating hfile at 
> hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
>  for inclusion in store handling region 
> domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
> 2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server 
> handler 147 on 60020, call 
> bulkLoadHFile(hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256,
>  [B@4224508b, [B@5e23f799) from 67.215.90.51:51856: error: 
> java.io.FileNotFoundException: File does not exist: 
> /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> -- This eventually leads to the load command failing.
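
The walkthrough above suggests a simple client-side guard: after a timed-out 
bulkLoadHFile call, check whether the source HFile still exists before 
retrying. Below is a hedged sketch under that assumption (worthRetrying is a 
hypothetical helper, not an existing HBase API). If the source file is gone, 
the previous attempt most likely completed server-side and the file was 
renamed into the region directory, so a blind retry can only reproduce the 
FileNotFoundException shown in the logs.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkLoadRetryGuard {

    // Hypothetical helper: after a timed-out bulk load attempt, only retry
    // if the source HFile is still where the client left it. If it is gone,
    // the server most likely completed the rename before the timeout fired.
    static boolean worthRetrying(FileSystem fs, Path hfile) throws IOException {
        return fs.exists(hfile);
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path hfile = new Path(args[0]); // path of the HFile handed to bulkLoadHFile
        System.out.println(worthRetrying(fs, hfile)
                ? "source still present - safe to retry the bulk load"
                : "source gone - previous attempt probably succeeded; do not retry");
    }
}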


