[ 
https://issues.apache.org/jira/browse/HBASE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy Yu updated HBASE-7251:
-----------------------------

    Description: 
Background:

A smart guy in the company want to read data from the HBASE in batch, the code 
like the following:(just demonstrate, not runnable):

        List<Get> gets = new ArrayList<Get>();

        for(int i =0; i < n; ++i){
                gets.add("some row key here");

                if (i % 10000 == 0){
                        Results[] results = htable.get(gets);
                        //process results here. 
                        //delete some code
                }
        }



Yes, you know that, this guy forgot "gets.clear()"  after each "htable.get()" 
operation in his code.

One region server becomes very slow, and crashed after 30mins becauseof OOM, 
but we got 15GB log file.
there are flood logs as following:

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call 
multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d), rpc version=1, 
client version=29, methodsFingerPrint=-56040613 from 10.204.8.148:57933 after 
3980 ms, since caller disconnected
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)

Fix:
Server is stop "get" but cannot exit from the "for" loop, so write flood logs 
here.
My patch just log one exception log instead of flood logs. 
Importantly, server stop processing immediately if client timeout or disconnect.

Test:
I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, 
when it becomes very slow to get results, I pressed ctrl+C, then there is only 
ONE CallerDisconnectedException exception log and the server stop reading 
immediately, LOG generate the last log entry:

 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught a 
ClosedChannelException, this means that the server was processing a request but 
the client went away. The error message was: null




  was:
Background:

A smart guy in the company want to read data from the HBASE in batch, the code 
like the following:(just demonstrate, not runnable):

List<Get> gets = new ArrayList<Get>();

for(int i =0; i < n; ++i){
  gets.add("some row key here");

  if (i % 10000 == 0){
    Results[] results = htable.get(gets);
    //process results here. 
    //delete some code
  }
}


Yes, you know that, this guy forgot "gets.clear()"  after each "htable.get()" 
operation in his code.

One region server becomes very slow, and crashed after 30mins becauseof OOM, 
but we got 15GB log file.
there are flood logs as following:

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call 
multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d), rpc version=1, 
client version=29, methodsFingerPrint=-56040613 from 10.204.8.148:57933 after 
3980 ms, since caller disconnected
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)

Fix:
Server is stop "get" but cannot exit from the "for" loop, so write flood logs 
here.
My patch just log one exception log instead of flood logs. 
Importantly, server stop processing immediately if client timeout or disconnect.

Test:
I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, 
when it becomes very slow to get results, I pressed ctrl+C, then there is only 
ONE CallerDisconnectedException exception log and the server stop reading 
immediately, LOG generate the last log entry:

 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught a 
ClosedChannelException, this means that the server was processing a request but 
the client went away. The error message was: null




    
> Avoid flood logs during client disconnect during batch get operation
> --------------------------------------------------------------------
>
>                 Key: HBASE-7251
>                 URL: https://issues.apache.org/jira/browse/HBASE-7251
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2
>            Reporter: Azuryy Yu
>              Labels: patch
>             Fix For: 0.94.3
>
>         Attachments: HBASE-7251.patch
>
>
> Background:
> A smart guy in the company want to read data from the HBASE in batch, the 
> code like the following:(just demonstrate, not runnable):
>       List<Get> gets = new ArrayList<Get>();
>       for(int i =0; i < n; ++i){
>               gets.add("some row key here");
>               if (i % 10000 == 0){
>                       Results[] results = htable.get(gets);
>                       //process results here. 
>                       //delete some code
>               }
>       }
> Yes, you know that, this guy forgot "gets.clear()"  after each "htable.get()" 
> operation in his code.
> One region server becomes very slow, and crashed after 30mins becauseof OOM, 
> but we got 15GB log file.
> there are flood logs as following:
> ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call 
> multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d), rpc version=1, 
> client version=29, methodsFingerPrint=-56040613 from 10.204.8.148:57933 after 
> 3980 ms, since caller disconnected
>         at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)
> Fix:
> Server is stop "get" but cannot exit from the "for" loop, so write flood logs 
> here.
> My patch just log one exception log instead of flood logs. 
> Importantly, server stop processing immediately if client timeout or 
> disconnect.
> Test:
> I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, 
> when it becomes very slow to get results, I pressed ctrl+C, then there is 
> only ONE CallerDisconnectedException exception log and the server stop 
> reading immediately, LOG generate the last log entry:
>  WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught 
> a ClosedChannelException, this means that the server was processing a request 
> but the client went away. The error message was: null

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to