[ https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104000#comment-16104000 ]

Josh Elser commented on HBASE-18023:
------------------------------------

bq. I've noticed that replication can trigger this quite a bit as the sink 
applies the shipped edits. Should we make a distinction between normal clients 
and replication clients and apply two separate thresholds?

I could see this going either way:

On one side, replication has its own knobs that control how much data is sent 
in a single RPC. This (hopefully) implies that the administrator configured 
replication to use a certain batch size and knows that they did this.
On the other side, I would not be surprised by admins who don't set this value 
and then run into memory/GC issues on their RegionServers. I could see this 
message proactively warning them "hey, you got some big RPCs coming in", which 
would hopefully steer them in the right direction.

I would say that if the default configuration values lead us to spamming WARN 
messages, that is something we should address in some form.
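To make the "two separate thresholds" option concrete, here is a minimal sketch, assuming the server already knows whether a multi-* request came from a replication sink. The class name, fields, and example values are hypothetical and are not HBase configuration keys. It also assumes that a non-positive threshold disables the warning.

```java
// Hypothetical sketch: pick a row-count warning threshold depending on
// whether the multi-* request came from a replication sink or an ordinary
// client. Names and values are illustrative assumptions, not HBase config.
public final class MultiRowThresholds {
    private final int clientThreshold;
    private final int replicationThreshold;

    public MultiRowThresholds(int clientThreshold, int replicationThreshold) {
        this.clientThreshold = clientThreshold;
        this.replicationThreshold = replicationThreshold;
    }

    /** Returns the threshold that applies to this kind of caller. */
    public int thresholdFor(boolean fromReplication) {
        return fromReplication ? replicationThreshold : clientThreshold;
    }

    /**
     * True when the request should be logged. A non-positive threshold is
     * treated as "warning disabled" (an assumption of this sketch).
     */
    public boolean shouldWarn(int rowCount, boolean fromReplication) {
        int threshold = thresholdFor(fromReplication);
        return threshold > 0 && rowCount > threshold;
    }

    public static void main(String[] args) {
        MultiRowThresholds t = new MultiRowThresholds(1000, 10000);
        System.out.println(t.shouldWarn(5000, false)); // true: ordinary client over 1000
        System.out.println(t.shouldWarn(5000, true));  // false: replication limit is 10000
    }
}
```

One design option would be to derive the replication threshold from the replication batch-size settings the administrator already configured. That would address the first concern above (no WARN spam for deliberately sized replication batches) while still flagging unusually large client RPCs.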

> Log multi-* requests for more than threshold number of rows
> -----------------------------------------------------------
>
>                 Key: HBASE-18023
>                 URL: https://issues.apache.org/jira/browse/HBASE-18023
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Clay B.
>            Assignee: David Harju
>            Priority: Minor
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
>         Attachments: HBASE-18023.addendum.patch, 
> HBASE-18023-branch-1.3.patch, HBASE-18023-branch-1.patch, 
> HBASE-18023.master.001.patch, HBASE-18023.master.002.patch, 
> HBASE-18023.master.003.patch, HBASE-18023.master.004.patch
>
>
> Today, if a user happens to do something like a large multi-put, they can get 
> through request throttling (i.e. it is one request) but still crash a region 
> server with a garbage storm. We have seen regionservers hit this issue and it 
> is silent and deadly. The RS will report nothing more than a mysterious 
> garbage collection and exit.
> Ideally, we could report a large multi-* request before starting it, in case 
> it happens to be deadly. Knowing the client, user and how many rows are 
> affected would be a good start to tracking down painful users.
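The check the description asks for (report a large multi-* request before starting it, including client, user, and row count) could look roughly like the sketch below. This is a minimal, hypothetical illustration using `java.util.logging`, not the code from the attached patches. The class name, method names, and message text are assumptions.

```java
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch: before applying a multi-* request, compare its row
// count to a threshold and, if exceeded, log who sent it and how big it is.
// Names and message format are illustrative, not the HBase implementation.
public final class LargeMultiRequestWarner {
    private static final Logger LOG =
        Logger.getLogger(LargeMultiRequestWarner.class.getName());

    private final int rowThreshold;

    public LargeMultiRequestWarner(int rowThreshold) {
        this.rowThreshold = rowThreshold;
    }

    /** Returns the warning text if the request is over the threshold. */
    public Optional<String> check(String clientAddress, String user, int rowCount) {
        if (rowCount <= rowThreshold) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "Multi request with %d rows exceeds warning threshold %d; client=%s, user=%s",
            rowCount, rowThreshold, clientAddress, user));
    }

    /** Logs a WARNING before any work is done; returns true if one was logged. */
    public boolean warnIfLarge(String clientAddress, String user, int rowCount) {
        Optional<String> msg = check(clientAddress, user, rowCount);
        msg.ifPresent(LOG::warning);
        return msg.isPresent();
    }

    public static void main(String[] args) {
        LargeMultiRequestWarner warner = new LargeMultiRequestWarner(1000);
        warner.warnIfLarge("10.0.0.5:54321", "alice", 25000); // logs a WARNING
        warner.warnIfLarge("10.0.0.5:54321", "alice", 10);    // silent
    }
}
```

Because the warning is emitted before the batch is processed, it survives in the log even if the subsequent garbage storm kills the RegionServer, which is the point of the proposal.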



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
