[ https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104000#comment-16104000 ]
Josh Elser commented on HBASE-18023:
------------------------------------

bq. I've noticed that replication can trigger this quite a bit as the sink applies the shipped edits. Should we make a distinction between normal clients and replication clients and apply two separate thresholds?

I could see this going either way.

On one side, replication has its own knobs that control how much data is sent in a single RPC. This (hopefully) implies that the administrator configured replication to use a certain batch size and knows that they did this.

On the other side, I would not be surprised by admins who don't set this value and run into memory/GC issues with RegionServers. I could see this message proactively warning them, "hey, you got some big RPCs coming in," which would hopefully steer them in the right direction.

I would say that if the default configuration values lead us to spam WARN messages, that is something we should address in some form.

> Log multi-* requests for more than threshold number of rows
> -----------------------------------------------------------
>
>                 Key: HBASE-18023
>                 URL: https://issues.apache.org/jira/browse/HBASE-18023
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Clay B.
>            Assignee: David Harju
>            Priority: Minor
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
>         Attachments: HBASE-18023.addendum.patch,
> HBASE-18023-branch-1.3.patch, HBASE-18023-branch-1.patch,
> HBASE-18023.master.001.patch, HBASE-18023.master.002.patch,
> HBASE-18023.master.003.patch, HBASE-18023.master.004.patch
>
>
> Today, if a user happens to do something like a large multi-put, they can get through request throttling (i.e., it counts as a single request) but still crash a region server with a garbage storm. We have seen regionservers hit this issue, and it is silent and deadly. The RS will report nothing more than a mysterious garbage collection and exit.
>
> Ideally, we could report a large multi-* request before starting it, in case it happens to be deadly. Knowing the client, the user, and how many rows are affected would be a good start toward tracking down painful users.
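The issue asks for a report before a large multi-* request runs, including the client, the user, and the row count. The comment considers giving replication sinks their own threshold. Below is a minimal Java sketch of that check. It uses hypothetical names (`MultiRequestRowWarner`, `checkBeforeExecute`) and arbitrary example thresholds. It is not the committed HBASE-18023 patch, and it does not use real HBase classes or configuration keys.

```java
import java.util.Optional;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not an HBase API): warn before executing a multi-* request
 * whose row count exceeds a threshold. Requests applied by a replication sink get a
 * separate threshold, because their batch size is governed by replication settings.
 */
class MultiRequestRowWarner {
    private static final Logger LOG = Logger.getLogger(MultiRequestRowWarner.class.getName());

    private final int clientRowThreshold;       // assumed limit for normal clients
    private final int replicationRowThreshold;  // assumed limit for replication sinks

    MultiRequestRowWarner(int clientRowThreshold, int replicationRowThreshold) {
        if (clientRowThreshold <= 0 || replicationRowThreshold <= 0) {
            throw new IllegalArgumentException("thresholds must be positive");
        }
        this.clientRowThreshold = clientRowThreshold;
        this.replicationRowThreshold = replicationRowThreshold;
    }

    /**
     * Meant to be called before the request is executed. Logs a WARNING and returns
     * its text when rowCount is strictly above the applicable threshold; otherwise
     * returns Optional.empty().
     */
    Optional<String> checkBeforeExecute(int rowCount, boolean fromReplicationSink,
                                        String clientAddress, String user) {
        int threshold = fromReplicationSink ? replicationRowThreshold : clientRowThreshold;
        if (rowCount <= threshold) {
            return Optional.empty();
        }
        String msg = String.format(
            "Large multi request: %d rows (threshold %d, replication=%b) from client=%s user=%s",
            rowCount, threshold, fromReplicationSink, clientAddress, user);
        LOG.warning(msg);
        return Optional.of(msg);
    }

    public static void main(String[] args) {
        MultiRequestRowWarner warner = new MultiRequestRowWarner(1000, 5000);
        // 2000 rows from a normal client exceeds 1000, so it is reported.
        System.out.println(warner.checkBeforeExecute(2000, false, "10.0.0.5:51234", "alice").isPresent());
        // The same batch arriving through a replication sink stays under 5000.
        System.out.println(warner.checkBeforeExecute(2000, true, "10.0.0.9:40000", "hbase").isPresent());
    }
}
```

Running `main` prints `true` and then `false`. The log record goes to stderr through `java.util.logging`. The method returns the message as well as logging it, which keeps the check easy to test. Whether replication should get its own, higher threshold, or share the client one so that unconfigured sinks also trigger the warning, is the open question in the comment above.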