[jira] [Comment Edited] (HBASE-23938) Replicate slow/large RPC calls to HDFS

Viraj Jasani (Jira) Wed, 13 May 2020 10:50:13 -0700


    [ https://issues.apache.org/jira/browse/HBASE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106518#comment-17106518 ]


Viraj Jasani edited comment on HBASE-23938 at 5/13/20, 5:49 PM:
----------------------------------------------------------------

Hello Stack/Andrew,

As Andrew said, the main purpose of optionally storing 
responseTooSlow/responseTooLarge RPC logs in HDFS (a system table), in addition 
to the ring buffer, is to persist them indefinitely. While I understand that 
not all users will want to store them permanently, this option could be useful 
for reviewing historical system performance with actual RPC data. Moreover, the 
logs stored in the ring buffer and the system table will contain the complete 
request data, as opposed to the trimmed request data logged by RpcServer, e.g.:

"param":"region { type: REGION_NAME value: 
\"t1,\\000\\000\\215\\f)o\\\\\\024\\302\\220\\000\\000\\000\\000\\000\\001\\000\\000\\000\\000\\000\\006\\000\\000\\000\\000\\000\\005\\000\\000<TRUNCATED>"

 
{quote}They don’t include environmental or other details beyond the details of 
the request that is too slow. Yet we find value in them now. Adding such detail 
might be possible (some kind of derived load indicator? Like Unix load?) and 
could be pursued in addition to the present goals.
{quote}
I agree, this could be useful since we are planning to persist the actual 
request data in the system table.

 

There is one implementation difference I have considered in the PR:

While the RingBuffer is filled asynchronously using the LMAX Disruptor as soon 
as RpcServer identifies a particular RPC call as slow/large, writing the same 
request to the system table immediately might not be a good option because the 
system is most likely already slow. Hence, in the latest patch, I have 
considered a cron that runs every 10 min and persists the slow/large logs held 
in memory so far (a list of 100 puts in one go), using Puts with SKIP_WAL. 
While this might momentarily increase the load on the system, the cron runs 
only every 10 min (not continuously as and when we get a slow log). Please let 
me know what you think at your convenience: 
[https://github.com/apache/hbase/pull/1681]
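
To make the periodic flush idea above concrete, here is a rough, JDK-only 
sketch. It is not the code from the PR: the class SlowLogBatchFlusher, its 
method names, and the sink callback are hypothetical and only illustrate the 
pattern (non-blocking enqueue on the RPC path, batched drain on a timer). In 
HBase the sink would presumably build Puts with Durability.SKIP_WAL and pass 
them to Table.put(List<Put>); here it is just a Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers slow/large RPC records and flushes them in batches. */
final class SlowLogBatchFlusher<T> {
  private final BlockingQueue<T> pending;
  private final int batchSize;
  private final Consumer<List<T>> sink;

  SlowLogBatchFlusher(int capacity, int batchSize, Consumer<List<T>> sink) {
    this.pending = new ArrayBlockingQueue<>(capacity);
    this.batchSize = batchSize;
    this.sink = sink;
  }

  /** Called when an RPC is flagged; never blocks. Returns false if the buffer is full. */
  boolean offer(T record) {
    return pending.offer(record);
  }

  /** Drains what is buffered so far, handing it to the sink in batches of at most batchSize. */
  int flush() {
    int flushed = 0;
    List<T> batch = new ArrayList<>(batchSize);
    while (pending.drainTo(batch, batchSize) > 0) {
      sink.accept(new ArrayList<>(batch)); // copy, so the sink owns its list
      flushed += batch.size();
      batch.clear();
    }
    return flushed;
  }

  /** Runs flush() at a fixed period (e.g. 10, TimeUnit.MINUTES). Caller must shut the executor down. */
  ScheduledExecutorService start(long period, TimeUnit unit) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(() -> {
      try {
        flush();
      } catch (RuntimeException e) {
        // Swallow so that one failed run does not cancel the later scheduled runs.
      }
    }, period, period, unit);
    return ses;
  }
}
```

With a batch size of 100, 250 buffered records would reach the sink as batches 
of 100, 100 and 50 on the next run. Because offer() does not block, records 
arriving while the buffer is full are dropped in this sketch; whether that is 
acceptable is a design choice.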


was (Author: vjasani):
Hello Stack/Andrew,

As Andrew said, yes the main purpose behind having to store 
responseTooSlow/responseTooLarge RPC logs optionally in HDFS (System table) on 
top of ringbuffer is to persist them forever. While I understand not all users 
might want to store them permanently, having this option might be useful to 
provide historical system performance with actual RPC data. Moreover, these 
logs stored in ring buffer and system table are going to be complete request 
data as opposed to trimmed req data logged by RpcServer. (e.g 

"param":"region { type: REGION_NAME value: 
\"t1,\\000\\000\\215\\f)o\\\\\\024\\302\\220\\000\\000\\000\\000\\000\\001\\000\\000\\000\\000\\000\\006\\000\\000\\000\\000\\000\\005\\000\\000<TRUNCATED>"

 )

 
{quote}They don’t include environmental or other details beyond the details of 
the request that is too slow. Yet we find value in them now. Adding such detail 
might be possible (some kind of derived load indicator? Like Unix load?) and 
could be pursued in addition to the present goals.
{quote}
I agree, this might be useful as we are planning to persist actual request data 
in system table.

 

There is one implementation difference I have considered in the PR:

While RingBuffer will get the data filled in asynchronously using LMax 
Disruptor as soon as RpcServer identifies a particular RPC call as slow/large 
in nature, writing the same request immediately in the system table might not 
be a preferred option because system is already most likely slow. Hence, in the 
latest patch, I have considered having a cron running every 10 min and persist 
slow/large logs preserved in memory so far: (list of 100 puts in one go). While 
it might increase the load on the system momentarily, cron will run every 10 
min (not continuously as and when we get slow log). Please let me know what you 
think as per your convenience: [https://github.com/apache/hbase/pull/1681]

> Replicate slow/large RPC calls to HDFS
> --------------------------------------
>
>                 Key: HBASE-23938
>                 URL: https://issues.apache.org/jira/browse/HBASE-23938
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0
>
>         Attachments: Screen Shot 2020-05-07 at 12.01.26 AM.png
>
>
> We should provide the capability to replicate complete slow and large RPC 
> logs to HDFS, or create a new system table, in addition to the Ring Buffer. 
> This way we don't lose any of the slow logs and operators can retrieve all 
> the slow/large logs. Replicating logs to HDFS / creating a new system table 
> should be configurable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
