[
https://issues.apache.org/jira/browse/HBASE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106518#comment-17106518
]
Viraj Jasani edited comment on HBASE-23938 at 5/13/20, 5:49 PM:
----------------------------------------------------------------
Hello Stack/Andrew,
As Andrew said, yes, the main purpose of optionally storing
responseTooSlow/responseTooLarge RPC logs in HDFS (system table), on top of
the ring buffer, is to persist them forever. While I understand not all users
might want to store them permanently, having this option might be useful to
provide historical system performance with actual RPC data. Moreover, the
logs stored in the ring buffer and the system table are going to contain the
complete request data, as opposed to the trimmed request data logged by
RpcServer, e.g.:
"param":"region { type: REGION_NAME value:
\"t1,\\000\\000\\215\\f)o\\\\\\024\\302\\220\\000\\000\\000\\000\\000\\001\\000\\000\\000\\000\\000\\006\\000\\000\\000\\000\\000\\005\\000\\000<TRUNCATED>"
{quote}They don’t include environmental or other details beyond the details of
the request that is too slow. Yet we find value in them now. Adding such detail
might be possible (some kind of derived load indicator? Like Unix load?) and
could be pursued in addition to the present goals.
{quote}
I agree, this might be useful, since we are planning to persist the actual
request data in the system table.
There is one implementation difference I have considered in the PR:
While the ring buffer is filled asynchronously via the LMAX Disruptor as soon
as RpcServer identifies a particular RPC call as slow/large, writing the same
request to the system table immediately might not be preferable, because the
system is most likely already slow. Hence, in the latest patch, I have
considered having a cron that runs every 10 min and persists the slow/large
logs held in memory so far, in lists of 100 Puts at a time, with SKIP_WAL.
While this might increase the load on the system momentarily, the cron runs
only every 10 min (not continuously as and when we get a slow log). Please
let me know what you think at your convenience:
[https://github.com/apache/hbase/pull/1681]
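To make the buffering/flush split described above concrete, here is a minimal sketch in plain Java. It is not the code from the PR: the class name SlowLogBatcher, the String payloads, and the sink callback are all hypothetical stand-ins. In HBase the sink would presumably build a List of Puts with Durability.SKIP_WAL and write them to the system table; that part is omitted so the sketch stays self-contained. A bounded queue also stands in for the ring buffer, so a full buffer drops the newest record instead of overwriting the oldest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: collect slow/large RPC records in memory, persist them in batches. */
public class SlowLogBatcher {
  private final BlockingQueue<String> pending;   // records waiting to be persisted
  private final int batchSize;                   // e.g. 100 records per write
  private final Consumer<List<String>> sink;     // hypothetical: writes one batch to storage

  public SlowLogBatcher(int capacity, int batchSize, Consumer<List<String>> sink) {
    this.pending = new ArrayBlockingQueue<>(capacity);
    this.batchSize = batchSize;
    this.sink = sink;
  }

  /** Called on the RPC path. Never blocks; returns false if the buffer is full. */
  public boolean record(String payload) {
    return pending.offer(payload);
  }

  /** Drains everything collected so far, batchSize records at a time. Returns batches written. */
  public int flush() {
    int batches = 0;
    List<String> batch = new ArrayList<>(batchSize);
    while (pending.drainTo(batch, batchSize) > 0) {
      sink.accept(new ArrayList<>(batch));       // hand the sink its own copy
      batch.clear();
      batches++;
    }
    return batches;
  }

  /** Runs flush() periodically, e.g. every 10 minutes, off the RPC path. */
  public ScheduledExecutorService startPeriodicFlush(long period, TimeUnit unit) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
    return scheduler;
  }
}
```

With this shape, record() stays cheap when the server is already slow, and the write cost is paid only when flush() runs, for example via startPeriodicFlush(10, TimeUnit.MINUTES).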
> Replicate slow/large RPC calls to HDFS
> --------------------------------------
>
> Key: HBASE-23938
> URL: https://issues.apache.org/jira/browse/HBASE-23938
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: Screen Shot 2020-05-07 at 12.01.26 AM.png
>
>
> We should provide the capability to replicate complete slow and large RPC
> logs to HDFS, or create a new system table, in addition to the ring buffer.
> This way we don't lose any slow logs and the operator can retrieve all the
> slow/large logs. Replicating logs to HDFS / creating a new system table
> should be configurable.