anoopsjohn commented on a change in pull request #1747: URL: https://github.com/apache/hbase/pull/1747#discussion_r429177005
########## File path: src/main/asciidoc/_chapters/slow_log_responses_from_systable.adoc ########## @@ -0,0 +1,116 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[slow_log_responses_from_systable]] +==== Get Slow/Large Response Logs from System table hbase:slowlog + +The above section provides details about Admin APIs: + +* get_slowlog_responses +* get_largelog_responses +* clear_slowlog_responses + +All of the above APIs access online in-memory ring buffers from +individual RegionServers and accumulate logs from ring buffers to display +to end user. However, they can only be used for an ongoing slow/large RPC calls +and user can retrieve ongoing issues with RPC logs. After RegionServer is +restarted, all the objects held in memory of that RegionServer will be cleaned up +and previous problematic logs are lost. What if we want to persist all these logs? +What if we want to store them in such a manner that operator can get all historical +records with some filters? e.g get me all large/slow RPC logs that are triggered by +user1 and are related to region: +cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. + +If we have a system table that stores such logs in increasing (not so strictly though) +order of time, it can definitely help operators debug some historical events +(scan, get, put, compaction, flush etc) with detailed inputs. + +Config which enabled system table to be created and store all log events is +`hbase.regionserver.slowlog.systable.enabled`. + +The default value for this config is `false`. If provided `true` +(Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`), +HMaster will create system table `hbase:slowlog` and each RegionServer, upon +determining any slow/large RPC logs at RpcServer level, will put them in in-memory +ring buffer and system table `hbase:slowlog`. + +hbase:slowlog has single ColumnFamily: `info` +`info` contains multiple qualifiers which are the same attributes present as +part of `get_slowlog_responses` API response. + +* info:call_details +* info:client_address +* info:method_name +* info:param +* info:processing_time +* info:queue_time +* info:region_name +* info:response_size +* info:server_class +* info:start_time +* info:type +* info:username + +And example of 2 rows from hbase:slowlog scan result: +[source] +---- + + \x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) + \x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347 + \x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan + \x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle + vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks + : true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_ + handles_heartbeats: true track_scan_metrics: false + \x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18 + \x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0 + \x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1 + \x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575 + \x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer + \x024\xC1\x03\xE9\x04\xF5@ column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732 + \x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL + \x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2 + \x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) + \x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348 + \x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan + \x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a + ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2 + 097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met + rics: false + \x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24 + \x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0 + \x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. + \x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227 + \x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer + \x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932 + \x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL + \x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1 +---- + +Operator can use ColumnValueFilter to filter records based on region_name, username, +client_address etc. + +Time range based queries would also be used quite frequently. Review comment: what to say "Time range based queries will be also be very useful" instead? ########## File path: src/main/asciidoc/_chapters/slow_log_responses_from_systable.adoc ########## @@ -0,0 +1,116 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[slow_log_responses_from_systable]] +==== Get Slow/Large Response Logs from System table hbase:slowlog + +The above section provides details about Admin APIs: + +* get_slowlog_responses +* get_largelog_responses +* clear_slowlog_responses + +All of the above APIs access online in-memory ring buffers from +individual RegionServers and accumulate logs from ring buffers to display +to end user. However, they can only be used for an ongoing slow/large RPC calls +and user can retrieve ongoing issues with RPC logs. After RegionServer is +restarted, all the objects held in memory of that RegionServer will be cleaned up +and previous problematic logs are lost. What if we want to persist all these logs? +What if we want to store them in such a manner that operator can get all historical +records with some filters? e.g get me all large/slow RPC logs that are triggered by +user1 and are related to region: +cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. + +If we have a system table that stores such logs in increasing (not so strictly though) +order of time, it can definitely help operators debug some historical events +(scan, get, put, compaction, flush etc) with detailed inputs. + +Config which enabled system table to be created and store all log events is +`hbase.regionserver.slowlog.systable.enabled`. + +The default value for this config is `false`. If provided `true` +(Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`), +HMaster will create system table `hbase:slowlog` and each RegionServer, upon Review comment: I dont think we need to explain any impl detail here like who create the table. We can say a cron job running in every RS will persist the slow/large logs into this table. Also explain what is the default cron period and how to config that. Also are we not saying about the buffer size configs? ########## File path: src/main/asciidoc/_chapters/slow_log_responses_from_systable.adoc ########## @@ -0,0 +1,116 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[slow_log_responses_from_systable]] +==== Get Slow/Large Response Logs from System table hbase:slowlog + +The above section provides details about Admin APIs: + +* get_slowlog_responses +* get_largelog_responses +* clear_slowlog_responses + +All of the above APIs access online in-memory ring buffers from +individual RegionServers and accumulate logs from ring buffers to display +to end user. However, they can only be used for an ongoing slow/large RPC calls +and user can retrieve ongoing issues with RPC logs. After RegionServer is Review comment: BTW these logs in ring buffer will have any timestamp ? (When the slow/large RPC happened?) ########## File path: src/main/asciidoc/_chapters/slow_log_responses_from_systable.adoc ########## @@ -0,0 +1,116 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[slow_log_responses_from_systable]] +==== Get Slow/Large Response Logs from System table hbase:slowlog + +The above section provides details about Admin APIs: + +* get_slowlog_responses +* get_largelog_responses +* clear_slowlog_responses + +All of the above APIs access online in-memory ring buffers from +individual RegionServers and accumulate logs from ring buffers to display +to end user. However, they can only be used for an ongoing slow/large RPC calls Review comment: ongoing slow RPC? Little bit confusing statement IMO. ########## File path: src/main/asciidoc/_chapters/slow_log_responses_from_systable.adoc ########## @@ -0,0 +1,116 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[slow_log_responses_from_systable]] +==== Get Slow/Large Response Logs from System table hbase:slowlog + +The above section provides details about Admin APIs: + +* get_slowlog_responses +* get_largelog_responses +* clear_slowlog_responses + +All of the above APIs access online in-memory ring buffers from Review comment: Above are shell commands? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
