[ 
https://issues.apache.org/jira/browse/HDDS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyush Bhatt updated HDDS-8198:
---------------------------------
    Affects Version/s: 1.4.0
                           (was: 1.3.0)

>  Hbase on Ozone has very high latency while running 
> IntegrationTestIngestWithTags
> ---------------------------------------------------------------------------------
>
>                 Key: HDDS-8198
>                 URL: https://issues.apache.org/jira/browse/HDDS-8198
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Filesystem
>    Affects Versions: 1.4.0
>            Reporter: Pratyush Bhatt
>            Priority: Major
>
> After running:
> {noformat}
> hbase  org.apache.hadoop.hbase.IntegrationTestsDriver -regex 
> IntegrationTestIngestWithTags{noformat}
> I compared the outputs across the run that was made on Hbase running over 
> HDFS vs that running on Ozone. For HDFS logs:
> {noformat}
> 2023-03-11 16:24:58,956|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
> 2023-03-11 16:24:58,957|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> zookeeper.Login: TGT valid starting at:        Sat Mar 11 16:23:26 UTC 2023
> 2023-03-11 16:24:58,957|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> zookeeper.Login: TGT expires:                  Mon Mar 13 16:23:26 UTC 2023
> 2023-03-11 16:24:58,957|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> zookeeper.Login: TGT refresh sleeping until: Mon Mar 13 07:03:43 UTC 2023
> 2023-03-11 16:24:58,958|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> zookeeper.ClientCnxn: Opening socket connection to server 
> quasar-ocvgih-1.quasar-ocvgih.root.hwx.site/172.27.186.3:2181. Will attempt 
> to SASL-authenticate using Login Context section 'Client'
> 2023-03-11 16:24:58,958|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> zookeeper.ClientCnxn: Socket connection established, initiating session, 
> client: /10.104.19.14:49614, server: 
> quasar-ocvgih-1.quasar-ocvgih.root.hwx.site/172.27.186.3:2181
> 2023-03-11 16:24:58,962|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:58 INFO 
> zookeeper.ClientCnxn: Session establishment complete on server 
> quasar-ocvgih-1.quasar-ocvgih.root.hwx.site/172.27.186.3:2181, sessionid = 
> 0x10de32e07430056, negotiated timeout = 30000
> 2023-03-11 16:24:59,018|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:59 WARN 
> zookeeper.Login: TGT renewal thread has been interrupted and will exit.
> 2023-03-11 16:24:59,019|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:24:59 INFO 
> zookeeper.ClientCnxn: EventThread shut down for session: 0x313889884620057
> 2023-03-11 16:24:59,036|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|Starting to write data...
> 2023-03-11 16:25:04,045|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:04 INFO 
> util.MultiThreadedAction: [W:10] Keys=901, cols=10.2 K, time=00:00:05 
> Overall: [keys/s= 180, latency=55.01 ms] Current: [keys/s=180, latency=55.01 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:09,046|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:09 INFO 
> util.MultiThreadedAction: [W:10] Keys=1237, cols=14.0 K, time=00:00:10 
> Overall: [keys/s= 123, latency=78.99 ms] Current: [keys/s=67, latency=143.29 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:14,047|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:14 INFO 
> util.MultiThreadedAction: [W:10] Keys=1454, cols=16.5 K, time=00:00:15 
> Overall: [keys/s= 96, latency=101.68 ms] Current: [keys/s=43, latency=231.03 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:19,048|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:19 INFO 
> util.MultiThreadedAction: [W:10] Keys=1653, cols=18.5 K, time=00:00:20 
> Overall: [keys/s= 82, latency=118.79 ms] Current: [keys/s=39, latency=243.82 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:24,049|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:24 INFO 
> util.MultiThreadedAction: [W:10] Keys=1896, cols=21.3 K, time=00:00:25 
> Overall: [keys/s= 75, latency=131.58 ms] Current: [keys/s=48, latency=218.52 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:29,051|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:29 INFO 
> util.MultiThreadedAction: [W:10] Keys=2179, cols=24.5 K, time=00:00:30 
> Overall: [keys/s= 72, latency=137.33 ms] Current: [keys/s=56, latency=175.90 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:34,051|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:34 INFO 
> util.MultiThreadedAction: [W:10] Keys=2434, cols=27.5 K, time=00:00:35 
> Overall: [keys/s= 69, latency=143.49 ms] Current: [keys/s=51, latency=196.14 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:39,053|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:39 INFO 
> util.MultiThreadedAction: [W:10] Keys=3065, cols=34.5 K, time=00:00:40 
> Overall: [keys/s= 76, latency=130.25 ms] Current: [keys/s=126, latency=79.17 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:44,054|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:44 INFO 
> util.MultiThreadedAction: [W:10] Keys=5199, cols=58.3 K, time=00:00:45 
> Overall: [keys/s= 115, latency=86.31 ms] Current: [keys/s=426, latency=23.21 
> ms], wroteUpTo=-1
> 2023-03-11 16:25:49,057|INFO|MainThread|machine.py:203 - 
> run()||GUID=55242eda-1b8c-4dec-a442-89cdd4248fba|23/03/11 16:25:49 INFO 
> util.MultiThreadedAction: [W:10] Keys=6690, cols=74.7 K, time=00:00:50 
> Overall: [keys/s= 133, latency=74.51 ms] Current: [keys/s=298, latency=33.36 
> ms], wroteUpTo=-1{noformat}
> For Ozone:
> {noformat}
> 2023-03-14 15:42:00,520|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:00 INFO 
> zookeeper.ClientCnxn: Socket connection established, initiating session, 
> client: /10.104.12.252:50432, server: 
> hb2-cf4-2.hb2-cf4.root.hwx.site/172.27.179.203:2181
> 2023-03-14 15:42:00,531|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:00 INFO 
> zookeeper.ClientCnxn: Session establishment complete on server 
> hb2-cf4-2.hb2-cf4.root.hwx.site/172.27.179.203:2181, sessionid = 
> 0x105522ff0b80472, negotiated timeout = 30000
> 2023-03-14 15:42:00,565|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:00 WARN 
> zookeeper.Login: TGT renewal thread has been interrupted and will exit.
> 2023-03-14 15:42:00,566|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:00 INFO 
> zookeeper.ClientCnxn: EventThread shut down for session: 0x30dcc8820580470
> 2023-03-14 15:42:00,581|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|Starting to write data...
> 2023-03-14 15:42:05,588|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:05 INFO 
> util.MultiThreadedAction: [W:10] Keys=171, cols=2.0 K, time=00:00:05 Overall: 
> [keys/s= 34, latency=290.19 ms] Current: [keys/s=34, latency=290.19 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:10,590|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:10 INFO 
> util.MultiThreadedAction: [W:10] Keys=289, cols=3.4 K, time=00:00:10 Overall: 
> [keys/s= 28, latency=340.33 ms] Current: [keys/s=23, latency=412.97 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:15,591|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:15 INFO 
> util.MultiThreadedAction: [W:10] Keys=405, cols=4.7 K, time=00:00:15 Overall: 
> [keys/s= 26, latency=369.41 ms] Current: [keys/s=23, latency=441.88 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:20,592|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:20 INFO 
> util.MultiThreadedAction: [W:10] Keys=484, cols=5.7 K, time=00:00:20 Overall: 
> [keys/s= 24, latency=412.15 ms] Current: [keys/s=15, latency=631.24 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:25,593|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:25 INFO 
> util.MultiThreadedAction: [W:10] Keys=554, cols=6.4 K, time=00:00:25 Overall: 
> [keys/s= 22, latency=450.06 ms] Current: [keys/s=14, latency=712.21 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:30,594|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:30 INFO 
> util.MultiThreadedAction: [W:10] Keys=634, cols=7.4 K, time=00:00:30 Overall: 
> [keys/s= 21, latency=438.61 ms] Current: [keys/s=16, latency=359.33 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:35,595|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:35 INFO 
> util.MultiThreadedAction: [W:10] Keys=701, cols=8.1 K, time=00:00:35 Overall: 
> [keys/s= 20, latency=498.65 ms] Current: [keys/s=13, latency=1066.76 ms], 
> wroteUpTo=-1
> 2023-03-14 15:42:40,596|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|23/03/14 15:42:40 INFO 
> util.MultiThreadedAction: [W:10] Keys=727, cols=8.4 K, time=00:00:40 Overall: 
> [keys/s= 18, latency=524.21 ms] Current: [keys/s=5, latency=1213.23 ms], 
> wroteUpTo=-1{noformat}
> It can be clearly seen that latency while running on Ozone is very high 
> compared to that of HDFS.
> Also, during this command execution, there was one exception coming 
> repeatedly:
> {noformat}
> 2023-03-14 16:35:11,513|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|
> 2023-03-14 16:35:11,514|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:142)
> 2023-03-14 16:35:11,514|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.client.HTable.put(HTable.java:514)
> 2023-03-14 16:35:11,514|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.insert(MultiThreadedWriter.java:144)
> 2023-03-14 16:35:11,515|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.run(MultiThreadedWriter.java:108)
> 2023-03-14 16:35:11,515|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|Caused by: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=hb2-cf4-5.hb2-cf4.root.hwx.site/172.27.129.134:22101 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=178406,methodName=Mutate], waitTime=60006ms, rpcTimeout=60000ms
> 2023-03-14 16:35:11,516|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:223)
> 2023-03-14 16:35:11,516|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
> 2023-03-14 16:35:11,516|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:88)
> 2023-03-14 16:35:11,517|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
> 2023-03-14 16:35:11,517|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411)
> 2023-03-14 16:35:11,517|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:107)
> 2023-03-14 16:35:11,518|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> 2023-03-14 16:35:11,518|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
> 2023-03-14 16:35:11,518|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
> 2023-03-14 16:35:11,519|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
> 2023-03-14 16:35:11,519|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
> 2023-03-14 16:35:11,520|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
> 2023-03-14 16:35:11,520|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> java.lang.Thread.run(Thread.java:750)
> 2023-03-14 16:35:11,520|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|Caused by: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=178406,methodName=Mutate], waitTime=60006ms, rpcTimeout=60000ms
> 2023-03-14 16:35:11,521|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> 2023-03-14 16:35:11,521|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|... 6 more
> 2023-03-14 16:35:11,521|INFO|MainThread|machine.py:203 - 
> run()||GUID=4ab065b8-f676-4cbc-b2b6-0814342c920f|{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to