Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-815. Improve performance of first scan following restart
......................................................................

KUDU-815. Improve performance of first scan following restart

On the first scan following a tablet server restart, the TS has not yet read deltafile stats for any delta files. This means that, when we construct DeltaFileIterators to service a scan, we don't yet know whether the files are even relevant to the MVCC snapshot being scanned.

Prior to this patch, we only attempted to cull irrelevant DeltaFiles at iterator construction time and, without stats, we were unable to do so. With this patch, we check again when the iterator is seeked and, if the file is irrelevant, we preemptively mark it as "exhausted", which prevents any needless IO.

To benchmark, I loaded a 1GB TPCH lineitem table on a local tserver and looked at the performance of the first scan.

without patch:

todd@todd-ThinkPad-T540p:~/git/kudu$ ./build/release/bin/tpch_real_world --tpch_load_data=0 --tpch_use_mini_cluster=0
I0505 16:15:28.855382 32209 tpch_real_world.cc:307] Time spent querying data in cluster: real 1.966s user 0.112s sys 0.000s
I0505 16:15:29.598799 32209 tpch_real_world.cc:307] Time spent querying data in cluster: real 0.743s user 0.100s sys 0.000s

with patch:

todd@todd-ThinkPad-T540p:~/git/kudu$ ./build/release/bin/tpch_real_world --tpch_load_data=0 --tpch_use_mini_cluster=0
I0505 16:14:31.102988 31545 tpch_real_world.cc:307] Time spent querying data in cluster: real 0.924s user 0.096s sys 0.008s

The first scan after a restart is still slightly slower than the second because of cold caches, but the difference is much less dramatic.

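The two-stage relevancy check described above can be sketched roughly as follows. This is a minimal illustration, not the code from src/kudu/tablet/deltafile.cc: the names FakeDeltaFile, FakeDeltaIterator, NewIterator, and IsRelevant are hypothetical, and the relevancy rule (a file matters only if its minimum timestamp falls below the snapshot's upper bound) is a simplifying assumption made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified stand-ins; these are NOT Kudu's actual classes.
using Timestamp = std::uint64_t;

struct DeltaStats {
  Timestamp min_timestamp;
  Timestamp max_timestamp;
};

// Assumption for this sketch: every timestamp below upper_bound is visible.
struct Snapshot {
  Timestamp upper_bound;
};

// Assumed relevancy rule: the file can matter only if it holds at least
// one delta whose timestamp is visible in the snapshot.
bool IsRelevant(const DeltaStats& stats, const Snapshot& snap) {
  return stats.min_timestamp < snap.upper_bound;
}

class FakeDeltaFile {
 public:
  FakeDeltaFile(Timestamp min_ts, Timestamp max_ts)
      : on_disk_stats_{min_ts, max_ts} {}

  // After a "restart" the stats are not loaded until Init() runs.
  bool initted() const { return initted_; }
  void Init() {
    stats_ = on_disk_stats_;
    initted_ = true;
  }
  const DeltaStats& stats() const {
    assert(initted_);
    return stats_;
  }

  // Stands in for the data-block IO that the patch tries to avoid.
  void ReadDataBlock() { ++data_blocks_read_; }
  int data_blocks_read() const { return data_blocks_read_; }

 private:
  DeltaStats on_disk_stats_;
  DeltaStats stats_{};
  bool initted_ = false;
  int data_blocks_read_ = 0;
};

class FakeDeltaIterator {
 public:
  FakeDeltaIterator(FakeDeltaFile* file, Snapshot snap)
      : file_(file), snap_(snap) {}

  void Seek() {
    if (!file_->initted()) file_->Init();
    // Second check: the stats are available now, so an irrelevant file
    // is marked exhausted before any data block is read.
    if (!IsRelevant(file_->stats(), snap_)) {
      exhausted_ = true;
      return;
    }
    file_->ReadDataBlock();
  }

  bool exhausted() const { return exhausted_; }

 private:
  FakeDeltaFile* file_;
  Snapshot snap_;
  bool exhausted_ = false;
};

// First check: culling at construction time works only if the stats
// have already been loaded. Returns nullptr when the file is culled.
std::unique_ptr<FakeDeltaIterator> NewIterator(FakeDeltaFile* file,
                                               Snapshot snap) {
  if (file->initted() && !IsRelevant(file->stats(), snap)) {
    return nullptr;
  }
  return std::unique_ptr<FakeDeltaIterator>(new FakeDeltaIterator(file, snap));
}
```

In this sketch, a freshly "restarted" file is not culled by NewIterator because its stats are missing, but the first Seek() loads them and marks the iterator exhausted without touching data blocks. Later calls to NewIterator on the same file are culled at construction time.
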
Change-Id: Icd01302723430e5b06308256bbbbb790aee096fc
Reviewed-on: http://gerrit.cloudera.org:8080/2974
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans
---
M src/kudu/tablet/deltafile.cc
1 file changed, 12 insertions(+), 1 deletion(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/2974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Icd01302723430e5b06308256bbbbb790aee096fc
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
