[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259189#comment-15259189 ]
Dechang Gu edited comment on DRILL-4589 at 4/27/16 2:51 AM:
------------------------------------------------------------
Verified on the UCS cluster with the test case. Comparing runs with and without
the fix, we see roughly a 2x improvement in explain plan time on average, and
some queries improve by about 5x:
10-node cluster MapR 4.0 (1 master, 10 data nodes) - 32 cores, 256G RAM, 10
disks, 208K parquet files (each 12KB) in 3-level directory structure
Exec time (ms) on Apache Drill 1.7.0 master, without 4589 (GitId 9514cbe) and
with 4589 (GitId 9f4fff8). Diff is the difference in avg query time; the ratio
columns are wo4589/with4589 for the avg and for the best run.

                      ------ Without 4589 ------  ------- With 4589 --------
Query                 Run 1  Run 2  Run 3    Avg  Run 1  Run 2  Run 3    Avg   Diff  Avg ratio  Best ratio
DRILL4589_EXPPLAN_01  21478  15491  14784  17251   8634   8431   9673   8913   8338       1.94        1.75
DRILL4589_EXPPLAN_02  19168  15560  15168  16632  10391  10665   8343   9800   6832       1.70        1.82
DRILL4589_EXPPLAN_03  15478  13606  14506  14530   9323   8412   9520   9085   5445       1.60        1.62
DRILL4589_EXPPLAN_04  18792  15311  14197  16100   8562   8525   7720   8269   7831       1.95        1.84
DRILL4589_EXPPLAN_05  18447  14852  14692  15997   9333   8600   7874   8602   7395       1.86        1.87
DRILL4589_EXPPLAN_06  18249  14619  15113  15994   9440   8133   9474   9016   6978       1.77        1.80
DRILL4589_EXPPLAN_07  17213  15377  14132  15574   8196   7850   8066   8037   7537       1.94        1.80
DRILL4589_EXPPLAN_08  15884  13808  16767  15486   8805   8212   7978   8332   7155       1.86        1.73
DRILL4589_EXPPLAN_09  14810  15947  14151  14969   8612   8471   8847   8643   6326       1.73        1.67
DRILL4589_EXPPLAN_10  15995  15373  16091  15820   9541   8879   8203   8874   6945       1.78        1.87
DRILL4589_EXPPLAN_11  18722  18239  18828  18596   9677   8040   7883   8533  10063       2.18        2.31
DRILL4589_EXPPLAN_12  16725  16246  16772  16581   8442   7888   8285   8205   8376       2.02        2.06
DRILL4589_EXPPLAN_13  17063  13647  15686  15465   9284   8050   9015   8783   6682       1.76        1.70
DRILL4589_EXPPLAN_14  14831  15107  14873  14937   8954   9336   8944   9078   5859       1.65        1.66
DRILL4589_EXPPLAN_15  15170  15548  15166  15295   8897   8739   8891   8842   6452       1.73        1.74
DRILL4589_EXPPLAN_16  44969  41579  41880  42809  10238   9777   9029   9681  33128       4.42        4.61
DRILL4589_EXPPLAN_17  43389  42351  41345  42362   9315   7932   8323   8523  33838       4.97        5.21
DRILL4589_EXPPLAN_18  45816  44939  44439  45065  10833   9132   7802   9256  35809       4.87        5.70
DRILL4589_EXPPLAN_19  41115  39407  40501  40341   8790   9261   9132   9061  31280       4.45        4.48
DRILL4589_EXPPLAN_20  14270  13685  14713  14223   8991   8264   8939   8731   5491       1.63        1.66
DRILL4589_EXPPLAN_21  14638  14244  14703  14528   9599   8384   8482   8822   5707       1.65        1.70
DRILL4589_EXPPLAN_22  14441  14778  14552  14590   7874   9421   7651   8315   6275       1.75        1.89
DRILL4589_EXPPLAN_23  16534  14558  15656  15583   8286   8013   8314   8204   7378       1.90        1.82
DRILL4589_EXPPLAN_24  14663  13397  15060  14373   9738   8856   9014   9203   5171       1.56        1.51
DRILL4589_EXPPLAN_25  14533  14936  14067  14512   7814   7797   8821   8144   6368       1.78        1.80
DRILL4589_EXPPLAN_26  14070  13637  14026  13911   9066   9258   8904   9076   4835       1.53        1.53
DRILL4589_EXPPLAN_27  13964  13669  13937  13857   8796   7871   8276   8314   5542       1.67        1.74
DRILL4589_EXPPLAN_28  15595  14505  14965  15022   8014   8060   8028   8034   6988       1.87        1.81
DRILL4589_EXPPLAN_29  12815  13841  13309  13322   9309   8000   8976   8762   4560       1.52        1.60
DRILL4589_EXPPLAN_30  13907  14984  14339  14410   8291   8882   8425   8533   5877       1.69        1.68
DRILL4589_EXPPLAN_31  14534  14309  14400  14414   9440   8134   8880   8818   5596       1.63        1.76
DRILL4589_EXPPLAN_32  13779  13716  14620  14038   8992   8380   9079   8817   5221       1.59        1.64
DRILL4589_EXPPLAN_33  15068  13562  14986  14539  10477   8727   8403   9202   5336       1.58        1.61
DRILL4589_EXPPLAN_34  15953  13921  15597  15157   9240   8431   8466   8712   6445       1.74        1.65
DRILL4589_EXPPLAN_35  13884  14096  13639  13873   8868   8067  10034   8990   4883       1.54        1.69
DRILL4589_EXPPLAN_36  14925  14983  15108  15005  10120   9130  10156   9802   5203       1.53        1.63
DRILL4589_EXPPLAN_37  16367  16188  14462  15672   8679  12305   8501   9828   5844       1.59        1.70
DRILL4589_EXPPLAN_38  13555  15765  14353  14558   8006   7894   9846   8582   5976       1.70        1.72
DRILL4589_EXPPLAN_39  15467  14278  14594  14780   8580   8497   7747   8275   6505       1.79        1.84
DRILL4589_EXPPLAN_40  14599  14651  13417  14222   9725   8096   8401   8741   5482       1.63        1.66
DRILL4589_EXPPLAN_41  14102  14047  14380  14176   9353   8611   8449   8804   5372       1.61        1.66
DRILL4589_EXPPLAN_42  14723  15812  15369  15301   8744   8254   8905   8634   6667       1.77        1.78
DRILL4589_EXPPLAN_43  16617  13929  13988  14845   9490   8214   9199   8968   5877       1.66        1.70
DRILL4589_EXPPLAN_44  15228  14422  14718  14789   7968   9128   9051   8716   6074       1.70        1.81
DRILL4589_EXPPLAN_45  14226  13860  13989  14025   9352   8056   8658   8689   5336       1.61        1.72
DRILL4589_EXPPLAN_46  15745  14040  15577  15121  10605   8496  10928  10010   5111       1.51        1.65
DRILL4589_EXPPLAN_47  14498  15349  14719  14855   9061   8112   8851   8675   6181       1.71        1.79
DRILL4589_EXPPLAN_48  13620  13168  14035  13608   9116   7985   9338   8813   4795       1.54        1.65
DRILL4589_EXPPLAN_49  14526  14140  14349  14338   8312   8709   8569   8530   5808       1.68        1.70
DRILL4589_EXPPLAN_50  14862  13840  15082  14595   8199   9920   9202   9107   5488       1.60        1.69
DRILL4589_EXPPLAN_51  14117  13843  14490  14150   8579   9292   7921   8597   5553       1.65        1.75
DRILL4589_EXPPLAN_52  13439  14472  13648  13853  10010   8455   8198   8888   4965       1.56        1.64
DRILL4589_EXPPLAN_53  14607  14065  15503  14725   8229   9200   8045   8491   6234       1.73        1.75
DRILL4589_EXPPLAN_54  13846  13635  14279  13920   8730   7973   7942   8215   5705       1.69        1.72
DRILL4589_EXPPLAN_55  14188  13925  15068  14394   9414   8418   8273   8702   5692       1.65        1.68
DRILL4589_EXPPLAN_56  13824  13702  13769  13765   8292   9298   8347   8646   5119       1.59        1.65
DRILL4589_EXPPLAN_57  15920  14389  13950  14753   7814   7715   8139   7889   6864       1.87        1.81
DRILL4589_EXPPLAN_58  13842  15008  13655  14168   9954   9721   8778   9484   4684       1.49        1.56
DRILL4589_EXPPLAN_59  15188  14294  14105  14529   7959   7962   8413   8111   6418       1.79        1.77
DRILL4589_EXPPLAN_60  13737  14767  14457  14320   9791   8845   9162   9266   5054       1.55        1.55
DRILL4589_EXPPLAN_61  13304  13994  14180  13826   8613   8315   7760   8229   5597       1.68        1.71
DRILL4589_EXPPLAN_62  13634  14998  14573  14402   9473   9459          9466   4936       1.52        1.44
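
For readers who want to check the derived columns, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the test harness: the function name `summarize` is made up here, and it assumes the averages are rounded to whole milliseconds, the diff is taken between the rounded averages, and "best" compares the fastest run on each side. The first row of the table above is consistent with those assumptions.

```python
def summarize(without_runs, with_runs):
    """Return (avg_without, avg_with, diff, avg_ratio, best_ratio), times in ms.

    Assumption: averages are rounded to whole milliseconds before the
    diff and the avg ratio are computed.
    """
    avg_without = round(sum(without_runs) / len(without_runs))
    avg_with = round(sum(with_runs) / len(with_runs))
    diff = avg_without - avg_with
    avg_ratio = round(avg_without / avg_with, 2)
    # "best" is taken here as fastest run without the fix vs. fastest run with it
    best_ratio = round(min(without_runs) / min(with_runs), 2)
    return avg_without, avg_with, diff, avg_ratio, best_ratio


# Run times for DRILL4589_EXPPLAN_01, copied from the table above
print(summarize([21478, 15491, 14784], [8634, 8431, 9673]))
```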
was (Author: dechanggu):
Verified on ucs cluster with the test case. Overall, we see average 2x
improvement in explain plan time, some got 5x improvement, comparing with the
fix and without the fix:
https://docs.google.com/spreadsheets/d/1dh7w8yvQ4fHTt0Bcb9xROqeXU80REjBoPZNSyff49Bg/edit#gid=0
UCS 10-node cluster MapR 4.0 (1 master, 10 data nodes) - 32 cores, 256G RAM, 10
disks, 208K parquet files (each 12KB) in 3-level directory structure
> Reduce planning time for file system partition pruning by reducing filter
> evaluation overhead
> ---------------------------------------------------------------------------------------------
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 1.7.0
>
>
> When Drill is used to query hundreds of thousands, or even millions, of files
> organized into multi-level directories, users typically provide a
> partition filter such as: dir0 = something and dir1 = something2 and ...
> For such queries, we saw that query planning time could be unacceptably long,
> due to three main overheads: 1) expanding and getting the list of files, 2)
> evaluating the partition filter, and 3) getting the metadata, in the case of
> parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the third overhead. As follow-up work after
> DRILL-2517, we plan to reduce the filter evaluation overhead. For now, the
> partition filter evaluation is applied at the file level. In many cases, we saw
> that the number of leaf subdirectories is significantly lower than the number
> of files. Since all the files under the same leaf subdirectory share the same
> directory metadata, we should apply the filter evaluation at the leaf
> subdirectory level. By doing that, we could reduce the CPU overhead of
> evaluating the filter, and the memory overhead as well.
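
The idea in the description can be illustrated with a small Python sketch. It is not Drill's planner code: the function `prune_by_directory`, the sample paths, and the dictionary-based filter are all assumptions made for illustration. Files are grouped by their leaf directory, and the partition filter is evaluated once per directory instead of once per file.

```python
import posixpath
from collections import defaultdict


def prune_by_directory(files, root, partition_filter):
    """Select files whose dir0/dir1/... values satisfy partition_filter.

    The filter runs once per leaf directory rather than once per file.
    Returns (selected_files, number_of_filter_evaluations).
    """
    # Group files by the directory that directly contains them
    by_dir = defaultdict(list)
    for path in files:
        by_dir[posixpath.dirname(path)].append(path)

    selected, evaluations = [], 0
    for directory, members in by_dir.items():
        rel = posixpath.relpath(directory, root)
        parts = [] if rel == "." else rel.split("/")
        # Map directory levels to dir0, dir1, ... as in the filter above
        dirs = {f"dir{i}": name for i, name in enumerate(parts)}
        evaluations += 1
        if partition_filter(dirs):
            selected.extend(members)
    return selected, evaluations


# Hypothetical layout: 4 files spread over 3 leaf directories
files = [
    "/data/2015/Q1/a.parquet",
    "/data/2015/Q1/b.parquet",
    "/data/2015/Q2/c.parquet",
    "/data/2016/Q1/d.parquet",
]
selected, evaluations = prune_by_directory(
    files,
    "/data",
    lambda d: d.get("dir0") == "2015" and d.get("dir1") == "Q1",
)
print(selected, evaluations)
```

In this toy layout the filter is evaluated 3 times (once per leaf directory) rather than 4 times (once per file). With many files per leaf directory, the saving would grow accordingly.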
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
