[ 
https://issues.apache.org/jira/browse/PARQUET-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081826#comment-17081826
 ] 

Yuming Wang commented on PARQUET-1739:
--------------------------------------

Spark benchmark result:
|Case|Parquet 1.11 Vectorized(ms)|Parquet 1.11 Vectorized(Pushdown)(ms)|Parquet 1.10 Vectorized(ms)|Parquet 1.10 Vectorized(Pushdown)(ms)|%Improved|
|Select 0 string row (value IS NULL)|7001|631|8459|569|0.10896309|
|Select 0 string row ('7864320' < value < '7864320')|8801|744|9596|470|0.58297872|
|Select 1 string row (value = '7864320')|6973|578|8415|456|0.26754386|
|Select 1 string row (value <=> '7864320')|7090|867|9681|663|0.30769231|
|Select 1 string row ('7864320' <= value <= '7864320')|7637|639|8257|442|0.44570136|
|Select all string rows (value IS NOT NULL)|14638|14926|15058|17091|-0.1266749|
|Select 0 int row (value IS NULL)|7233|532|8373|460|0.15652174|
|Select 0 int row (7864320 < value < 7864320)|6474|558|8176|620|-0.1|
|Select 1 int row (value = 7864320)|7284|554|7545|435|0.27356322|
|Select 1 int row (value <=> 7864320)|7109|724|8550|484|0.49586777|
|Select 1 int row (7864320 <= value <= 7864320)|6340|563|7648|440|0.27954545|
|Select 1 int row (7864319 < value < 7864321)|7134|620|7521|435|0.42528736|
|Select 10% int rows (value < 1572864)|7561|1986|8790|1988|-0.001006|
|Select 50% int rows (value < 7864320)|10425|7434|10445|7133|0.04219823|
|Select 90% int rows (value < 14155776)|12130|11745|12959|12574|-0.0659297|
|Select all int rows (value IS NOT NULL)|12662|12961|13640|13794|-0.0603886|
|Select all int rows (value > -1)|12568|12864|13547|13691|-0.0604046|
|Select all int rows (value != -1)|12574|12874|14617|14533|-0.114154|
|Select 0 distinct string row (value IS NULL)|5925|455|7013|371|0.22641509|
|Select 0 distinct string row ('100' < value < '100')|6037|445|7087|391|0.13810742|
|Select 1 distinct string row (value = '100')|6107|603|7169|524|0.15076336|
|Select 1 distinct string row (value <=> '100')|6309|1418|7113|528|1.68560606|
|Select 1 distinct string row ('100' <= value <= '100')|6224|620|7222|549|0.12932605|
|Select all distinct string rows (value IS NOT NULL)|14198|14293|15175|16194|-0.1173892|
|StringStartsWith filter: (value like '10%')|8399|3572|10298|2642|0.35200606|
|StringStartsWith filter: (value like '1000%')|7424|559|7998|441|0.2675737|
|StringStartsWith filter: (value like '786432%')|7554|542|7920|428|0.26635514|
|Select 1 decimal(9, 2) row (value = 7864320)|2684|131|3834|115|0.13913043|
|Select 10% decimal(9, 2) rows (value < 1572864)|4201|2280|5139|2170|0.05069124|
|Select 50% decimal(9, 2) rows (value < 7864320)|8661|8325|9593|10449|-0.203273|
|Select 90% decimal(9, 2) rows (value < 14155776)|10213|9833|11647|11828|-0.1686676|
|Select 1 decimal(18, 2) row (value = 7864320)|3259|150|4631|133|0.12781955|
|Select 10% decimal(18, 2) rows (value < 1572864)|4072|1284|5285|1260|0.01904762|
|Select 50% decimal(18, 2) rows (value < 7864320)|7010|5495|7959|5898|-0.0683282|
|Select 90% decimal(18, 2) rows (value < 14155776)|10037|9957|10845|10535|-0.0548647|
|Select 1 decimal(38, 2) row (value = 7864320)|4970|151|5943|131|0.15267176|
|Select 10% decimal(38, 2) rows (value < 1572864)|5912|1605|7079|1827|-0.1215107|
|Select 50% decimal(38, 2) rows (value < 7864320)|9784|7573|11497|7991|-0.0523088|
|Select 90% decimal(38, 2) rows (value < 14155776)|13935|13341|14702|14183|-0.0593668|
|InSet -> InFilters (values count: 5, distribution: 10)|7193|600|8001|495|0.21212121|
|InSet -> InFilters (values count: 5, distribution: 50)|7002|577|8042|480|0.20208333|
|InSet -> InFilters (values count: 5, distribution: 90)|7003|587|8526|484|0.21280992|
|InSet -> InFilters (values count: 10, distribution: 10)|6984|625|8279|519|0.20423892|
|InSet -> InFilters (values count: 10, distribution: 50)|6949|706|8097|505|0.3980198|
|InSet -> InFilters (values count: 10, distribution: 90)|7336|613|7961|507|0.20907298|
|InSet -> InFilters (values count: 50, distribution: 10)|7369|7475|8052|8244|-0.09328|
|InSet -> InFilters (values count: 50, distribution: 50)|7295|7619|8202|8311|-0.0832631|
|InSet -> InFilters (values count: 50, distribution: 90)|7584|7610|8405|8326|-0.0859957|
|InSet -> InFilters (values count: 100, distribution: 10)|7264|7358|8041|8200|-0.1026829|
|InSet -> InFilters (values count: 100, distribution: 50)|7192|7277|8019|8437|-0.1374896|
|InSet -> InFilters (values count: 100, distribution: 90)|7040|7236|10567|10681|-0.3225353|
|Select 1 tinyint row (value = CAST(63 AS tinyint))|3185|247|4855|235|0.05106383|
|Select 10% tinyint rows (value < CAST(12 AS tinyint))|3823|1120|5091|1209|-0.0736146|
|Select 50% tinyint rows (value < CAST(63 AS tinyint))|6570|5117|9265|6076|-0.1578341|
|Select 90% tinyint rows (value < CAST(114 AS tinyint))|9291|9229|10508|10152|-0.090918|
|Select 1 timestamp stored as INT96 row (value = CAST(7864320 AS timestamp))|4054|4757|6253|4774|-0.003561|
|Select 10% timestamp stored as INT96 rows (value < CAST(1572864 AS timestamp))|6190|4512|5295|13923|-0.6759319|
|Select 50% timestamp stored as INT96 rows (value < CAST(7864320 AS timestamp))|8681|7207|9758|8633|-0.1651801|
|Select 90% timestamp stored as INT96 rows (value < CAST(14155776 AS timestamp))|13536|9738|11053|11642|-0.1635458|
|Select 1 timestamp stored as TIMESTAMP_MICROS row (value = CAST(7864320 AS timestamp))|2904|136|4279|120|0.13333333|
|Select 10% timestamp stored as TIMESTAMP_MICROS rows (value < CAST(1572864 AS timestamp))|3983|1209|4872|1251|-0.0335731|
|Select 50% timestamp stored as TIMESTAMP_MICROS rows (value < CAST(7864320 AS timestamp))|6886|6052|7441|5546|0.09123693|
|Select 90% timestamp stored as TIMESTAMP_MICROS rows (value < CAST(14155776 AS timestamp))|9097|10715|10205|10035|0.06776283|
|Select 1 timestamp stored as TIMESTAMP_MILLIS row (value = CAST(7864320 AS timestamp))|3278|218|4349|119|0.83193277|
|Select 10% timestamp stored as TIMESTAMP_MILLIS rows (value < CAST(1572864 AS timestamp))|3955|1176|6646|1276|-0.0783699|
|Select 50% timestamp stored as TIMESTAMP_MILLIS rows (value < CAST(7864320 AS timestamp))|6684|5161|7766|5635|-0.0841171|
|Select 90% timestamp stored as TIMESTAMP_MILLIS rows (value < CAST(14155776 AS timestamp))|10910|9297|10735|10314|-0.0986038|
|Select 1 row with 1 filters|340|351|413|3317|-0.8941815|
|Select 1 row with 250 filters|1033|1132|1256|1078|0.05009276|
|Select 1 row with 500 filters|3022|3353|2868|3197|0.04879575|
|Sum|508182|293442|579526|328587|-0.106958|
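
A note on reading the last column: for the rows checked, its values match (Parquet 1.11 pushdown time / Parquet 1.10 pushdown time) - 1. Under that reading they are fractions rather than percentages, and a negative value means 1.11 with pushdown was faster than 1.10 with pushdown. The Python sketch below recomputes a few rows from the table; the helper name `relative_change` is hypothetical and is not part of the benchmark itself.

```python
# Hypothetical helper: recomputes the last column of the table from the two
# "Vectorized(Pushdown)(ms)" columns. Assumption: the column is the fractional
# change in pushdown time going from Parquet 1.10 to Parquet 1.11.
def relative_change(pushdown_1_11_ms: int, pushdown_1_10_ms: int) -> float:
    return pushdown_1_11_ms / pushdown_1_10_ms - 1


# (1.11 pushdown ms, 1.10 pushdown ms), copied from the table above.
rows = {
    "Select 0 string row (value IS NULL)": (631, 569),
    "Select 0 int row (7864320 < value < 7864320)": (558, 620),
    "Select 1 row with 1 filters": (351, 3317),
    "Sum": (293442, 328587),
}

for case, (new_ms, old_ms) in rows.items():
    change = relative_change(new_ms, old_ms)
    verdict = "1.11 faster" if change < 0 else "1.11 slower"
    print(f"{case}: {change:+.4f} ({verdict})")
```

On the Sum row this gives a value close to the reported -0.106958, meaning the total pushdown time across all cases was roughly 10.7% lower with Parquet 1.11.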

> Make Spark SQL support Column indexes
> -------------------------------------
>
>                 Key: PARQUET-1739
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1739
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 1.11.1
>
>
> Make Spark SQL support Column indexes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
