[GitHub] [druid] clintropolis commented on pull request #12277: add support for 'front coded' string dictionaries for smaller string columns

GitBox Tue, 01 Mar 2022 11:37:51 -0800


clintropolis commented on pull request #12277:
URL: https://github.com/apache/druid/pull/12277#issuecomment-1055789995



   Some additional, less scientific measurements, using a 10GB file of the NYC taxi dataset with all of the columns stored as strings:
   <img width="543" alt="Screen Shot 2022-02-27 at 6 21 58 PM" src="https://user-images.githubusercontent.com/1577461/156236031-1ac864f3-e382-424b-b547-0693cbd73e30.png">
   
   Grouping performance seems competitive:
   <img width="1415" alt="Screen Shot 2022-02-28 at 1 26 06 PM" src="https://user-images.githubusercontent.com/1577461/156236107-ab93a26d-0fae-49b0-bd19-17ab7eaf26ad.png">
   <img width="1408" alt="Screen Shot 2022-02-28 at 1 25 51 PM" src="https://user-images.githubusercontent.com/1577461/156236156-90389487-49cb-4ded-bce1-5c5b58fd759d.png">
   
   As the earlier benchmarks suggested, "select *" queries do show a performance decrease:
   <img width="1403" alt="Screen Shot 2022-02-27 at 6 28 15 PM" src="https://user-images.githubusercontent.com/1577461/156236275-9436433a-3101-4481-988a-50c17cbc1434.png">
   <img width="1406" alt="Screen Shot 2022-02-27 at 6 27 41 PM" src="https://user-images.githubusercontent.com/1577461/156236290-b604a6fb-a93b-4e9a-a228-724816536c1e.png">
   
   I still haven't had the chance to spend any time optimizing the code, but the size savings make this feel worth considering for clusters whose typical workload does not include queries that touch many columns, such as "wide" scans ("select *", etc.) or group bys over a large number of columns.
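   For readers unfamiliar with the technique, here is a minimal Python sketch of generic bucketed front coding, assuming a sorted, unique string dictionary split into fixed-size buckets. The class name `FrontCodedDictionary`, the `bucket_size` parameter, and the in-memory layout are hypothetical illustrations, not the on-disk format in this PR. The sketch shows where the size savings come from (sorted values share prefixes, so only suffixes are stored) and one plausible contributor to the "select *" cost: decoding a value means replaying its bucket from the head rather than doing a direct offset lookup.

```python
from bisect import bisect_right
from typing import List, Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest shared prefix of two byte strings."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class FrontCodedDictionary:
    """Hypothetical, illustrative bucketed front-coded dictionary.

    Values must be sorted and unique. Each bucket stores its first value in
    full; every later value in the bucket is stored as (length of the prefix
    shared with the previous value, remaining suffix bytes).
    """

    def __init__(self, sorted_values: List[str], bucket_size: int = 4):
        if bucket_size < 1:
            raise ValueError("bucket_size must be >= 1")
        encoded = [v.encode("utf-8") for v in sorted_values]
        for prev, cur in zip(encoded, encoded[1:]):
            if prev >= cur:
                raise ValueError("values must be sorted and unique")
        self.bucket_size = bucket_size
        self.size = len(encoded)
        self.heads: List[bytes] = []
        self.tails: List[List[Tuple[int, bytes]]] = []
        for start in range(0, len(encoded), bucket_size):
            bucket = encoded[start:start + bucket_size]
            self.heads.append(bucket[0])
            tail: List[Tuple[int, bytes]] = []
            for prev, cur in zip(bucket, bucket[1:]):
                p = common_prefix_len(prev, cur)
                tail.append((p, cur[p:]))
            self.tails.append(tail)

    def get(self, index: int) -> str:
        """Decode the value for a dictionary id by replaying its bucket from the head."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        bucket, pos = divmod(index, self.bucket_size)
        value = self.heads[bucket]
        for prefix_len, suffix in self.tails[bucket][:pos]:
            value = value[:prefix_len] + suffix
        return value.decode("utf-8")

    def index_of(self, value: str) -> int:
        """Return the dictionary id of value, or -1: binary search heads, then scan one bucket."""
        target = value.encode("utf-8")
        bucket = bisect_right(self.heads, target) - 1
        if bucket < 0:
            return -1
        current = self.heads[bucket]
        if current == target:
            return bucket * self.bucket_size
        for offset, (prefix_len, suffix) in enumerate(self.tails[bucket], start=1):
            current = current[:prefix_len] + suffix
            if current == target:
                return bucket * self.bucket_size + offset
            if current > target:  # values are sorted, so target cannot appear later
                break
        return -1

    def stored_bytes(self) -> int:
        """Bytes of string data kept (ignores length/offset headers)."""
        return sum(len(h) for h in self.heads) + sum(
            len(s) for tail in self.tails for _, s in tail
        )


if __name__ == "__main__":
    values = ["2022-02-27", "2022-02-28", "2022-03-01", "apple", "applesauce", "application"]
    d = FrontCodedDictionary(values, bucket_size=4)
    print(d.get(5), d.index_of("apple"))  # application 3
    print(d.stored_bytes(), sum(len(v) for v in values))  # 37 56
```

   In this sketch a larger `bucket_size` lets more values share a head, so storage shrinks further, but `get` replays more entries per lookup; `bucket_size=1` degenerates to storing every value in full.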
   
   

