[ 
https://issues.apache.org/jira/browse/DRILL-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-5899:
------------------------------------
    Description: 
For the 4 simple patterns we have, i.e. startsWith, endsWith, contains and 
constant, we do not need the overhead of charSequenceWrapper. We can work with 
DrillBuf directly. This saves us from doing the isAscii check and UTF-8 
decoding for each row.
UTF-8 encoding ensures that the byte sequence of one valid character is never 
a prefix of the byte sequence of another valid character, so a byte-level 
match is also a character-level match. So, instead of decoding the varChar 
from each row we process, we encode the patternString once during setup and do 
raw byte comparison. Instead of bounds checking and reading one byte at a 
time, we get the whole buffer in one shot and use that for comparison.
This improved overall performance of the filter operator by around 20%.
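
The idea above can be sketched roughly as follows for the contains pattern. 
This is only an illustration under stated assumptions, not the code in the 
patch: java.nio.ByteBuffer stands in for DrillBuf, and the class name 
RawByteContainsMatcher and its matches(buf, start, end) signature are 
hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not Drill's actual matcher class.
// ByteBuffer is used here as a stand-in for DrillBuf.
final class RawByteContainsMatcher {
    // The pattern is encoded to UTF-8 once, at setup time.
    private final byte[] pattern;

    RawByteContainsMatcher(String patternString) {
        this.pattern = patternString.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if the bytes in [start, end) of buf contain the pattern.
    // No isAscii check and no per-row UTF-8 decoding is performed.
    boolean matches(ByteBuffer buf, int start, int end) {
        int len = end - start;
        if (pattern.length == 0) {
            return true;
        }
        if (len < pattern.length) {
            return false;
        }

        // Fetch the whole value in one bulk read instead of doing a
        // bounds-checked read per byte.
        byte[] data = new byte[len];
        ByteBuffer view = buf.duplicate(); // leaves the caller's position alone
        view.position(start);
        view.get(data, 0, len);

        // Plain byte-by-byte substring search over the raw UTF-8 bytes.
        int last = len - pattern.length;
        outer:
        for (int i = 0; i <= last; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

startsWith, endsWith and constant would follow the same shape, comparing the 
encoded pattern against the first bytes, the last bytes, or the entire value 
respectively.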


  was:
For the simple pattern matchers, we do not have to do the isAscii check. 
UTF-8 encoding ensures that no UTF-8 character is a prefix of any other valid 
character. So, for the 4 simple patterns we have, i.e. startsWith, endsWith, 
contains and constant, we can get rid of this check. This will help improve 
performance. 



> Simple pattern matchers can work with DrillBuf directly
> -------------------------------------------------------
>
>                 Key: DRILL-5899
>                 URL: https://issues.apache.org/jira/browse/DRILL-5899
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>
> For the 4 simple patterns we have, i.e. startsWith, endsWith, contains and 
> constant, we do not need the overhead of charSequenceWrapper. We can work 
> with DrillBuf directly. This saves us from doing the isAscii check and UTF-8 
> decoding for each row.
> UTF-8 encoding ensures that the byte sequence of one valid character is 
> never a prefix of the byte sequence of another valid character, so a 
> byte-level match is also a character-level match. So, instead of decoding 
> the varChar from each row we process, we encode the patternString once 
> during setup and do raw byte comparison. Instead of bounds checking and 
> reading one byte at a time, we get the whole buffer in one shot and use that 
> for comparison.
> This improved overall performance of the filter operator by around 20%.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
