[ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307230#comment-15307230
 ] 

Jinfeng Ni commented on DRILL-4573:
-----------------------------------

If relying on a configuration option to turn the additional check on or off 
means the user has to set that option manually, this does not seem like a 
reasonable approach.  A regular user may or may not know whether the data is 
ASCII-only; Drill should not force users to remember to set the option in 
order to get this 7% performance difference. 

How do you check whether the input is ASCII-only? I thought it could be done 
by simply checking whether the number of chars equals the number of bytes. 
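
To make the question concrete, here is a minimal sketch of the two checks, 
assuming the value is available as valid UTF-8 bytes. The class and method 
names are hypothetical and are not part of Drill; a plain byte[] stands in 
for the Drill buffer. For valid UTF-8, the character count equals the byte 
count exactly when no byte has its high bit set, so both checks should agree.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Drill.
final class AsciiCheck {
    private AsciiCheck() {}

    // ASCII characters are single bytes with the high bit clear. Every byte
    // of a multi-byte UTF-8 sequence has the high bit set, so it is negative
    // when read as a signed Java byte.
    static boolean isAsciiOnly(byte[] utf8, int start, int end) {
        for (int i = start; i < end; i++) {
            if (utf8[i] < 0) {
                return false;
            }
        }
        return true;
    }

    // Counts characters (code points) by skipping UTF-8 continuation bytes,
    // which all have the bit pattern 10xxxxxx.
    static int countCodePoints(byte[] utf8, int start, int end) {
        int count = 0;
        for (int i = start; i < end; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // The "number of chars == number of bytes" formulation of the same check.
    static boolean charCountEqualsByteCount(byte[] utf8, int start, int end) {
        return countCodePoints(utf8, start, end) == end - start;
    }
}
```

Both variants are a single pass over the bytes, so the high-bit test is the 
cheaper one if only a yes/no answer is needed.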

For the LIKE option you talked about, you may want to consider opening a 
separate JIRA to deliver the fix.  For this one, let's focus on getting the 
incorrect-result issue fixed. We have to fix it in the next release, since an 
incorrect result is a critical bug. 
    

> Zero copy LIKE, REGEXP_MATCHES, SUBSTR
> --------------------------------------
>
>                 Key: DRILL-4573
>                 URL: https://issues.apache.org/jira/browse/DRILL-4573
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: jean-claude
>            Priority: Critical
>             Fix For: 1.7.0
>
>         Attachments: DRILL-4573-3.patch.txt, DRILL-4573.patch.txt
>
>
> All the functions using java.util.regex.Matcher currently create Java 
> String objects to pass into matcher.reset().
> However, this creates an unnecessary copy of the bytes and a Java String 
> object.
> The matcher accepts a CharSequence, so instead of making a copy we can create 
> an adapter from the DrillBuffer to the CharSequence interface.
> Gains of 25% in execution speed are possible when going over VARCHAR of 36 
> chars. The gain will be proportional to the size of the VARCHAR.
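
The adapter idea in the description above can be sketched roughly as follows. 
This is only an illustration under stated assumptions: a plain byte[] stands 
in for the Drill buffer, the class name is hypothetical, and the mapping of 
one byte to one char is only correct for ASCII-only data, which is why the 
ASCII check discussed in the comment matters.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical adapter for illustration; a byte[] stands in for the Drill buffer.
// Exposes a byte range as a CharSequence without copying it into a String.
final class AsciiBytesCharSequence implements CharSequence {
    private final byte[] bytes;
    private final int start; // inclusive
    private final int end;   // exclusive

    AsciiBytesCharSequence(byte[] bytes, int start, int end) {
        if (start < 0 || end < start || end > bytes.length) {
            throw new IndexOutOfBoundsException("invalid range");
        }
        this.bytes = bytes;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // One byte maps to one char. This is wrong for multi-byte UTF-8
        // sequences, so callers must verify the data is ASCII-only first.
        return (char) (bytes[start + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to < from || to > length()) {
            throw new IndexOutOfBoundsException("invalid sub-range");
        }
        // Another view over the same bytes; still no copy.
        return new AsciiBytesCharSequence(bytes, start + from, start + to);
    }

    @Override
    public String toString() {
        // Only this method materializes a String.
        return new String(bytes, start, end - start, StandardCharsets.US_ASCII);
    }

    // Usage sketch: reuse one Matcher and reset it against the view.
    static boolean matches(Pattern pattern, byte[] data, int start, int end) {
        Matcher matcher = pattern.matcher("");
        matcher.reset(new AsciiBytesCharSequence(data, start, end));
        return matcher.matches();
    }
}
```

For non-ASCII input, a fallback that decodes the bytes into a String would 
still be needed to get correct results.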



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
