[ 
https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346886#comment-15346886
 ] 

Khurram Faraaz edited comment on DRILL-4478 at 6/23/16 6:11 PM:
----------------------------------------------------------------


{noformat}
Querying data from a CSV text file with the binary_string function returns 
different values for the same input. I am on commit ID 6286c0a4 
(Drill 1.7.0).

0: jdbc:drill:schema=dfs.tmp> select binary_string(columns[0]) from 
`binStrDuplcs.csv`;
+--------------+
|    EXPR$0    |
+--------------+
| [B@15ea08ee  |
| [B@37f04c7f  |
| [B@12e428a   |
| [B@41272a1   |
| [B@5723aa1d  |
| [B@6675829c  |
| [B@2cd20451  |
| [B@101978d4  |
| [B@784bae8d  |
| [B@30b0e8ae  |
| [B@2e7c107b  |
| [B@531e1314  |
| [B@5b76b0ad  |
| [B@4d495cc4  |
| [B@b696f80   |
| [B@3717425a  |
+--------------+
16 rows selected (0.112 seconds)

The CSV file has the same data value in each row.

0: jdbc:drill:schema=dfs.tmp> select columns[0] from `binStrDuplcs.csv`;
+-------------------------+
|         EXPR$0          |
+-------------------------+
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
| '\\x99\\x8c\\x2f\\x77'  |
+-------------------------+
16 rows selected (0.163 seconds)

0: jdbc:drill:schema=dfs.tmp> select string_binary(columns[0]) from 
`binStrDuplcs.csv`;
+-------------------------------------------------+
|                     EXPR$0                      |
+-------------------------------------------------+
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'  |
+-------------------------------------------------+
16 rows selected (0.156 seconds)
{noformat}
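
A note on reading the first result set: values of the form [B@15ea08ee are what Java's default Object.toString() produces for a byte[] (the type tag "[B" followed by the identity hash in hex), so they differ per array object even when the contents are identical. The sketch below shows that behavior in isolation. It assumes nothing about Drill or sqlline internals; the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class ByteArrayDisplay {
    // Hypothetical helper: renders the array contents as lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02x", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] first = {(byte) 0x99, (byte) 0x8c, 0x2f, 0x77};
        byte[] second = first.clone();

        // Default toString(): "[B@" plus an identity hash, which is
        // unrelated to the bytes stored in the array.
        System.out.println(first);
        System.out.println(second);

        // Content-based comparison and rendering are stable.
        System.out.println(Arrays.equals(first, second)); // true
        System.out.println(hex(first));                   // 998c2f77
        System.out.println(hex(second));                  // 998c2f77
    }
}
```

If the shell is printing the raw byte[] this way, the differing strings by themselves do not show whether the converted bytes differ; comparing the contents would.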

Contents of the CSV file used in the queries above:

{noformat}
[root@centos-01 binary_string]# cat binStrDuplcs.csv
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
[root@centos-01 binary_string]#
{noformat}
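
The string_binary output above is consistent with an escaping rule in which printable ASCII passes through unchanged, except the backslash, which is written as \x5C. The following is a minimal sketch of such a rule, not Drill's actual implementation; EscapeSketch and escape are hypothetical names, and the rule itself is an assumption inferred from the output shown.

```java
import java.nio.charset.StandardCharsets;

public class EscapeSketch {
    // Assumed rule: printable ASCII other than backslash is copied as-is;
    // every other byte (and the backslash) becomes \xHH in uppercase hex.
    static String escape(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                out.append((char) ch);
            } else {
                out.append(String.format("\\x%02X", ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One field exactly as stored in the CSV file: the quotes and the
        // doubled backslashes are literal characters in the data.
        String field = "'\\\\x99\\\\x8c\\\\x2f\\\\x77'";
        System.out.println(escape(field.getBytes(StandardCharsets.US_ASCII)));
        // prints '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77'
    }
}
```

Under this rule each literal backslash in the file becomes \x5C, which is why every \\x99 in the input appears as \x5C\x5Cx99 in the result.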


> binary_string cannot convert buffer that were not start from 0 correctly
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4478
>                 URL: https://issues.apache.org/jira/browse/DRILL-4478
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Chunhui Shi
>            Assignee: Chunhui Shi
>             Fix For: 1.7.0
>
>
> When binary_string is called multiple times, only the first call, where the 
> drillbuf starts at 0, is converted correctly. For the second and subsequent 
> calls the drillbuf does not start at 0, so 
> DrillStringUtils.parseBinaryString cannot convert the input correctly.
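
The class of bug described in the issue can be illustrated with a shared buffer that holds several values back to back, each addressed by a start and end position. This is a sketch under assumptions, not the Drill code: OffsetParseSketch, parse and parseIgnoringStart are hypothetical, and a plain byte[] stands in for the drillbuf.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class OffsetParseSketch {
    // Decodes \xHH escapes found in buf[start, end); any other byte is
    // copied through unchanged.
    static byte[] parse(byte[] buf, int start, int end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = start;
        while (i < end) {
            if (buf[i] == '\\' && i + 3 < end && buf[i + 1] == 'x') {
                int hi = Character.digit(buf[i + 2], 16);
                int lo = Character.digit(buf[i + 3], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 4;
                    continue;
                }
            }
            out.write(buf[i]);
            i++;
        }
        return out.toByteArray();
    }

    // Illustrates the mistake: the value's length is honored, but reading
    // always begins at index 0 of the shared buffer.
    static byte[] parseIgnoringStart(byte[] buf, int start, int end) {
        return parse(buf, 0, end - start);
    }

    public static void main(String[] args) {
        // Two values packed into one buffer: "\x41" at [0, 4) and
        // "\x42\x43" at [4, 12).
        byte[] shared = "\\x41\\x42\\x43".getBytes(StandardCharsets.US_ASCII);

        String right = new String(parse(shared, 4, 12), StandardCharsets.US_ASCII);
        String wrong = new String(parseIgnoringStart(shared, 4, 12), StandardCharsets.US_ASCII);
        System.out.println(right); // BC
        System.out.println(wrong); // AB
    }
}
```

The first value parses correctly either way because it happens to start at 0; only later values expose the difference, which matches the symptom described above.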



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
