[
https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346886#comment-15346886
]
Khurram Faraaz edited comment on DRILL-4478 at 6/23/16 6:11 PM:
----------------------------------------------------------------
{noformat}
Querying data from a CSV text file and applying the binary_string function to
that data returns a different value for each row, even though every row holds
the same input. I am on commit ID 6286c0a4 (Drill 1.7.0).
0: jdbc:drill:schema=dfs.tmp> select binary_string(columns[0]) from
`binStrDuplcs.csv`;
+--------------+
| EXPR$0 |
+--------------+
| [B@15ea08ee |
| [B@37f04c7f |
| [B@12e428a |
| [B@41272a1 |
| [B@5723aa1d |
| [B@6675829c |
| [B@2cd20451 |
| [B@101978d4 |
| [B@784bae8d |
| [B@30b0e8ae |
| [B@2e7c107b |
| [B@531e1314 |
| [B@5b76b0ad |
| [B@4d495cc4 |
| [B@b696f80 |
| [B@3717425a |
+--------------+
16 rows selected (0.112 seconds)
The CSV file has the same data value in each row.
0: jdbc:drill:schema=dfs.tmp> select columns[0] from `binStrDuplcs.csv`;
+-------------------------+
| EXPR$0 |
+-------------------------+
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
| '\\x99\\x8c\\x2f\\x77' |
+-------------------------+
16 rows selected (0.163 seconds)
0: jdbc:drill:schema=dfs.tmp> select string_binary(columns[0]) from
`binStrDuplcs.csv`;
+-------------------------------------------------+
| EXPR$0 |
+-------------------------------------------------+
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
| '\x5C\x5Cx99\x5C\x5Cx8c\x5C\x5Cx2f\x5C\x5Cx77' |
+-------------------------------------------------+
16 rows selected (0.156 seconds)
{noformat}
Contents of the CSV file used in the queries above:
{noformat}
[root@centos-01 binary_string]# cat binStrDuplcs.csv
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
'\\x99\\x8c\\x2f\\x77'
[root@centos-01 binary_string]#
{noformat}
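A note on reading the binary_string output: values of the form [B@15ea08ee match what Java's default Object.toString() produces for a byte[] (the type tag "[B" followed by an identity hash code). That would make the text differ per array object even when the bytes are identical. This is an inference from the output format only; the comment does not confirm it. The sketch below is plain Java, not Drill code, and the class ByteArrayDisplay and helper toEscaped are hypothetical names, with the escaping loosely modeled on the string_binary output shown in the comment.

```java
import java.util.Arrays;

public class ByteArrayDisplay {
    // Hypothetical helper: printable ASCII is kept as-is, everything else
    // (and the backslash itself) is shown as \xNN.
    static String toEscaped(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0x99, (byte) 0x8c, 0x2f, 0x77};
        byte[] b = a.clone();

        // Default toString(): "[B@" plus an identity hash, unrelated to content.
        System.out.println(a);
        System.out.println(b);

        // Content comparison and a content-based rendering.
        System.out.println(Arrays.equals(a, b)); // prints true
        System.out.println(toEscaped(a));        // prints \x99\x8C/w
    }
}
```

If this is what is happening, comparing the decoded bytes, for example by wrapping the call as string_binary(binary_string(columns[0])), would be a more reliable check than comparing the displayed [B@... text.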
> binary_string cannot convert buffer that were not start from 0 correctly
> ------------------------------------------------------------------------
>
> Key: DRILL-4478
> URL: https://issues.apache.org/jira/browse/DRILL-4478
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen
> Reporter: Chunhui Shi
> Assignee: Chunhui Shi
> Fix For: 1.7.0
>
>
> When binary_string is called multiple times, it converts only the first
> value correctly, when the drillbuf starts at 0. For the second and later
> calls the drillbuf does not start at 0, so
> DrillStringUtils.parseBinaryString cannot convert the value correctly.
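
The description above can be illustrated with a minimal sketch of how ignoring a buffer's start offset produces wrong results on every call after the first. This is not Drill's code: the class OffsetParseSketch and the methods parse and parseIgnoringStart are hypothetical, a plain byte[] stands in for the drillbuf, and the actual defect in DrillStringUtils.parseBinaryString may differ in detail.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OffsetParseSketch {
    // Decode "\xNN" escapes found in buf[start, end); other bytes pass through.
    static byte[] parse(byte[] buf, int start, int end) {
        byte[] out = new byte[end - start];
        int n = 0;
        for (int i = start; i < end; i++) {
            if (buf[i] == '\\' && i + 3 < end && buf[i + 1] == 'x') {
                int hi = Character.digit(buf[i + 2], 16);
                int lo = Character.digit(buf[i + 3], 16);
                if (hi >= 0 && lo >= 0) {
                    out[n++] = (byte) ((hi << 4) | lo);
                    i += 3; // skip the rest of the escape
                    continue;
                }
            }
            out[n++] = buf[i];
        }
        return Arrays.copyOf(out, n);
    }

    // Assumed form of the mistake: the value's length is honored,
    // but reading always begins at index 0 of the shared buffer.
    static byte[] parseIgnoringStart(byte[] buf, int start, int end) {
        return parse(buf, 0, end - start);
    }

    public static void main(String[] args) {
        // Two 8-byte values stored back to back in one shared buffer.
        byte[] shared = "\\x41\\x42\\x43\\x44".getBytes(StandardCharsets.US_ASCII);

        // The second value occupies [8, 16).
        byte[] good = parse(shared, 8, 16);
        byte[] bad = parseIgnoringStart(shared, 8, 16);

        System.out.println(new String(good, StandardCharsets.US_ASCII)); // prints CD
        System.out.println(new String(bad, StandardCharsets.US_ASCII));  // prints AB
    }
}
```

In this sketch the first value, at [0, 8), decodes correctly with either method, which is consistent with the report that only the first call behaves as expected.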