[
https://issues.apache.org/jira/browse/DRILL-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ganesh semalty updated DRILL-3946:
----------------------------------
Description:
Hi,
I load raw data into Hadoop and then use Apache Drill to query it. The records
in the data file look like this:
! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131 ! 3100.2.112.1 14/05/2014 22:45:59 !
3100.2.22.3 16 ! 3100.2.10.1 4
where 3100.2.11.2 = 1131
3100.2.112.1 = 14/05/2014 22:45:59
and so on. In other words, each ! separates one key-value pair from the next.
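To make the record layout concrete, here is a minimal Python sketch of how one such line breaks down into key-value pairs. The function name parse_record is hypothetical, and the sketch assumes that a key never contains a space, so everything after the first space in a field is the value.

```python
def parse_record(line):
    """Split one '!'-separated record into a dict of key -> value."""
    pairs = {}
    for field in line.split("!"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece before the leading '!'
        # Assumption: the key is the first space-delimited token,
        # and the rest of the field is the value.
        key, _, value = field.partition(" ")
        pairs[key] = value.strip()
    return pairs


# The sample record from this report, joined onto one logical line.
record = ("! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131 "
          "! 3100.2.112.1 14/05/2014 22:45:59 ! 3100.2.22.3 16 ! 3100.2.10.1 4")
parsed = parse_record(record)
print(parsed["3100.2.112.1"])
```

Looking a key up in the resulting dict does not depend on where the pair sits in the line.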
If I take the raw file, replace each ! with a comma, put it in Hadoop and view
the records in Drill Explorer, they are shown as follows:
TICKET_NBR 1 <---> GSI 81 <---> 3100.2.11.2 1131 <--->
(I am using <---> to mark the column boundaries here.) I do not know in which
column "3100.2.11.2" will appear in another row; it can be any column.
QUERY 1: Is it possible to search every column of the table for a matching
string?
Even if I reformat the data so that the header row contains "3100.2.11.2" and
so on, with each row's values in the columns below it, I am unable to query it
through SQL, because the column names contain a dot (.):
select "3100.2.11.2" from TABLE where "3100.2.121.1" = ABC;
The query fails even if I surround the column names with square brackets "["
"]".
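For comparison, Drill's default identifier quote character is the backtick, and string literals take single quotes. The sketch below builds the same query text in that form. The helper names are hypothetical, and the table reference reuses the file path quoted elsewhere in this report. Whether Drill 1.1.0 treats a backtick-quoted dotted name as a single column rather than as a nested path is not confirmed here, so this shows the quoting only and is not a verified fix.

```python
def quote_identifier(name):
    """Wrap a column name in backticks (Drill's default identifier quote)."""
    if "`" in name:
        # Escaping embedded backticks is out of scope for this sketch.
        raise ValueError("backtick inside identifier is not handled here")
    return "`" + name + "`"


def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(column, table, filter_column, filter_value):
    """Assemble a select statement with quoted identifiers and literal."""
    return "select {} from {} where {} = {}".format(
        quote_identifier(column),
        table,
        quote_identifier(filter_column),
        quote_literal(filter_value),
    )


query = build_query(
    "3100.2.11.2",
    "hadoop.`/user/hduser/samplecdl_odo_1.json`",  # path taken from this report
    "3100.2.121.1",
    "ABC",
)
print(query)
```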
I converted my .csv file to .json, but Drill Explorer still presents the data
in the same way, so the query still doesn't work.
QUERY 2: I do not understand what is meant by the claim that Apache Drill can
work on unstructured data as well. Is it necessary to format the data before
querying it?
Even after I replaced 3100.2.22.8 with 3100_2_22_8 and so on, I still get an
error:
0: jdbc:drill:zk=local> select 3100_2_22_8 from
hadoop.`/user/hduser/samplecdl_odo_1.json`;
Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a
record. Current token was START_ARRAY
File /user/hduser/samplecdl_odo_1.json
Record 1
Fragment 0:0
[Error Id: a2bf3e90-1052-43c3-9a3a-fcff74666591 on ubuntu:31010] (state=,code=0)
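The START_ARRAY token in the error suggests the JSON file may begin with an array, although nothing in this report confirms the file's actual layout. As a sketch under that assumption, the Python below writes each record as its own JSON object, one per line, with the dots in the keys replaced by underscores. The function name and the second sample row are hypothetical.

```python
import json


def record_to_json_line(line):
    """Turn one '!'-separated record into a single-line JSON object."""
    obj = {}
    for field in line.split("!"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece before the leading '!'
        key, _, value = field.partition(" ")
        # Replace dots so the key cannot be read as a nested path.
        obj[key.replace(".", "_")] = value.strip()
    return json.dumps(obj)


records = [
    # First row: the start of the sample record from this report.
    "! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131",
    # Second row: hypothetical, same keys in a different order.
    "! GSI 82 ! 3100.2.11.2 2000 ! TICKET_NBR 2",
]
output = "\n".join(record_to_json_line(r) for r in records)
print(output)
```

The output is a sequence of objects with no enclosing [ ]. Keys such as 3100_2_11_2 still begin with a digit, so they would presumably still need backticks in the select list.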
> How to run query with column name containing dot (.)
> ----------------------------------------------------
>
> Key: DRILL-3946
> URL: https://issues.apache.org/jira/browse/DRILL-3946
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.1.0
> Environment: Apache Drill 1.1.0
> Ubuntu
> Reporter: ganesh semalty