[ 
https://issues.apache.org/jira/browse/DRILL-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ganesh semalty updated DRILL-3946:
----------------------------------
    Description: 
Hi,

I put raw data into Hadoop and then use Apache Drill to query that data. The
records in the data file look like this:

! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131 ! 3100.2.112.1 14/05/2014 22:45:59 ! 
3100.2.22.3 16 ! 3100.2.10.1 4

where 3100.2.11.2 = 1131,
3100.2.112.1 = 14/05/2014 22:45:59,
and so on; that is, each ! mark separates one key/value pair.
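
To make the record layout concrete, here is a minimal Python sketch of how one
such line could be split into key/value pairs outside of Drill. The function
name parse_record is hypothetical, and the sketch assumes that the first
whitespace-separated token of each !-delimited chunk is the key and the
remainder is the value.

```python
def parse_record(line):
    """Split a '!'-delimited record into a {key: value} dict.

    Assumption: the first token of each chunk is the key and the rest
    of the chunk is the value.
    """
    fields = {}
    for chunk in line.split("!"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the leading '!'
        key, _, value = chunk.partition(" ")
        fields[key] = value.strip()
    return fields


record = parse_record(
    "! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131 ! "
    "3100.2.112.1 14/05/2014 22:45:59 ! 3100.2.22.3 16 ! 3100.2.10.1 4"
)
print(record["3100.2.11.2"])   # lookup by key, independent of position
print(record["3100.2.112.1"])
```

Looking values up by key this way does not depend on which position a pair
occupies within the record.
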
If I use the raw file (after replacing ! with a comma), put it in Hadoop, and
view the records in Drill Explorer, they show up as follows:

TICKET_NBR 1  <--> GSI 81  <---> 3100.2.11.2 1131 <--->

(I am using <---> to separate the columns here.) I do not know in which
column "3100.2.11.2" will appear in another row; it can be any column.
QUERY 1: Is it possible to search every column of the table for a matching
string?

Even if I reformat the data so that the header row contains "3100.2.11.2" and
so on, with each row's values in the columns below it, I am unable to query it
through SQL, because the column names contain a dot (.).

select "3100.2.11.2" from TABLE where "3100.2.121.1" = ABC;

The query fails even if I surround the column name with square brackets "[" "]".
I converted my .csv file to .json, but Drill Explorer still represents the data
in the same manner, so the query still doesn't work.

QUERY 2: I do not understand what it means when we say that Apache Drill can
work on unstructured data as well. Is it necessary to format the data before
we query it?

Even after I replaced 3100.2.22.8 with 3100_2_22_8 and so on, I still get an
error:

0: jdbc:drill:zk=local> select 3100_2_22_8 from 
hadoop.`/user/hduser/samplecdl_odo_1.json`;
Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a 
record. Current token was START_ARRAY

File  /user/hduser/samplecdl_odo_1.json
Record  1
Fragment 0:0
[Error Id: a2bf3e90-1052-43c3-9a3a-fcff74666591 on ubuntu:31010] (state=,code=0)
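
As one possible preprocessing step, the Python sketch below rewrites a
!-delimited record as a single flat JSON object with dot-free keys. It rests
on assumptions that are not confirmed in this issue: that a file made of one
flat JSON object per line is a layout Drill can read, and that keys beginning
with a letter are easier to reference in a query. The function name
to_json_line and the "f_" prefix are hypothetical, and whether this avoids the
START_ARRAY error above has not been verified.

```python
import json
import re


def to_json_line(line, prefix="f_"):
    """Turn one '!'-delimited record into a flat JSON object string.

    Non-word characters in keys (such as '.') become '_', and keys that
    start with a digit get a hypothetical letter prefix.
    """
    fields = {}
    for chunk in line.split("!"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the leading '!'
        key, _, value = chunk.partition(" ")
        safe_key = re.sub(r"\W", "_", key)
        if safe_key[0].isdigit():
            safe_key = prefix + safe_key
        fields[safe_key] = value.strip()
    return json.dumps(fields)


print(to_json_line("! TICKET_NBR 1 ! GSI 81 ! 3100.2.22.3 16"))
```

Writing one such string per input record, each on its own line, would give a
file with no top-level array.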






  was:
Hi,

I put raw data into Hadoop and then use Apache Drill to query that data. The
records in the data file look like this:

! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131 ! 3100.2.112.1 14/05/2014 22:45:59 ! 
3100.2.22.3 16 ! 3100.2.10.1 4

where 3100.2.11.2 = 1131,
3100.2.112.1 = 14/05/2014 22:45:59,
and so on; that is, each ! mark separates one key/value pair.
If I use the raw file (after replacing ! with a comma), put it in Hadoop, and
view the records in Drill Explorer, they show up as follows:

TICKET_NBR 1  <--> GSI 81  <---> 3100.2.11.2 1131 <--->

(I am using <---> to separate the columns here.) I do not know in which
column "3100.2.11.2" will appear in another row; it can be any column.
QUERY: Is it possible to search every column of the table for a matching
string?

Even if I reformat the data so that the header row contains "3100.2.11.2" and
so on, with each row's values in the columns below it, I am unable to query it
through SQL, because the column names contain a dot (.).

select "3100.2.11.2" from TABLE where "3100.2.121.1" = ABC;

The query fails even if I surround the column name with square brackets "[" "]".
I converted my .csv file to .json, but Drill Explorer still represents the data
in the same manner, so the query still doesn't work.


QUERY: I do not understand what it means when we say that Apache Drill can
work on unstructured data as well. Is it necessary to format the data before
we query it?




> How to run query with column name containing dot (.)
> ----------------------------------------------------
>
>                 Key: DRILL-3946
>                 URL: https://issues.apache.org/jira/browse/DRILL-3946
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.1.0
>         Environment: Apache Drill 1.1.0
> Ubuntu
>            Reporter: ganesh semalty
>
> Hi,
> I put raw data into Hadoop and then use Apache Drill to query that data. The
> records in the data file look like this:
> ! TICKET_NBR 1 ! GSI 81 ! 3100.2.11.2 1131 ! 3100.2.112.1 14/05/2014 22:45:59 
> ! 3100.2.22.3 16 ! 3100.2.10.1 4
> where 3100.2.11.2 = 1131,
> 3100.2.112.1 = 14/05/2014 22:45:59,
> and so on; that is, each ! mark separates one key/value pair.
> If I use the raw file (after replacing ! with a comma), put it in Hadoop, and
> view the records in Drill Explorer, they show up as follows:
> TICKET_NBR 1  <--> GSI 81  <---> 3100.2.11.2 1131 <--->
> (I am using <---> to separate the columns here.) I do not know in which
> column "3100.2.11.2" will appear in another row; it can be any column.
> QUERY 1: Is it possible to search every column of the table for a matching
> string?
> Even if I reformat the data so that the header row contains "3100.2.11.2" and
> so on, with each row's values in the columns below it, I am unable to query
> it through SQL, because the column names contain a dot (.).
> select "3100.2.11.2" from TABLE where "3100.2.121.1" = ABC;
> The query fails even if I surround the column name with square brackets
> "[" "]".
> I converted my .csv file to .json, but Drill Explorer still represents the
> data in the same manner, so the query still doesn't work.
> QUERY 2: I do not understand what it means when we say that Apache Drill can
> work on unstructured data as well. Is it necessary to format the data before
> we query it?
> Even after I replaced 3100.2.22.8 with 3100_2_22_8 and so on, I still get an
> error:
> 0: jdbc:drill:zk=local> select 3100_2_22_8 from 
> hadoop.`/user/hduser/samplecdl_odo_1.json`;
> Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a 
> record. Current token was START_ARRAY
> File  /user/hduser/samplecdl_odo_1.json
> Record  1
> Fragment 0:0
> [Error Id: a2bf3e90-1052-43c3-9a3a-fcff74666591 on ubuntu:31010] 
> (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
