[ https://issues.apache.org/jira/browse/DRILL-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam updated DRILL-2760:
-----------------------------------
    Description: 
Quoted strings from a CSV file appear in query output in different forms, as
shown in the section below.
Quotes should NOT appear in query output. Strings must be stripped of their
leading and trailing quotes. (I am referring to this character - " )

Snippet of data from the airports.csv file: the first three lines, of which the
first line holds header information.
{code}
[root@centos-01 airport_CSV_data]# head -3 airports.csv
"id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
6523,"00A","heliport","Total Rf Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,
6524,"00AK","small_airport","Lowell Field",59.94919968,-151.695999146,450,"NA","US","US-AK","Anchor Point","no","00AK",,"00AK",,,
{code}
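As a point of comparison only, the expected field values can be sketched with Python's standard `csv` module, which removes the enclosing double quotes while parsing. This is not Drill's text reader; the names `SAMPLE` and `parse_rows` are hypothetical, and the sample keeps just the first four columns of the header and of the first record from the snippet above.

```python
import csv
import io

# First four columns of the header and of record 6523, copied from the
# airports.csv snippet. The truncation to four columns is for brevity only.
SAMPLE = (
    '"id","ident","type","name"\n'
    '6523,"00A","heliport","Total Rf Heliport"\n'
)


def parse_rows(text):
    """Parse CSV text into lists of strings; csv.reader drops the enclosing quotes."""
    return list(csv.reader(io.StringIO(text)))


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

Each printed field is the bare value (for example `00A`, not `"00A"`), which is the form the query output is expected to take.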
Case 1) Quotes are not escaped; they appear in the output as is.
{code}
0: jdbc:drill:> select columns[0] id,columns[1] ident,columns[2] type,columns[3] name,columns[4] latitude_deg,columns[5] longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8] iso_country,columns[9] iso_region,columns[10] municipality,columns[11] scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14] local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17] keywords from `airports.csv` limit 3;
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
|     id     |   ident    |    type    |    name    | latitude_deg | longitude_deg | elevation_ft | continent  | iso_country | iso_region | municipality | scheduled_service |  gps_code  | iata_code  | local_code | home_link  | wikipedia_link |  keywords  |
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
| "id"       | "ident"    | "type"     | "name"     | "latitude_deg" | "longitude_deg" | "elevation_ft" | "continent" | "iso_country" | "iso_region" | "municipality" | "scheduled_service" | "gps_code" | "iata_code" | "local_code" | "home_link" | "wikipedia_link" | "keywords" |
| 6523       | "00A"      | "heliport" | "Total Rf Heliport" | 40.07080078125 | -74.9336013793945 | 11           | "NA"       | "US"        | "US-PA"    | "Bensalem"   | "no"              | "00A"      |            | "00A"      |            |                | null       |
| 6524       | "00AK"     | "small_airport" | "Lowell Field" | 59.94919968  | -151.695999146 | 450          | "NA"       | "US"        | "US-AK"    | "Anchor Point" | "no"              | "00AK"     |            | "00AK"     |            |                | null       |
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
3 rows selected (0.155 seconds)
{code}
Case 2) Quotes appear in the query output, but they are escaped with a
backslash character.
{code}
0: jdbc:drill:> select * from `airports.csv` limit 3;
+------------+
|  columns   |
+------------+
| ["\"id\"","\"ident\"","\"type\"","\"name\"","\"latitude_deg\"","\"longitude_deg\"","\"elevation_ft\"","\"continent\"","\"iso_country\"","\"iso_region\"","\"municipality\"","\"scheduled_service\"","\"gps_code\"","\"iata_code\"","\"local_code\"","\"home_link\"","\"wikipedia_link\"","\"keywords\""] |
| ["6523","\"00A\"","\"heliport\"","\"Total Rf Heliport\"","40.07080078125","-74.9336013793945","11","\"NA\"","\"US\"","\"US-PA\"","\"Bensalem\"","\"no\"","\"00A\"","","\"00A\"","",""] |
| ["6524","\"00AK\"","\"small_airport\"","\"Lowell Field\"","59.94919968","-151.695999146","450","\"NA\"","\"US\"","\"US-AK\"","\"Anchor Point\"","\"no\"","\"00AK\"","","\"00AK\"","",""] |
+------------+
3 rows selected (0.097 seconds)
{code}
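The backslashes in the `select *` output are consistent with the `columns` array being rendered as JSON text while each element still carries its CSV quotes. That reading is an assumption about the cause, not something confirmed in this report. A minimal Python sketch of the effect, using the standard `json` module and hypothetical variable names:

```python
import json

# Field value with the CSV quotes left in place, as seen in the first query.
raw_field = '"00A"'
# Field value after the enclosing quotes are stripped (the expected form).
clean_field = '00A'

# JSON serialization must escape embedded double quotes with a backslash.
escaped = json.dumps([raw_field])
clean = json.dumps([clean_field])

print(escaped)  # ["\"00A\""]
print(clean)    # ["00A"]
```

Under this assumption both symptoms share one root cause: once the reader strips the enclosing quotes, the escaped form would no longer appear either.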

  was:
Quoted strings from a CSV file appear in query output in different forms, as
shown in the section below.
Quotes should NOT appear in query output. Strings must be stripped of their
leading and trailing quotes. (I am referring to this character - " )

{code}
Snippet of data from the airports.csv file: the first three lines, of which the
first line holds header information.

[root@centos-01 airport_CSV_data]# head -3 airports.csv
"id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
6523,"00A","heliport","Total Rf Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,
6524,"00AK","small_airport","Lowell Field",59.94919968,-151.695999146,450,"NA","US","US-AK","Anchor Point","no","00AK",,"00AK",,,

Case 1) Quotes are not escaped; they appear in the output as is.

0: jdbc:drill:> select columns[0] id,columns[1] ident,columns[2] type,columns[3] name,columns[4] latitude_deg,columns[5] longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8] iso_country,columns[9] iso_region,columns[10] municipality,columns[11] scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14] local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17] keywords from `airports.csv` limit 3;
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
|     id     |   ident    |    type    |    name    | latitude_deg | longitude_deg | elevation_ft | continent  | iso_country | iso_region | municipality | scheduled_service |  gps_code  | iata_code  | local_code | home_link  | wikipedia_link |  keywords  |
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
| "id"       | "ident"    | "type"     | "name"     | "latitude_deg" | "longitude_deg" | "elevation_ft" | "continent" | "iso_country" | "iso_region" | "municipality" | "scheduled_service" | "gps_code" | "iata_code" | "local_code" | "home_link" | "wikipedia_link" | "keywords" |
| 6523       | "00A"      | "heliport" | "Total Rf Heliport" | 40.07080078125 | -74.9336013793945 | 11           | "NA"       | "US"        | "US-PA"    | "Bensalem"   | "no"              | "00A"      |            | "00A"      |            |                | null       |
| 6524       | "00AK"     | "small_airport" | "Lowell Field" | 59.94919968  | -151.695999146 | 450          | "NA"       | "US"        | "US-AK"    | "Anchor Point" | "no"              | "00AK"     |            | "00AK"     |            |                | null       |
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
3 rows selected (0.155 seconds)

Case 2) Quotes appear in the query output, but they are escaped with a
backslash character.

0: jdbc:drill:> select * from `airports.csv` limit 3;
+------------+
|  columns   |
+------------+
| ["\"id\"","\"ident\"","\"type\"","\"name\"","\"latitude_deg\"","\"longitude_deg\"","\"elevation_ft\"","\"continent\"","\"iso_country\"","\"iso_region\"","\"municipality\"","\"scheduled_service\"","\"gps_code\"","\"iata_code\"","\"local_code\"","\"home_link\"","\"wikipedia_link\"","\"keywords\""] |
| ["6523","\"00A\"","\"heliport\"","\"Total Rf Heliport\"","40.07080078125","-74.9336013793945","11","\"NA\"","\"US\"","\"US-PA\"","\"Bensalem\"","\"no\"","\"00A\"","","\"00A\"","",""] |
| ["6524","\"00AK\"","\"small_airport\"","\"Lowell Field\"","59.94919968","-151.695999146","450","\"NA\"","\"US\"","\"US-AK\"","\"Anchor Point\"","\"no\"","\"00AK\"","","\"00AK\"","",""] |
+------------+
3 rows selected (0.097 seconds)

{code}


> Quoted strings from CSV file appear in query output in different forms
> ----------------------------------------------------------------------
>
>                 Key: DRILL-2760
>                 URL: https://issues.apache.org/jira/browse/DRILL-2760
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 0.9.0
>         Environment: | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 EDT
> 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>             Fix For: 1.0.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
