[
https://issues.apache.org/jira/browse/DRILL-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616026#comment-14616026
]
Steven Phillips commented on DRILL-2760:
----------------------------------------
This seems to be fixed now:
{code}
0: jdbc:drill:drillbit=localhost> select columns[0] id,columns[1]
ident,columns[2] type,columns[3] name,columns[4] latitude_deg,columns[5]
longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8]
iso_country,columns[9] iso_region,columns[10] municipality,columns[11]
scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14]
local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17]
keywords from dfs.tmp.`airports.csv`;
+-------+--------+----------------+--------------------+-----------------+--------------------+---------------+------------+--------------+-------------+---------------+--------------------+-----------+------------+-------------+------------+-----------------+-----------+
| id    | ident  | type           | name               | latitude_deg    | longitude_deg      | elevation_ft  | continent  | iso_country  | iso_region  | municipality  | scheduled_service  | gps_code  | iata_code  | local_code  | home_link  | wikipedia_link  | keywords  |
+-------+--------+----------------+--------------------+-----------------+--------------------+---------------+------------+--------------+-------------+---------------+--------------------+-----------+------------+-------------+------------+-----------------+-----------+
| id    | ident  | type           | name               | latitude_deg    | longitude_deg      | elevation_ft  | continent  | iso_country  | iso_region  | municipality  | scheduled_service  | gps_code  | iata_code  | local_code  | home_link  | wikipedia_link  | keywords  |
| 6523  | 00A    | heliport       | Total Rf Heliport  | 40.07080078125  | -74.9336013793945  | 11            | NA         | US           | US-PA       | Bensalem      | no                 | 00A       |            | 00A         |            |                 |           |
| 6524  | 00AK   | small_airport  | Lowell Field       | 59.94919968     | -151.695999146     | 450           | NA         | US           | US-AK       | Anchor Point  | no                 | 00AK      |            | 00AK        |            |                 |           |
+-------+--------+----------------+--------------------+-----------------+--------------------+---------------+------------+--------------+-------------+---------------+--------------------+-----------+------------+-------------+------------+-----------------+-----------+
3 rows selected (0.554 seconds)
{code}
> Quoted strings from CSV file appear in query output in different forms
> ----------------------------------------------------------------------
>
> Key: DRILL-2760
> URL: https://issues.apache.org/jira/browse/DRILL-2760
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 0.9.0
> Environment: | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580:
> Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53
> EDT
> 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> Quoted strings appear in query output in different forms, as shown in the
> section below.
> Quotes should NOT appear in query output. Strings must be stripped of their
> leading and trailing quotes (that is, the double-quote character, ").
> Snippet of data from the airports.csv file: the first three lines, of which
> the first is the header.
> {code}
> [root@centos-01 airport_CSV_data]# head -3 airports.csv
> "id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
> 6523,"00A","heliport","Total Rf Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,
> 6524,"00AK","small_airport","Lowell Field",59.94919968,-151.695999146,450,"NA","US","US-AK","Anchor Point","no","00AK",,"00AK",,,
> {code}
> Case 1) Quotes are not escaped; they appear in the output as is.
> {code}
> 0: jdbc:drill:> select columns[0] id,columns[1] ident,columns[2]
> type,columns[3] name,columns[4] latitude_deg,columns[5]
> longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8]
> iso_country,columns[9] iso_region,columns[10] municipality,columns[11]
> scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14]
> local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17]
> keywords from `airports.csv` limit 3;
> +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
> | id         | ident      | type       | name       | latitude_deg | longitude_deg | elevation_ft | continent  | iso_country | iso_region | municipality | scheduled_service | gps_code   | iata_code  | local_code | home_link  | wikipedia_link | keywords   |
> +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
> | "id"       | "ident"    | "type"     | "name"     | "latitude_deg" | "longitude_deg" | "elevation_ft" | "continent" | "iso_country" | "iso_region" | "municipality" | "scheduled_service" | "gps_code" | "iata_code" | "local_code" | "home_link" | "wikipedia_link" | "keywords" |
> | 6523       | "00A"      | "heliport" | "Total Rf Heliport" | 40.07080078125 | -74.9336013793945 | 11           | "NA"       | "US"        | "US-PA"    | "Bensalem"   | "no"              | "00A"      |            | "00A"      |            |                | null       |
> | 6524       | "00AK"     | "small_airport" | "Lowell Field" | 59.94919968  | -151.695999146 | 450          | "NA"       | "US"        | "US-AK"    | "Anchor Point" | "no"              | "00AK"     |            | "00AK"     |            |                | null       |
> +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
> 3 rows selected (0.155 seconds)
> {code}
> Case 2) Quotes appear in the query output, but they are escaped with a
> backslash character.
> {code}
> 0: jdbc:drill:> select * from `airports.csv` limit 3;
> +------------+
> | columns |
> +------------+
> | ["\"id\"","\"ident\"","\"type\"","\"name\"","\"latitude_deg\"","\"longitude_deg\"","\"elevation_ft\"","\"continent\"","\"iso_country\"","\"iso_region\"","\"municipality\"","\"scheduled_service\"","\"gps_code\"","\"iata_code\"","\"local_code\"","\"home_link\"","\"wikipedia_link\"","\"keywords\""] |
> | ["6523","\"00A\"","\"heliport\"","\"Total Rf Heliport\"","40.07080078125","-74.9336013793945","11","\"NA\"","\"US\"","\"US-PA\"","\"Bensalem\"","\"no\"","\"00A\"","","\"00A\"","",""] |
> | ["6524","\"00AK\"","\"small_airport\"","\"Lowell Field\"","59.94919968","-151.695999146","450","\"NA\"","\"US\"","\"US-AK\"","\"Anchor Point\"","\"no\"","\"00AK\"","","\"00AK\"","",""] |
> +------------+
> 3 rows selected (0.097 seconds)
> {code}
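
For reference, the behavior the report asks for (quoted fields returned without their enclosing double quotes) can be sketched outside Drill with Python's standard csv module. This is only an illustration of conventional CSV quote handling, not Drill's text reader; the helper name read_rows is hypothetical, and the sample data is the header plus the first data line quoted in the report.

```python
import csv
import io

# Header and first data line from the airports.csv snippet in the report.
SAMPLE = (
    '"id","ident","type","name","latitude_deg","longitude_deg",'
    '"elevation_ft","continent","iso_country","iso_region","municipality",'
    '"scheduled_service","gps_code","iata_code","local_code","home_link",'
    '"wikipedia_link","keywords"\n'
    '6523,"00A","heliport","Total Rf Heliport",40.07080078125,'
    '-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,\n'
)


def read_rows(text):
    """Parse CSV text; csv.reader removes the enclosing double quotes."""
    return list(csv.reader(io.StringIO(text), quotechar='"'))


rows = read_rows(SAMPLE)
for row in rows:
    # Each field is printed without surrounding quotes; empty fields stay "".
    print(" | ".join(row))
```

Both rows come back with 18 fields, and none of the values carries a literal double quote, which matches the output shown in the comment at the top of this message.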
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)