[
https://issues.apache.org/jira/browse/DRILL-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498849#comment-14498849
]
Steven Phillips commented on DRILL-2806:
----------------------------------------
Drill doesn't have a concept of unsupported extensions. The current behavior is
to treat all unknown extensions as a default format that can be configured for
a workspace.
Maybe what we need is the option to have no default format, in which case
Drill will throw an exception if an unknown extension is used and Drill is
unable to determine the format.
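To make the proposal concrete, here is a minimal sketch of the lookup
described above. It is not Drill's actual code: the names resolve_format,
formats_by_ext, default_format and UnknownFormatError are hypothetical and
only illustrate the two behaviors (fall back to a workspace default, or fail
when no default is configured).

```python
import os


class UnknownFormatError(Exception):
    """Raised when the extension is unknown and no default format is set."""


def resolve_format(path, formats_by_ext, default_format=None):
    """Pick a format name for `path` from its file extension.

    formats_by_ext: dict mapping a lowercase extension (no dot) to a format name.
    default_format: format used for unknown extensions, or None to disable
                    the fallback (the option proposed in the comment).
    """
    # Take the text after the last dot, e.g. "tgz" for "data.tgz".
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in formats_by_ext:
        return formats_by_ext[ext]
    if default_format is not None:
        # Current behavior as described: unknown extensions get the default.
        return default_format
    # Proposed behavior: no default configured, so refuse to guess.
    raise UnknownFormatError(
        "cannot determine format for %r (extension %r)" % (path, ext)
    )
```

With a default of "csv", a file such as `deletions-00000-of-00020.tgz` would
be resolved to the text format in this sketch, which matches the symptom
reported below. With default_format=None the same call raises an error
instead.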
> Querying data from compressed csv file returns nulls and unreadable data
> ------------------------------------------------------------------------
>
> Key: DRILL-2806
> URL: https://issues.apache.org/jira/browse/DRILL-2806
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 0.9.0
> Environment: 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580:
> Exit early from HashJoinBatch if build side is empty | 26.03.2015
> Reporter: Khurram Faraaz
> Assignee: Steven Phillips
>
> Projecting columns from a compressed CSV data file returns unreadable data
> and nulls in the query results. Querying the same CSV file in uncompressed
> format returns correct results, with readable data and no nulls. The test
> was performed on a 4-node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3],
> columns[4], columns[5], columns[6], columns[7] from
> `deletions-00000-of-00020.tgz` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5
> | EXPR$6 | EXPR$7 |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 0U[ˮȑ|axaR)ﺫ=鲍i̊HDJ|?3̑$%Q$%
> TdfD8'2i$E^/Y}C'>|/7
>
> H1o0! | 0g TMUܸW`ʙ&T
>
> \uXپN|2I~Y 0RAX6UaXe+ow*]s
> | null | null | null | null | null | null
> |
> | oM.ڻU/ | ̼\
> )qwda7((
> y[) |
> 9>^0>WM[{r]iE$ze&!EküIfa | null | null | null | null
> | null |
> | SR | null | null | null | null | null |
> null | null |
> |
> 6imJ\f_dYڿ]%ln3IaE*BGA-a$j:M!Uc)ﶘD~wUx0ɼgme]ӘcQ*pk$%\2ER-)(ÈxTn?SϓxeҜݠºI|'(Cni
> s | null | null | null | null | null |
> null | null |
> | bxΜkr4ü_nIxl_s`vN
> ó.$OL7Eބyڗia;Pu$M!AoCӦnlS-`ۢ+o~>%wzcgwtMge7"lMgZ=WྃgMRX1"a | X=Rd.fab{t{
>
>
> A!t
>
>
> 1$ڧw-0EXURg
>
>
> p
> #qzߤgWMem{=z{
>
>
>
> eiA]^ | null | null | null | null | null
> | null |
> | | null | null | null | null | null |
> null | null |
> | !{1H*m71`˰]oZ | ] &f4Z)4SP7Rm4^5WWXȧ<p.́3L
>
> q%|WL-p[ | null | null | null | null | null
> | null |
> | dqyd\K#"ԁ@ | null | null | null | null | null
> | null | null |
> | [GԊKFlɢ(ZK8h#D/[(U=_8ΏE%
> [;
> w}Fr`#Xk
>
> lT'15:y
>
> ņPz(-ȓCs)1v | null | null | null |
> null | null | null | null |
> | LyPO|Ώ(+n+H]
> Ņ2?糩s/_ l
> +ӯb | null | null
> | null | null | null | null | null |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.176 seconds)
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3],
> columns[4], columns[5], columns[6], columns[7] from
> `deletions/deletions-00000-of-00020.csv` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5
> | EXPR$6 | EXPR$7 |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 1354980518007 | /user/mwcl_musicbrainz | 1356247116000 |
> /user/google_gardener | /m/0nj707g | /music/track_contribution/contributor |
> /m/09xmq3 | en |
> | 1359609261000 | /user/ahsan2002us | 1359697206000 | /user/mjsigua |
> /m/0q47ym9 | /common/topic/description | Afrosheen CEO is the fictional
> character from the 2003 film The Watermelon Heist. | en |
> | 1258294630005 | /user/book_bot | 1260214155000 | /user/book_bot |
> /m/08g19rh | /book/book_edition/book | /m/04sty07 | en |
> | 1260232964000 | /user/book_bot | 1360880749000 | /user/turtlewax_bot |
> /m/0872_f2 | /book/book_edition/book | /m/069_gyc | en |
> | 1320298552000 | /user/gardening_bot | 1358083965004 | /user/googlebot |
> /m/01dy3t2 | /type/object/type | /music/single | en |
> | 1360430129006 | /user/mwcl_musicbrainz | 1362830875001 |
> /user/mwcl_musicbrainz | /m/0qm1x62 | /music/release_track/release |
> /m/0ql38vr | en |
> | 1269251105000 | /user/mwcl_images | 1336539194001 | /user/gardening_bot |
> /m/06w7yw7 | /common/topic/image | /m/0bcncxt | en |
> | 1225386250001 | /user/mwcl_images | 1336080683003 | /user/gardening_bot |
> /m/04sb526 | /common/licensed_object/license | /m/02x6b | en |
> | 1286991487000 | /user/mw_template_bot | 1362532733000 |
> /user/wikipedia_facts | /m/0dgs170 | /people/person/date_of_birth | 1975
> | en |
> | 1258986090000 | /user/book_bot | 1260138587000 | /user/book_bot |
> /m/08r_m33 | /book/book_edition/book | /m/04sty07 | en |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.25 seconds)
> Details of the files (compressed and uncompressed)
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions-00000-of-00020.tgz
> -rwxr-xr-x 3 root root 111364147 2015-04-16 20:35
> /tmp/deletions-00000-of-00020.tgz
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions/deletions-00000-of-00020.csv
> -rwxr-xr-x 3 root root 395624293 2015-04-14 18:10
> /tmp/deletions/deletions-00000-of-00020.csv
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)