[
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875631#comment-16875631
]
Paul Rogers edited comment on DRILL-7308 at 6/30/19 1:55 AM:
-------------------------------------------------------------
Modified the {{SchemaBuilder}} class to do exactly what I said we don't want to
do: it avoids setting the precision if the precision is zero. This allows the
(wrong) code in the REST feature to work. Still, the incorrect code should
change as explained above to avoid breaking the next time someone sets a
precision of 0.
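To illustrate why a precision of 0 is fragile here, the following is a minimal sketch, not the actual Drill REST code. The earlier explanation referred to above is not part of this message, so the sketch assumes the problem is a check of the form "was a precision set?" rather than "is the precision greater than zero?". The class {{TypeLabel}} and both method names are hypothetical.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of Drill.
final class TypeLabel {
    private TypeLabel() {}

    // Assumed fragile form: prints precision whenever the field was set,
    // so an explicit precision of 0 produces "VARCHAR(0, 0)".
    static String fragile(String typeName, OptionalInt precision, int scale) {
        if (precision.isPresent()) {
            return typeName + "(" + precision.getAsInt() + ", " + scale + ")";
        }
        return typeName;
    }

    // More robust form: a precision of 0 is treated the same as "not set".
    static String robust(String typeName, OptionalInt precision, int scale) {
        if (precision.isPresent() && precision.getAsInt() > 0) {
            return typeName + "(" + precision.getAsInt() + ", " + scale + ")";
        }
        return typeName;
    }
}
```

With the robust form, the output no longer depends on whether the schema builder happens to skip setting a zero precision.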
Also removed the empty schema batch so that simple queries return just one
batch of data.
The result is that the broken code in the REST call should work for simple
one-batch queries. Nothing I can do, however, will fix the fact that the schema
will be repeated for every batch; fixing that will require changes to the REST
code itself.
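One possible shape for that REST-side change, as a rough sketch only: record the column metadata from the first batch and ignore the schema on later batches. {{ResultCollector}} and its methods are hypothetical names, not Drill classes, and the sketch assumes the schema does not change between batches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collector for illustration only; not part of Drill.
final class ResultCollector {
    private final List<String> metadata = new ArrayList<>();
    private final List<List<String>> rows = new ArrayList<>();
    private boolean schemaSeen = false;

    // Called once per incoming batch. The column types are recorded only
    // for the first batch; every batch contributes its rows.
    void addBatch(List<String> columnTypes, List<List<String>> batchRows) {
        if (!schemaSeen) {
            metadata.addAll(columnTypes);
            schemaSeen = true;
        }
        rows.addAll(batchRows);
    }

    List<String> metadata() {
        return Collections.unmodifiableList(metadata);
    }

    List<List<String>> rows() {
        return Collections.unmodifiableList(rows);
    }
}
```

Under this assumption, an empty schema batch followed by a data batch yields a single metadata entry per column. A real fix would also need to decide what to do when the schema does change mid-query, which this sketch does not handle.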
was (Author: paul.rogers):
Modified the {{SchemaBuilder}} class to do exactly what I said we don't want to
do: it avoids setting the precision if the precision is zero. This allows the
(wrong) code in this feature to work. The incorrect code should change.
Also removed the empty schema batch so that simple queries return just one
batch of data.
The result is that the broken code in the REST call should work for simple
one-batch queries. Nothing I can do, however, will fix the fact that the schema
will be repeated for every batch; fixing that will require changes to the REST
code itself.
> Incorrect Metadata from text file queries
> -----------------------------------------
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill. If you
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
> "columns": [
> "domain"
> ],
> "rows": [
> { "domain": "thedataist.com" }
> ],
> "metadata": [
> "VARCHAR(0, 0)",
> "VARCHAR(0, 0)"
> ],
> "queryState": "COMPLETED",
> "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1. VARCHAR now has precision
> 2. The metadata lists twice as many entries as there are columns.
> Additionally, if you query a regular CSV, without the columns extracted, you
> get the following:
> {code:json}
> "rows": [
> {
> "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" }
> ],
> "metadata": [
> "VARCHAR(0, 0)",
> "VARCHAR(0, 0)"
> ],
> {code}