[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Phillips updated DRILL-3353:
-----------------------------------
    Assignee: Hanifi Gunes  (was: Steven Phillips)

> Non data-type related schema changes errors
> -------------------------------------------
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Hanifi Gunes
>             Fix For: 1.2.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
>
>
> I'm having trouble querying a data set in which the fields of a nested 
> object vary between records. The majority of my data for a specific type of record has the 
> following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly 
> different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from 
> `dfs`.`root`.`/file.json` as log where log.si = 
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 
> 'Teste-FB-Engagement-Puro-iOS-230615';
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on 
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from 
> `dfs`.`root`.`/file.json` as log where log.si = 
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type 
> when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File  file.json
> Record  35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on 
> ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill will only 
> return a subset of the fields, ignoring the others. 
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from 
> `dfs`.`root`.`/file.json` as log where log.si = 
> '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> +----------------------------------------------------+
> |                       EXPR$0                       |
> +----------------------------------------------------+
> | {"logged":"no","wearable":"no","type":"xxxx"}   |
> | {"logged":"no","wearable":"no","type":"xxxx"}  |
> | {"logged":"no","wearable":"no","type":"xxxx"}  |
> | {"logged":"no","wearable":"no","type":"xxxx"}    |
> | {"logged":"no","wearable":"no","type":"xxxx"}   |
> +----------------------------------------------------+
> {noformat}
> What I find strange is that I have thousands of records in the same file with 
> different schemas for different record types, and all other queries seem to run 
> well.
> Is there something about how Drill infers schema that I might be missing 
> here? Does it infer based on a sample % of the data and fail for records that 
> were not taken into account while inferring schema? I suspect I wouldn't have 
> this error if I had hundreds of records with that other schema inside the file, 
> but I can't find anything in the docs or code to support that hypothesis. 
> Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention something about this, but it only 
> vaguely implies that Drill doesn't fully support schema changes. I thought that 
> applied mostly to data type changes, for which there are other well-documented 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
