[ https://issues.apache.org/jira/browse/IMPALA-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-635:
----------------------------------
    Epic Link: IMPALA-12887

> Default value in Avro schema must match type of first union type
> ----------------------------------------------------------------
>
>                 Key: IMPALA-635
>                 URL: https://issues.apache.org/jira/browse/IMPALA-635
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.0.1, Impala 2.3.0
>            Reporter: Nong Li
>            Priority: Minor
>              Labels: avro, usability
>
> *SUMMARY*
> If a default value is provided for a union-type Avro field (i.e. a union of 
> "null" and some other type, since other unions are not supported by Impala), 
> the default value must match the first type in the union. Otherwise Impala 
> will return the following error when trying to query the table:
> {noformat}
> Failed to parse table schema: Invalid JSON integer in json_t_to_avro_value_helper
> {noformat}
> For example, the following field definition will produce this error:
> {noformat}
> {"name": "i", "type": ["int", "null"], "default": null}
> {noformat}
> This is technically not a bug, since the Avro specification states that the 
> default value of a union field corresponds to the first type in the union. 
> However, it isn't very user-friendly.
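To make the rule concrete: a field is affected when its type is a union and its default is not a valid JSON value for the *first* type in that union. The Python sketch below flags such fields in a record schema before any files are written. `default_matches_first_branch` and `find_bad_defaults` are hypothetical helper names, not part of Impala or the Avro libraries. The sketch handles only primitive branch types and does not check numeric ranges (for example, that an `int` default fits in 32 bits).

```python
import json

# Python types a JSON default may decode to, per Avro primitive type name.
# Simplification for this sketch: only primitive union branches are handled.
_JSON_TYPES = {
    "null": (type(None),),
    "boolean": (bool,),
    "int": (int,),
    "long": (int,),
    "float": (int, float),
    "double": (int, float),
    "bytes": (str,),
    "string": (str,),
}


def default_matches_first_branch(field):
    """Hypothetical helper (not an Impala or Avro API).

    Returns True if a union field's default is valid for the union's first
    branch. Non-union fields and fields without a default are treated as OK.
    """
    ftype = field.get("type")
    if not isinstance(ftype, list) or "default" not in field:
        return True
    first = ftype[0]
    if not isinstance(first, str) or first not in _JSON_TYPES:
        return True  # complex branch types are out of scope for this sketch
    value = field["default"]
    # bool is a subclass of int in Python, so only accept it for "boolean".
    if isinstance(value, bool) and first != "boolean":
        return False
    return isinstance(value, _JSON_TYPES[first])


def find_bad_defaults(schema_json):
    """Names of top-level record fields whose union default does not match
    the first type in the union."""
    schema = json.loads(schema_json)
    return [f["name"] for f in schema.get("fields", [])
            if not default_matches_first_branch(f)]


example = '''{"type": "record", "name": "t", "fields": [
  {"name": "i", "type": ["int", "null"], "default": null},
  {"name": "j", "type": ["null", "int"], "default": null}]}'''
print(find_bad_defaults(example))  # ['i']
```

Field `i` matches the failing definition in the summary, and field `j` matches the definition given in the workaround.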
> *WORKAROUND*
> Before writing the files, reorder the types in the union so that "null" comes 
> first. If you have existing files written with a problematic schema, you may 
> need to rewrite those files with the fixed schema, because Avro embeds the 
> schema in each data file.
> For example, the following field definition can be queried successfully:
> {noformat}
> {"name": "i", "type": ["null", "int"], "default": null}
> {noformat}
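The reordering step of the workaround can be scripted. The Python sketch below assumes the schema has already been loaded as a dict (for example with `json.load`) and only examines top-level record fields. `move_null_first` is a hypothetical helper, not an Avro API. It only fixes the schema itself: as noted above, files already written with the old schema would still need to be rewritten with the new one.

```python
import copy
import json


def move_null_first(schema):
    """Hypothetical helper (not an Avro API).

    Returns a copy of a record schema in which "null" is moved to the front of
    every union whose field default is null. Only top-level fields are examined.
    """
    fixed = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in fixed.get("fields", []):
        ftype = field.get("type")
        if (isinstance(ftype, list) and "null" in ftype
                and "default" in field and field["default"] is None):
            # Keep the relative order of the remaining branches.
            field["type"] = ["null"] + [t for t in ftype if t != "null"]
    return fixed


original = {
    "type": "record",
    "name": "t",
    "fields": [{"name": "i", "type": ["int", "null"], "default": None}],
}
print(json.dumps(move_null_first(original)["fields"][0]))
# {"name": "i", "type": ["null", "int"], "default": null}
```

Fields with a non-null default (for example `"default": "x"` on a `["string", "null"]` union) are left alone, because their first branch already matches the default.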
> *Original description*
> I have an Avro-backed table. Hive and the avro-tools jar can read the files, 
> and Impala can describe the table. However, selecting from the table in Impala 
> seems to crash several daemons.
> {noformat}
> I1021 11:01:18.022570  8623 status.cc:44] Failed to parse file schema: Invalid JSON float in json_t_to_avro_value_helper
>     @           0x83af7d  (unknown)
>     @           0x922a00  (unknown)
>     @           0x92309b  (unknown)
>     @           0x95e44d  (unknown)
>     @           0x910a8f  (unknown)
>     @           0x90a680  (unknown)
>     @           0x9a36c4  (unknown)
>     @       0x3681c07851  (unknown)
>     @       0x36818e811d  (unknown)
> I1021 11:01:18.030833  5229 progress-updater.cc:56] Query 9c4f2e4eebf1c7a9:811b8dc272d75e8a: 6% Complete (1951 out of 29457)
> {noformat}
> My schema is:
> {
>   "type" : "record",
>   "name" : "points", 
>   "fields" : [ {
>     "name" : "c1",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c2",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c3",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c4",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c5",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c6",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c7",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c8",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c9",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c10",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c11",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c12",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c13",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c14",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c15",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c16",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c17",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c18",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "id1",
>     "type" : "int"
>   }, {
>     "name" : "id2",
>     "type" : "int"
>   }, {
>     "name" : "root_id",
>     "type" : "string"
>   } ]
> }
> Describing the table in Impala works. The table is partitioned by columns that 
> are not in the Avro files (Flume creates the directories).
> Query: describe points
> Query finished, fetching results ...
> +----------------------------+--------+-------------------+
> | name                       | type   | comment           |
> +----------------------------+--------+-------------------+
> | c1                         | double | from deserializer |
> | c2                         | string | from deserializer |
> | c3                         | string | from deserializer |
> | c4                         | string | from deserializer |
> | c5                         | double | from deserializer |
> | c6                         | double | from deserializer |
> | c7                         | string | from deserializer |
> | c8                         | string | from deserializer |
> | c9                         | double | from deserializer |
> | c10                        | double | from deserializer |
> | c11                        | double | from deserializer |
> | c12                        | double | from deserializer |
> | c13                        | double | from deserializer |
> | c14                        | double | from deserializer |
> | c15                        | double | from deserializer |
> | c16                        | double | from deserializer |
> | c17                        | double | from deserializer |
> | c18                        | double | from deserializer |
> | id1                        | int    | from deserializer |
> | id2                        | int    | from deserializer |
> | root_id                    | string | from deserializer |
> | deployment                 | string |                   |
> | date_id                    | int    |                   |
> | hour                       | int    |                   |
> | q_strategy                 | string |                   |
> | q_fund                     | string |                   |
> | q_expiry                   | string |                   |
> +----------------------------+--------+-------------------+
> Returned 27 row(s) in 29.33s



--
This message was sent by Atlassian Jira
(v8.20.10#820010)