[
https://issues.apache.org/jira/browse/IMPALA-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-635:
----------------------------------
Epic Link: IMPALA-12887
> Default value in Avro schema must match type of first union type
> ----------------------------------------------------------------
>
> Key: IMPALA-635
> URL: https://issues.apache.org/jira/browse/IMPALA-635
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 1.0.1, Impala 2.3.0
> Reporter: Nong Li
> Priority: Minor
> Labels: avro, usability
>
> *SUMMARY*
> If a default value is provided for a union-type Avro field (i.e. a union of
> "null" and some other type, since other unions are not supported by Impala),
> the default value must match the first type in the union. Otherwise Impala
> will return the following error when trying to query the table:
> {noformat}
> Failed to parse table schema: Invalid JSON integer in
> json_t_to_avro_value_helper
> {noformat}
> For example, the following field definition will produce this error:
> {noformat}
> {"name": "i", "type": ["int", "null"], "default": null}
> {noformat}
> This is technically not a bug since this is what the Avro spec dictates.
> However, it isn't very user-friendly.
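The rule in the summary can be checked before any files are written. The following is a minimal sketch using only the Python standard library; the function names `default_matches` and `find_bad_union_defaults` are hypothetical, and the type mapping is a simplification that covers primitive types only. It is not how Impala or the Avro C library performs the check.

```python
import json


def default_matches(avro_type, default):
    """Return True if a JSON default value is plausible for a primitive Avro type."""
    if avro_type == "null":
        return default is None
    if avro_type == "boolean":
        return isinstance(default, bool)
    if avro_type in ("int", "long"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(default, int) and not isinstance(default, bool)
    if avro_type in ("float", "double"):
        return isinstance(default, (int, float)) and not isinstance(default, bool)
    if avro_type in ("string", "bytes"):
        return isinstance(default, str)
    # Complex or named types are not checked in this sketch.
    return True


def find_bad_union_defaults(schema_json):
    """List names of record fields whose default does not match the first union branch."""
    schema = json.loads(schema_json)
    bad = []
    for field in schema.get("fields", []):
        ftype = field.get("type")
        if isinstance(ftype, list) and ftype and "default" in field:
            if not default_matches(ftype[0], field["default"]):
                bad.append(field["name"])
    return bad


if __name__ == "__main__":
    example = """
    {"type": "record", "name": "t", "fields": [
        {"name": "i", "type": ["int", "null"], "default": null},
        {"name": "j", "type": ["null", "int"], "default": null}
    ]}
    """
    print(find_bad_union_defaults(example))
```

Running the checker over a schema before creating the table flags the fields that would trigger the parse error described above.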
> *WORKAROUND*
> Switch the order of the types in the union before writing the files. If you
> have existing files written with a problematic schema, you may need to
> rewrite those files with the fixed schema because Avro embeds the schema in
> the file.
> For example, the following field definition can be queried successfully:
> {noformat}
> {"name": "i", "type": ["null", "int"], "default": null}
> {noformat}
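The reordering described in the workaround can be applied to a schema mechanically. This is a sketch under stated assumptions: `put_null_first` is a hypothetical helper, it only handles the case from this issue (a union containing "null" whose default is null), and it only rewrites the schema document, not any data files.

```python
import json


def put_null_first(schema):
    """Return a copy of a record schema with "null" moved to the front of
    every union field whose default value is null."""
    fixed = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    for field in fixed.get("fields", []):
        ftype = field.get("type")
        needs_fix = (
            isinstance(ftype, list)
            and "null" in ftype
            and "default" in field
            and field["default"] is None
            and ftype[0] != "null"
        )
        if needs_fix:
            # Keep the relative order of the remaining branches.
            field["type"] = ["null"] + [t for t in ftype if t != "null"]
    return fixed


if __name__ == "__main__":
    schema = {
        "type": "record",
        "name": "points",
        "fields": [
            {"name": "c1", "type": ["double", "null"], "default": None},
            {"name": "id1", "type": "int"},
        ],
    }
    print(json.dumps(put_null_first(schema), indent=2))
```

As the workaround notes, files already written with the old schema still embed it, so they may need to be rewritten with the fixed schema.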
> *Original description*
> I have an Avro-backed table. Hive and the Avro tools jar can read the files,
> and Impala can describe the table. However, selecting from the table in Impala
> causes several daemons to crash.
> I1021 11:01:18.022570 8623 status.cc:44] Failed to parse file schema:
> Invalid JSON float in json_t_to_avro_value_helper
> @ 0x83af7d (unknown)
> @ 0x922a00 (unknown)
> @ 0x92309b (unknown)
> @ 0x95e44d (unknown)
> @ 0x910a8f (unknown)
> @ 0x90a680 (unknown)
> @ 0x9a36c4 (unknown)
> @ 0x3681c07851 (unknown)
> @ 0x36818e811d (unknown)
> I1021 11:01:18.030833 5229 progress-updater.cc:56] Query
> 9c4f2e4eebf1c7a9:811b8dc272d75e8a: 6% Complete (1951 out of 29457)
> My schema is
> {
>   "type" : "record",
>   "name" : "points",
>   "fields" : [ {
>     "name" : "c1",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c2",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c3",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c4",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c5",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c6",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c7",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c8",
>     "type" : [ "string", "null" ],
>     "default" : null
>   }, {
>     "name" : "c9",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c10",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c11",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c12",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c13",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c14",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c15",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c16",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c17",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "c18",
>     "type" : [ "double", "null" ],
>     "default" : null
>   }, {
>     "name" : "id1",
>     "type" : "int"
>   }, {
>     "name" : "id2",
>     "type" : "int"
>   }, {
>     "name" : "root_id",
>     "type" : "string"
>   } ]
> }
> Describing the table in Impala works. The table is partitioned by columns that
> are not in the Avro files (Flume creates the directories).
> Query: describe points
> Query finished, fetching results ...
> +----------------------------+--------+-------------------+
> | name                       | type   | comment           |
> +----------------------------+--------+-------------------+
> | c1                         | double | from deserializer |
> | c2                         | string | from deserializer |
> | c3                         | string | from deserializer |
> | c4                         | string | from deserializer |
> | c5                         | double | from deserializer |
> | c6                         | double | from deserializer |
> | c7                         | string | from deserializer |
> | c8                         | string | from deserializer |
> | c9                         | double | from deserializer |
> | c10                        | double | from deserializer |
> | c11                        | double | from deserializer |
> | c12                        | double | from deserializer |
> | c13                        | double | from deserializer |
> | c14                        | double | from deserializer |
> | c15                        | double | from deserializer |
> | c16                        | double | from deserializer |
> | c17                        | double | from deserializer |
> | c18                        | double | from deserializer |
> | id1                        | int    | from deserializer |
> | id2                        | int    | from deserializer |
> | root_id                    | string | from deserializer |
> | deployment                 | string |                   |
> | date_id                    | int    |                   |
> | hour                       | int    |                   |
> | q_strategy                 | string |                   |
> | q_fund                     | string |                   |
> | q_expiry                   | string |                   |
> +----------------------------+--------+-------------------+
> Returned 27 row(s) in 29.33s
--
This message was sent by Atlassian Jira
(v8.20.10#820010)