[
https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881860#comment-16881860
]
Laszlo Bodor commented on HIVE-21940:
-------------------------------------
it has been tested on postgres locally and oracle on a cluster, could you
please review [~kgyrtkirk], [~vgarg] ?
> Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE
> ---------------------------------------------------------------------------
>
> Key: HIVE-21940
> URL: https://issues.apache.org/jira/browse/HIVE-21940
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Laszlo Bodor
> Assignee: Laszlo Bodor
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21940.01.patch, HIVE-21940.01.patch,
> HIVE-21940.repro.patch
>
>
> Summary: this is a fix for a regression introduced by HIVE-20833/HIVE-20221,
> fixed in the same way as HIVE-16667 earlier
> This issue was found while running sqoop/hive tests on a cluster with hive
> with postgres metastore, and it turned out the problem is that PARAM_VALUE is
> handled as it was CLOB but it's a text, so extractSqlClob returns it as is.
> It's reproducible on cluster by the following statements:
> {code}
> USE default;
> drop table if exists my_table;
> create external table my_table (col1 int, col3 int) partitioned by (col2
> string) STORED AS TEXTFILE;
> insert into my_table VALUES(11,201,"F");
> SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id =
> pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name =
> "my_table";
> {code}
> sys query results in:
> {code}
> +-------------+------------------------+-----------------+
> | pp.part_id | pp.param_key | pp.param_value |
> +-------------+------------------------+-----------------+
> | 151 | rawDataSize | 28629 |
> | 151 | numRows | 28628 |
> | 151 | transient_lastDdlTime | 28627 |
> | 151 | COLUMN_STATS_ACCURATE | 28626 |
> | 151 | numFiles | 28625 |
> | 151 | totalSize | 28622 |
> +-------------+------------------------+-----------------+
> {code}
> Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience
> while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no
> such type as CLOB, and metastore simply saves large object ids into this
> field.
> In case of a direct metastore query (from hive's sys schema, but the same
> result for direct postgres), it shows the result above (see sys query
> output). This is an issue when hive treats these ids as they were real
> values, but they are obviously not correct, and this causes various failures
> (e.g. using serde parameter serialization.format=28392)
> param_value values above are large object ids, according to pg_dump
> | 151 | COLUMN_STATS_ACCURATE | 28626 |
> {code}
> SELECT pg_catalog.lo_open('28626', 131072);
> SELECT pg_catalog.lowrite(0,
> '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d');
> SELECT pg_catalog.lo_close(0);
> {code}
> decoded large object value:
> {code}
> {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}}
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)