[
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015291#comment-16015291
]
Remus Rusanu commented on HIVE-16667:
-------------------------------------
[~ngangam] Thanks! I Can you please run the repro I attached originally?
{code}
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE
srcpart PARTITION (ds="2008-04-09", hr="11");
select * from srcpart;
{code}
There is a reason for this specific repro. If simply look at any CLOB field,
like {{TABLE_PARAMS.PARAM_VALUE}}, then this field may well be loaded by JDO,
via the ObjectStore. JDO knows how to handle this field appropriately. But my
repro triggers a code path which goes through the
[{{MetasoreDirectSql.getPartitionsFromPartitionIds}}](https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L787):
{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
// We are skipping the CDS table here, as it seems to be totally useless.
queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\",
\"TYPE_NAME\""
+ " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and
\"INTEGER_IDX\" >= 0"
+ " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
loopJoinOrderedResult(colss, queryText, 0, new
ApplyFunc<List<FieldSchema>>() {
@Override
public void apply(List<FieldSchema> t, Object[] fields) {
t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]),
(String)fields[1]));
}});
}
{code}
This particular code is the one I'm reporting the problem on. For me, this does
not handle Clobs appropriately and reads the lob handle value instead of the
lob content.
> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and
> other field is incorrect
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
> Issue Type: Bug
> Reporter: Remus Rusanu
> Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with
> PostgreSQL. The value is written out-of-band and the LOB handle is written,as
> an INT, into the table. SELECTs return the INT value, which should had been
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier
> metastore versions (they retain their string storage) vs. values inserted
> after the upgrade (inserted as LOB roots).
> Teh code in
> {{MetasoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does
> not happen, the value is a Java String containing the int which is the LOB
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException:
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but
> that is just the luck of the path taken by the code. Inspection of my PG
> metastore shows all the CLOB fields suffering from this issue.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)