[
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015291#comment-16015291
]
Remus Rusanu edited comment on HIVE-16667 at 5/18/17 6:39 AM:
--------------------------------------------------------------
[~ngangam] Thanks! I Can you please run the repro I attached originally?
{code}
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE
srcpart PARTITION (ds="2008-04-09", hr="11");
select * from srcpart;
{code}
There is a reason for this specific repro. If simply look at any CLOB field,
like {{TABLE_PARAMS.PARAM_VALUE}}, then this field may well be loaded by JDO,
via the ObjectStore. JDO knows how to handle this field appropriately. But my
repro triggers a code path which goes through the
[MetasoreDirectSql.getPartitionsFromPartitionIds|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L787]:
{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
// We are skipping the CDS table here, as it seems to be totally useless.
queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\",
\"TYPE_NAME\""
+ " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and
\"INTEGER_IDX\" >= 0"
+ " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
loopJoinOrderedResult(colss, queryText, 0, new
ApplyFunc<List<FieldSchema>>() {
@Override
public void apply(List<FieldSchema> t, Object[] fields) {
t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]),
(String)fields[1]));
}});
}
{code}
This particular code is the one I'm reporting the problem on. For me, this does
not handle Clobs appropriately and reads the lob handle value instead of the
lob content.
was (Author: rusanu):
[~ngangam] Thanks! I Can you please run the repro I attached originally?
{code}
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE
srcpart PARTITION (ds="2008-04-09", hr="11");
select * from srcpart;
{code}
There is a reason for this specific repro. If simply look at any CLOB field,
like {{TABLE_PARAMS.PARAM_VALUE}}, then this field may well be loaded by JDO,
via the ObjectStore. JDO knows how to handle this field appropriately. But my
repro triggers a code path which goes through the
[{{MetasoreDirectSql.getPartitionsFromPartitionIds}}](https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L787):
{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
// We are skipping the CDS table here, as it seems to be totally useless.
queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\",
\"TYPE_NAME\""
+ " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and
\"INTEGER_IDX\" >= 0"
+ " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
loopJoinOrderedResult(colss, queryText, 0, new
ApplyFunc<List<FieldSchema>>() {
@Override
public void apply(List<FieldSchema> t, Object[] fields) {
t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]),
(String)fields[1]));
}});
}
{code}
This particular code is the one I'm reporting the problem on. For me, this does
not handle Clobs appropriately and reads the lob handle value instead of the
lob content.
> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and
> other field is incorrect
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
> Issue Type: Bug
> Reporter: Remus Rusanu
> Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with
> PostgreSQL. The value is written out-of-band and the LOB handle is written,as
> an INT, into the table. SELECTs return the INT value, which should had been
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier
> metastore versions (they retain their string storage) vs. values inserted
> after the upgrade (inserted as LOB roots).
> Teh code in
> {{MetasoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does
> not happen, the value is a Java String containing the int which is the LOB
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException:
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but
> that is just the luck of the path taken by the code. Inspection of my PG
> metastore shows all the CLOB fields suffering from this issue.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)