[
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017812#comment-16017812
]
Naveen Gangam commented on HIVE-16667:
--------------------------------------
I have tried a couple of approaches to force PostgreSQL to store the values
inline instead of as TOASTed values referenced via OIDs. According to the
PostgreSQL documentation, setting the column storage to {{PLAIN}} or {{MAIN}}
should force the DB to store them in-line.
{code}
The TOAST code recognizes four different strategies for storing TOAST-able
columns:

PLAIN prevents either compression or out-of-line storage; furthermore it
disables use of single-byte headers for varlena types. This is the only
possible strategy for columns of non-TOAST-able data types.

EXTENDED allows both compression and out-of-line storage. This is the default
for most TOAST-able data types. Compression will be attempted first, then
out-of-line storage if the row is still too big.

EXTERNAL allows out-of-line storage but not compression. Use of EXTERNAL will
make substring operations on wide text and bytea columns faster (at the penalty
of increased storage space) because these operations are optimized to fetch
only the required parts of the out-of-line value when it is not compressed.

MAIN allows compression but not out-of-line storage. (Actually, out-of-line
storage will still be performed for such columns, but only as a last resort
when there is no other way to make the row small enough.)
{code}
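For reference, the storage strategy is changed per column; a minimal sketch of
the statement involved (using {{COLUMNS_V2.TYPE_NAME}} as the example column):
{code}
-- Sketch: force in-line (non-TOAST) storage for TYPE_NAME; PLAIN could be used
-- instead of MAIN. This only affects how PostgreSQL stores the text itself;
-- it does not rewrite values that were already written out as large objects.
ALTER TABLE "COLUMNS_V2" ALTER COLUMN "TYPE_NAME" SET STORAGE MAIN;
{code}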
Even with either of these settings, I cannot get it to store these values
in-line. I am still researching. Based on a hint in a user-group thread, I found
the following:
{code}
select * from "COLUMNS_V2";
CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX
-------+---------+-------------+-----------+-------------
1 | default | key | 27118 | 0
1 | default | value | 27119 | 1
(2 rows)
select "CD_ID", "COMMENT", "COLUMN_NAME",
convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int),
'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX"
>= 0 order by "CD_ID" asc, "INTEGER_IDX" asc;
CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME
-------+---------+-------------+-----------
1 | default | key | string
1 | default | value | string
{code}
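On PostgreSQL 9.4+ the same conversion can also be written with {{lo_get}} (the
built-in mentioned in the issue description); a sketch, not verified through
JDO:
{code}
-- Sketch: lo_get(oid) returns the whole large object as bytea in one call,
-- avoiding the explicit lo_open/loread pair.
select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(lo_get("TYPE_NAME"::oid), 'UTF8') as "TYPE_NAME"
from "COLUMNS_V2"
where "CD_ID" in (1) and "INTEGER_IDX" >= 0
order by "CD_ID" asc, "INTEGER_IDX" asc;
{code}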
While the {{lo_open}}/{{loread}} conversion above works fine in the native psql
client, the same query does not work via Hive/JDO. However, the overall
getPartitions request still succeeds because it falls back to DataNucleus when
DirectSQL fails.
{code}
2017-05-19T11:24:46,604 WARN [pool-7-thread-2] metastore.MetaStoreDirectSql:
Getting partitions:query=select "CD_ID", "COMMENT", "COLUMN_NAME",
convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int),
'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX"
>= 0 order by "CD_ID" asc, "INTEGER_IDX" asc
2017-05-19T11:24:46,605 WARN [pool-7-thread-2] metastore.ObjectStore: Falling
back to ORM path due to direct SQL failure (this is not an error): SQL query
"select "CD_ID", "COMMENT", "COLUMN_NAME",
convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int),
'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX"
>= 0 order by "CD_ID" asc, "INTEGER_IDX" asc" requires 3 parameters yet none
have been supplied
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:636)
at
{code}
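My guess (unconfirmed) is that DataNucleus parses the three {{::int}} casts as
named parameter markers, which would explain the "requires 3 parameters"
message. If that is the cause, avoiding the {{::}} cast syntax should sidestep
it; a sketch:
{code}
-- Sketch only: same query rewritten with CAST(... AS ...) and a plain integer
-- literal (262144 = 0x40000, the INV_READ open mode used in the original
-- query), so no ":" remains for DataNucleus to misread as a parameter.
select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(loread(lo_open(CAST("TYPE_NAME" AS int), 262144), 262144),
                    'UTF8') as "TYPE_NAME"
from "COLUMNS_V2"
where "CD_ID" in (1) and "INTEGER_IDX" >= 0
order by "CD_ID" asc, "INTEGER_IDX" asc;
{code}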
> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and
> other fields is incorrect
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
> Issue Type: Bug
> Reporter: Remus Rusanu
> Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as
> an INT, into the table. SELECTs return the INT value, which should have been
> read via the {{lo_get}} PG built-in and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier
> metastore versions (they retain their string storage) vs. values inserted
> after the upgrade (inserted as LOB roots).
> The code in
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects
> the underlying JDO/DataNucleus layer to map the column to a {{Clob}}, but that
> does not happen; the value is a Java String containing the int which is the
> LOB root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException:
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> The 24030:24031 here should be 'string:string'.
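> For reference, the stored values can be confirmed to be large-object OIDs
> directly in psql (a sketch; reading {{pg_largeobject}} may require superuser
> privileges):
> {code}
> -- Each broken TYPE_NAME value should appear as a large-object OID here.
> select distinct loid from pg_largeobject where loid in (24030, 24031);
> {code}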
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit for non-partitioned/TEXTFILE tables, but
> that is just the luck of the code path taken. Inspection of my PG metastore
> shows all the CLOB fields suffering from this issue.