[
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017812#comment-16017812
]
Naveen Gangam commented on HIVE-16667:
--------------------------------------
I have tried a couple of approaches to force PostgreSQL to store the values
inline instead of as TOASTed values referenced via OIDs. According to the
PostgreSQL documentation, setting the column storage to {{PLAIN}} or {{MAIN}}
should force the DB to store them in-line.
{code}
The TOAST code recognizes four different strategies for storing TOAST-able
columns:

PLAIN prevents either compression or out-of-line storage; furthermore it
disables use of single-byte headers for varlena types. This is the only
possible strategy for columns of non-TOAST-able data types.

EXTENDED allows both compression and out-of-line storage. This is the default
for most TOAST-able data types. Compression will be attempted first, then
out-of-line storage if the row is still too big.

EXTERNAL allows out-of-line storage but not compression. Use of EXTERNAL will
make substring operations on wide text and bytea columns faster (at the penalty
of increased storage space) because these operations are optimized to fetch
only the required parts of the out-of-line value when it is not compressed.

MAIN allows compression but not out-of-line storage. (Actually, out-of-line
storage will still be performed for such columns, but only as a last resort
when there is no other way to make the row small enough.)
{code}
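For reference, the storage strategy is changed per column; a minimal sketch of
the statement involved (using {{COLUMNS_V2.TYPE_NAME}} as the example column):
{code}
-- Sketch: force in-line (non-TOAST) storage for TYPE_NAME; PLAIN could be used
-- instead of MAIN. This only affects how PostgreSQL stores the text itself;
-- it does not rewrite values that were already written out as large objects.
ALTER TABLE "COLUMNS_V2" ALTER COLUMN "TYPE_NAME" SET STORAGE MAIN;
{code}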
Even with either of these settings, I cannot get it to store these values
in-line. I am still researching. Based on a hint in a user-group thread, I found
the following:
{code}
select * from "COLUMNS_V2";
CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX
-------+---------+-------------+-----------+-------------
1 | default | key | 27118 | 0
1 | default | value | 27119 | 1
(2 rows)
select "CD_ID", "COMMENT", "COLUMN_NAME",
convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int),
'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX"
>= 0 order by "CD_ID" asc, "INTEGER_IDX" asc;
CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME
-------+---------+-------------+-----------
1 | default | key | string
1 | default | value | string
{code}
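On PostgreSQL 9.4+ the same conversion can also be written with {{lo_get}} (the
built-in mentioned in the issue description); a sketch, not verified through
JDO:
{code}
-- Sketch: lo_get(oid) returns the whole large object as bytea in one call,
-- avoiding the explicit lo_open/loread pair.
select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(lo_get("TYPE_NAME"::oid), 'UTF8') as "TYPE_NAME"
from "COLUMNS_V2"
where "CD_ID" in (1) and "INTEGER_IDX" >= 0
order by "CD_ID" asc, "INTEGER_IDX" asc;
{code}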
While the {{lo_open}}/{{loread}} conversion above works fine in the native psql
client, the same query does not work via Hive/JDO. However, the overall
getPartitions request still succeeds because it falls back to DataNucleus when
DirectSQL fails.
{code}
2017-05-19T11:24:46,604 WARN [pool-7-thread-2] metastore.MetaStoreDirectSql:
Getting partitions:query=select "CD_ID", "COMMENT", "COLUMN_NAME",
convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int),
'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX"
>= 0 order by "CD_ID" asc, "INTEGER_IDX" asc
2017-05-19T11:24:46,605 WARN [pool-7-thread-2] metastore.ObjectStore: Falling
back to ORM path due to direct SQL failure (this is not an error): SQL query
"select "CD_ID", "COMMENT", "COLUMN_NAME",
convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int),
'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX"
>= 0 order by "CD_ID" asc, "INTEGER_IDX" asc" requires 3 parameters yet none
have been supplied
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:636)
at
{code}
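My guess (unconfirmed) is that DataNucleus parses the three {{::int}} casts as
named parameter markers, which would explain the "requires 3 parameters"
message. If that is the cause, avoiding the {{::}} cast syntax should sidestep
it; a sketch:
{code}
-- Sketch only: same query rewritten with CAST(... AS ...) and a plain integer
-- literal (262144 = 0x40000, the INV_READ open mode used in the original
-- query), so no ":" remains for DataNucleus to misread as a parameter.
select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(loread(lo_open(CAST("TYPE_NAME" AS int), 262144), 262144),
                    'UTF8') as "TYPE_NAME"
from "COLUMNS_V2"
where "CD_ID" in (1) and "INTEGER_IDX" >= 0
order by "CD_ID" asc, "INTEGER_IDX" asc;
{code}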
> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and
> other fields is incorrect
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
> Issue Type: Bug
> Reporter: Remus Rusanu
> Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as
> an INT, into the table. SELECTs return the INT value, which should have been
> read via the {{lo_get}} PG built-in and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier
> metastore versions (they retain their string storage) vs. values inserted
> after the upgrade (inserted as LOB roots).
> The code in
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects
> the underlying JDO/DataNucleus layer to map the column to a {{Clob}}, but that
> does not happen; the value is a Java String containing the int which is the
> LOB root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException:
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> The 24030:24031 here should be 'string:string'.
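> For reference, the stored values can be confirmed to be large-object OIDs
> directly in psql (a sketch; reading {{pg_largeobject}} may require superuser
> privileges):
> {code}
> -- Each broken TYPE_NAME value should appear as a large-object OID here.
> select distinct loid from pg_largeobject where loid in (24030, 24031);
> {code}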
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit for non-partitioned/TEXTFILE tables, but
> that is just the luck of the code path taken. Inspection of my PG metastore
> shows all the CLOB fields suffering from this issue.