[ 
https://issues.apache.org/jira/browse/HIVE-13708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282291#comment-15282291
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13708:
----------------------------------------------------------

[~thejas] I checked whether we could do this in a generic way. As you 
mentioned, we can perform a deep check of the object inspector after 
initialize() and see whether its types match the column types in the table 
definition. My concern is whether this is backward compatible or whether it 
will break things that used to work. If we have not enforced this rule so 
far, how can we expect custom serde developers to know that Hive now 
enforces it? Also, it looked cleaner to implement this check in the serde 
itself (RegexSerDe, for example, does a similar check in initialize()), 
since interpreting the data correctly seems to be the responsibility of the 
serde and not the query processor. Let me know your feedback.

Thanks
Hari
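
For illustration, the serde-side check described in the comment above could 
look roughly like the following. This is a minimal sketch and not Hive code: 
the class StringOnlySerDeCheck and its methods are hypothetical, it does not 
use the real SerDe or ObjectInspector interfaces, and it assumes a serde that 
can only produce string columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a serde that only supports string columns.
// It does NOT implement or call any real Hive interface.
public class StringOnlySerDeCheck {

    // Returns "name:type" for every column whose declared type is not string.
    static List<String> unsupportedColumns(List<String> names, List<String> types) {
        if (names.size() != types.size()) {
            throw new IllegalArgumentException("column names and types differ in length");
        }
        List<String> unsupported = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            String type = types.get(i).trim().toLowerCase(Locale.ROOT);
            if (!type.equals("string")) {
                unsupported.add(names.get(i) + ":" + type);
            }
        }
        return unsupported;
    }

    // Assumed analogue of a serde's initialize(): fail fast at table creation
    // instead of silently treating every column as a string later on.
    static void initialize(List<String> names, List<String> types) {
        List<String> unsupported = unsupportedColumns(names, types);
        if (!unsupported.isEmpty()) {
            throw new IllegalStateException(
                "This serde only supports string columns, but found: " + unsupported);
        }
    }

    public static void main(String[] args) {
        // Accepted: the declared type matches what the serde can produce.
        initialize(Arrays.asList("totalprice"), Arrays.asList("string"));
        try {
            // Rejected: the column definition from the issue description.
            initialize(Arrays.asList("totalprice"), Arrays.asList("DECIMAL(38,10)"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Placing the check in the initialization step keeps the rule local to the 
serde that needs it, which is the trade-off the comment weighs against a 
generic check in the query processor.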

> Create table should verify datatypes supported by the serde
> -----------------------------------------------------------
>
>                 Key: HIVE-13708
>                 URL: https://issues.apache.org/jira/browse/HIVE-13708
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Thejas M Nair
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Critical
>         Attachments: HIVE-13708.1.patch
>
>
> As [~Goldshuv] mentioned in HIVE-7777, creating a table with a serde such 
> as OpenCSVSerde allows columns of arbitrary types to be declared. But 
> 'describe table' still returns string datatypes, and so do selects on the 
> table.
> This is misleading and results in users not getting the intended results.
> Ideally, create table should disallow the creation of such tables with 
> unsupported types.
> Example posted by [~Goldshuv] in HIVE-7777 -
> {noformat}
> CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10)) 
> ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' with 
> serdeproperties ("separatorChar" = ",","quoteChar"= "'","escapeChar"= "\\") 
> STORED AS TEXTFILE 
> LOCATION '<some location>' 
> tblproperties ("skip.header.line.count"="1");
> {noformat}
> Now consider this SQL:
> hive> select min(totalprice) from test;
> In this case, given my data, the result should have been 874.89, but the 
> actual result was 100001.57 (because it comes first in the byte ordering 
> of a string type). This is a wrong result.
> hive> desc extended test;
> OK
> o_totalprice          string                  from deserializer
> ...
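
The wrong min() result in the description follows from string ordering. The 
sketch below reproduces it outside Hive, using only the two values quoted in 
the report; the class and method names are hypothetical and chosen for 
illustration.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class StringVersusDecimalMin {

    // Minimum under lexicographic (string) ordering, which is what a column
    // read back as a string type is compared with.
    static String minAsString(List<String> values) {
        return Collections.min(values);
    }

    // Minimum under numeric ordering, which is what the declared DECIMAL
    // column type would suggest to the user.
    static BigDecimal minAsDecimal(List<String> values) {
        return values.stream()
                .map(BigDecimal::new)
                .min(Comparator.naturalOrder())
                .get();
    }

    public static void main(String[] args) {
        List<String> totalprice = Arrays.asList("874.89", "100001.57");
        System.out.println(minAsString(totalprice));  // prints 100001.57
        System.out.println(minAsDecimal(totalprice)); // prints 874.89
    }
}
```

The character '1' sorts before '8', so "100001.57" is the smallest string 
even though it is the larger number.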



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
