[ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Xu updated HIVE-1505:
-------------------------

    Attachment: trunk-encoding.patch

We implemented an encoding configuration feature for tables.
The table encoding is set through a serde property, for example:
{code}
alter table src set serdeproperties ('serialization.encoding'='GBK');
{code}
This makes table src use the GBK encoding (a Chinese character encoding).
Furthermore, when using the command line interface, the parameter
'hive.cli.encoding' must also be set. It has to be set before the Hive prompt
starts, so set it in hive-site.xml or pass -hiveconf hive.cli.encoding=GBK on
the command line, rather than running 'set hive.cli.encoding=GBK' in Hive QL.
Because the setting has to be in place before the prompt starts, I can't find
a way to add a unit test.
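
For reference, a minimal sketch of the two ways to set 'hive.cli.encoding'
described above (GBK is only an example value):
{code}
<!-- hive-site.xml: the CLI encoding must be known before the prompt starts -->
<property>
  <name>hive.cli.encoding</name>
  <value>GBK</value>
</property>
{code}
{code}
# or pass it when launching the CLI
hive -hiveconf hive.cli.encoding=GBK
{code}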




> Support non-UTF8 data
> ---------------------
>
>                 Key: HIVE-1505
>                 URL: https://issues.apache.org/jira/browse/HIVE-1505
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: bc Wong
>         Attachments: trunk-encoding.patch
>
>
> I'd like to work with non-UTF8 data easily.
> Suppose I have data in latin1. Currently, doing a "select *" returns the 
> upper ASCII characters as '\xef\xbf\xbd', which is the replacement character 
> '\ufffd' encoded in UTF-8. It would be nice for Hive to understand different 
> encodings, or to have a concept of a byte string.
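
As a side note, a small self-contained Java illustration of the behaviour
described in the report (not part of the patch): decoding latin1 bytes as
UTF-8 substitutes the replacement character U+FFFD, whose UTF-8 encoding is
the byte sequence 0xEF 0xBF 0xBD.
{code}
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    public static void main(String[] args) {
        // 'é' (U+00E9) is the single byte 0xE9 in latin1, which by itself
        // is not a valid UTF-8 sequence.
        byte[] latin1Bytes = "café".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding the latin1 bytes as UTF-8 replaces the invalid byte with
        // U+FFFD, which re-encodes to 0xEF 0xBF 0xBD.
        System.out.println(new String(latin1Bytes, StandardCharsets.UTF_8));      // caf\uFFFD

        // Decoding with the correct charset preserves the original text.
        System.out.println(new String(latin1Bytes, StandardCharsets.ISO_8859_1)); // café
    }
}
{code}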

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
