[ 
https://issues.apache.org/jira/browse/HIVE-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-3677:
--------------------------------------
    Fix Version/s:     (was: 0.8.1)

I cleared the fixVersion field since this ticket is still open. Please review 
this ticket and if the fix is already committed to a specific version please 
set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA 
guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] 
the fixVersion should be set only when the issue is resolved/closed.

> Encoding Issue - ISO-8859-1
> ---------------------------
>
>                 Key: HIVE-3677
>                 URL: https://issues.apache.org/jira/browse/HIVE-3677
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Import/Export
>    Affects Versions: 0.8.1
>         Environment: Amazon EMR with Hive (Hive 0.8.1 and Hadoop 1.0.3)
>            Reporter: Sergio Kameoka
>            Priority: Major
>
> We’ve created a very simple example using Amazon EMR with Hive, which 
> basically creates a single table with Hive and loads some data into this 
> table. Below you’ll find the code that has been used:
> //CREATE TABLE CODE
> CREATE TABLE sampletable (
> valorstring STRING, valordecimal DOUBLE)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
> WITH SERDEPROPERTIES (
> 'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
> 'quote.delim'='("|\\[|\\])',
> 'field.delim'=' ',
> 'serialization.null.format'='-')
> STORED AS TEXTFILE;
>  
> //LOAD DATA CODE
> LOAD DATA LOCAL INPATH '/tmp/sampletable.txt' OVERWRITE INTO TABLE 
> sampletable;
> Here is the text file content that we are using to load the data:
> /tmp/sampletable.txt
> "Exemplo de texto com acentuação" 90,15
> "Exemplo de texto com acentuação" 80.15
> The problem that we are facing seems to be with the encoding that is being 
> used in the Hive configuration. It seems to be using UTF-8, but for the 
> Brazilian format we’ll need to use ISO-8859-1.
> In the example above, when the data is loaded inside the table and we perform 
> a simple select (Select * from sampletable) the text with accentuation is 
> returned totally wrong and the double value with comma is returned as null.
> We’ve already changed the LANG variable in the environment and the Hive 
> variables with SET, but it hasn’t worked so far.
> Thank you in advance!!!
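
The two symptoms in the description can be reproduced outside Hive. The sketch below is plain Python, not Hive code. It assumes the sample file was saved as ISO-8859-1 and is read back as UTF-8, and that the reader accepts only dot-decimal numbers. The helper name parse_double is hypothetical and exists only for illustration.

```python
# Illustrative reproduction only; nothing here calls Hive.
text = "Exemplo de texto com acentuação"

# Assumption: the file on disk is ISO-8859-1, but the reader decodes UTF-8.
latin1_bytes = text.encode("iso-8859-1")
misread = latin1_bytes.decode("utf-8", errors="replace")

# Transcoding the bytes to UTF-8 before loading round-trips cleanly.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
roundtrip = utf8_bytes.decode("utf-8")


def parse_double(raw):
    """Hypothetical helper: return a float, or None if raw is not dot-decimal."""
    try:
        return float(raw)
    except ValueError:
        return None


comma_value = parse_double("90,15")                    # comma separator is rejected
dot_value = parse_double("80.15")                      # dot separator parses
normalized = parse_double("90,15".replace(",", "."))   # normalized before parsing

print(misread)
print(roundtrip)
print(comma_value, dot_value, normalized)
```

If these assumptions hold for the reported setup, transcoding the input file to UTF-8 and replacing the decimal comma with a dot before LOAD DATA would be one possible workaround. This has not been verified against Hive 0.8.1.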



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
