[
https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
eugeny birukov updated HIVE-11373:
----------------------------------
Description:
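I am trying to transform a JSON string into a MAP<STRING,STRING> using the following Python script:
import sys,re
for d in sys.stdin:
    # strip braces and quotes, then replace every ':' and ',' with \003
    r=d.replace('{','').replace('}','').replace('"','')
    r=re.sub('[:,]', '\003', r)
    print(r.strip())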
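To make the script's output concrete, the sketch below applies the same replacements to the sample row used in this report and prints the raw result. The function name `to_delimited` is hypothetical and exists only for this illustration.

```python
import re

def to_delimited(line):
    """Apply the same replacements as the script above to one input line."""
    r = line.replace('{', '').replace('}', '').replace('"', '')
    r = re.sub('[:,]', '\003', r)  # ':' and ',' both become \003
    return r.strip()

sample = '{"key1":"valu1","key2":"value2"}'
out = to_delimited(sample)
print(repr(out))  # 'key1\x03valu1\x03key2\x03value2'
```

Note that the same byte (\003) ends up both between a key and its value and between the two pairs.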
Steps to reproduce:
echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;"
hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING,STRING>) FROM json;"
converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_20150725150000_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)
Expected result: {"key1":"valu1","key2":"value2"}
Actual result: {"key1":"valu1\u0003key2\u0003value2"}
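One possible reading of the actual result, offered only as a sketch: it assumes that TRANSFORM output is parsed with Hive's default nested delimiters, \002 between map entries and \003 between a key and its value. This report does not confirm that, and the functions `parse_map` and `encode_map` below are hypothetical, simplified models rather than Hive's real deserializer.

```python
ENTRY_DELIM = '\x02'  # assumption: default collection-item delimiter
KV_DELIM = '\x03'     # assumption: default map-key delimiter

def parse_map(text):
    """Simplified model of delimiter-based map parsing (not Hive's code)."""
    result = {}
    for entry in text.split(ENTRY_DELIM):
        # Only the first \003 in an entry separates key from value.
        key, _, value = entry.partition(KV_DELIM)
        result[key] = value
    return result

def encode_map(pairs):
    """Join key/value pairs using the two assumed delimiters."""
    return ENTRY_DELIM.join(k + KV_DELIM + v for k, v in pairs)

# What the reported script emits: \003 everywhere, no \002.
reported = 'key1\x03valu1\x03key2\x03value2'
print(parse_map(reported))

# Same data with \002 between the pairs.
adjusted = encode_map([('key1', 'valu1'), ('key2', 'value2')])
print(parse_map(adjusted))
```

Under these assumptions the first call yields a single entry whose value still contains \003, which matches the actual result above, while the second yields the two expected entries. If the assumption holds, emitting \002 between pairs in the script would be worth trying.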
> Incorrect (de)serialization STRING field to MAP<STRING,STRING> in TRANSFORM
> operation
> --------------------------------------------------------------------------------------
>
> Key: HIVE-11373
> URL: https://issues.apache.org/jira/browse/HIVE-11373
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.13.1, 1.0.0
> Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with
> HIVE 1.0)
> Reporter: eugeny birukov
>