[ https://issues.apache.org/jira/browse/HIVE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Wang updated HIVE-9940:
----------------------------
Description:
Using an HQL statement like:
FROM (
  select_statement
) map_output
INSERT OVERWRITE TABLE table
REDUCE map_output.a, map_output.b
USING 'py_script'
AS col1, col2;
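The report does not include the contents of 'py_script'. A minimal sketch follows. It assumes the script uses Hive's default TRANSFORM convention, where each line is one row, fields are separated by TAB, and `\N` stands for NULL. The stderr_log lines in the report suggest that the script also echoes its rows to stderr. The function names `format_row` and `emit` are hypothetical, and the reduce logic itself is left as a placeholder.

```python
import sys


def format_row(fields):
    """Join one output row into a Hive TRANSFORM line (TAB-separated, \\N for NULL)."""
    return "\t".join("\\N" if f is None else str(f) for f in fields)


def emit(fields, out=None, log=None):
    """Write the row to stdout for Hive and echo the same line to stderr for debugging."""
    out = sys.stdout if out is None else out
    log = sys.stderr if log is None else log
    line = format_row(fields)
    out.write(line + "\n")
    log.write(line + "\n")


def main():
    for raw in sys.stdin:
        # Hive feeds the reducer TAB-separated input columns (here map_output.a, map_output.b).
        fields = raw.rstrip("\n").split("\t")
        # Hypothetical placeholder: the real reduce logic from py_script would go here.
        emit(fields)


if __name__ == "__main__":
    main()
```

Writing the exact same line to both streams makes any difference between stdout as Hive reads it and stderr_log attributable to Hive's parsing, not to the script.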
(1) Original version
The Python script's stdout contains records whose 2nd column is 'Meerjungfrau':
527500 Meerjungfrau 25 AO DE 20140704
...
(The column types are: string, string, int, string, string.)
Hive interprets these as:
527500 Meer <null> AO DE 20140704
...
stderr_log shows the same records as:
527500 Meerjungfrau 25 AO DE 20140704
(2) Change every 'Meerjungfrau' to 'bug' in the Python script
The Python script's stdout contains records whose 2nd column is 'bug':
527500 bug 25 AO DE 20140704
...
Hive interprets these as:
527500 b <null> AO DE 20140704
...
stderr_log shows the same records as:
527500 bug 25 AO DE 20140704
(3) Move the 2nd column to the last position
The Python script's stdout contains records whose original 2nd column, now last, is 'Meerjungfrau':
527500 25 AO DE 20140704 Meerjungfrau
...
Hive interprets these as:
527500 25 <null> 20140704 Meerjungfrau
...
stderr_log shows the same records as:
527500 25 AO DE 20140704 Meerjungfrau
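The expected Hive result for each case can be checked mechanically. The sketch below relies on Hive's documented TRANSFORM behaviour: the script's stdout is split on TAB, a field containing only `\N` becomes NULL, and each field is then cast to the target column type, where a failed cast yields NULL. It uses the column types from case (1). Two further assumptions are guesses from the flattened text: the fields in the report were TAB-separated, and 'AO DE' is a single column. `cast` and `parse_row` are hypothetical helpers that only approximate Hive's casting, and they are not Hive code.

```python
# Target column types from case (1): string, string, int, string, string.
TYPES = ["string", "string", "int", "string", "string"]


def cast(value, col_type):
    """Approximate a Hive string-to-type cast; None stands for NULL."""
    if value == "\\N":
        return None
    if col_type == "int":
        try:
            return int(value)
        except ValueError:
            return None  # an unparseable int becomes NULL
    return value


def parse_row(line, types=TYPES):
    """Split one stdout line on TAB and cast each field to its column type."""
    fields = line.rstrip("\n").split("\t")
    return [cast(f, t) for f, t in zip(fields, types)]


# stdout lines from the three cases, assuming TAB separators.
cases = {
    "(1)": "527500\tMeerjungfrau\t25\tAO DE\t20140704",
    "(2)": "527500\tbug\t25\tAO DE\t20140704",
    "(3)": "527500\t25\tAO DE\t20140704\tMeerjungfrau",
}

for name, line in cases.items():
    print(name, parse_row(line))
```

Under these assumptions, cases (1) and (2) would load with the 2nd column unchanged and 25 in the int column. Type casting alone therefore does not explain the truncated strings and the <null> that Hive produced in those cases. In case (3), the <null> in the third column is what a failed int cast of 'AO DE' would produce.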
was:
Using an HQL statement like:
FROM (
  select_statement
) map_output
INSERT OVERWRITE TABLE table
REDUCE map_output.a, map_output.b
USING 'py_script'
AS col1, col2;
(1) Original version
The Python script's stdout contains records whose 2nd column is 'Meerjungfrau':
527500 Meerjungfrau 25 AO DE 20140704
...
Hive interprets these as:
527500 Meer <null> AO DE 20140704
...
stderr_log shows the same records as:
527500 Meerjungfrau 25 AO DE 20140704
(2) Change every 'Meerjungfrau' to 'bug' in the Python script
The Python script's stdout contains records whose 2nd column is 'bug':
527500 bug 25 AO DE 20140704
...
Hive interprets these as:
527500 b <null> AO DE 20140704
...
stderr_log shows the same records as:
527500 bug 25 AO DE 20140704
(3) Move the 2nd column to the last position
The Python script's stdout contains records whose original 2nd column, now last, is 'Meerjungfrau':
527500 25 AO DE 20140704 Meerjungfrau
...
Hive interprets these as:
527500 25 <null> 20140704 Meerjungfrau
...
stderr_log shows the same records as:
527500 25 AO DE 20140704 Meerjungfrau
> The standard output of Python reduce script can not be interpreted correctly by Hive
> ------------------------------------------------------------------------------------
>
> Key: HIVE-9940
> URL: https://issues.apache.org/jira/browse/HIVE-9940
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Eric Wang
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)