[ 
https://issues.apache.org/jira/browse/HIVE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wang updated HIVE-9940:
----------------------------
    Description: 
Use an HQL statement like:
FROM (
  select_statement
  ) map_output
INSERT OVERWRITE TABLE table
  REDUCE map_output.a, map_output.b
  USING 'py_script'
  AS col1, col2;

(1) Original case
The Python script's stdout has records where the 2nd column = 'Meerjungfrau':
527500  Meerjungfrau    25      AO DE   20140704
...

(The column types are: string, string, int, string, string.)

Hive interprets these as:
527500  Meer    <null>  AO DE   20140704
...

The stderr log shows these as:
527500  Meerjungfrau    25      AO DE   20140704

(2) Change all 'Meerjungfrau' to 'bug' in the Python script
The Python script's stdout has records where the 2nd column = 'bug':
527500  bug     25      AO DE   20140704
...

Hive interprets these as:
527500  b       <null>  AO DE   20140704
...

The stderr log shows these as:
527500  bug     25      AO DE   20140704

(3) Move the 2nd column to the last position
The Python script's stdout has records where the last column = 'Meerjungfrau':
527500  25      AO DE   20140704        Meerjungfrau
...

Hive interprets these as:
527500  25      <null>  20140704        Meerjungfrau
...

The stderr log shows these as:
527500  25      AO DE   20140704        Meerjungfrau
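
For reference, the kind of reduce script described in this report can be sketched as follows. This is a minimal sketch, not the reporter's actual 'py_script': the function names format_row and emit are hypothetical, and it assumes Hive's default script I/O format, in which each output row is one line of tab-separated fields. Each row is written to stdout for Hive and mirrored to stderr, so the task's stderr log can be compared with what Hive stores.

```python
import sys


def format_row(fields):
    """Join one record's fields with tabs (assumed default Hive script delimiter)."""
    return "\t".join(str(f) for f in fields)


def emit(fields, out=sys.stdout, err=sys.stderr):
    """Write the row to stdout for Hive and mirror it to stderr for the task log."""
    line = format_row(fields)
    out.write(line + "\n")
    # repr() makes tabs and other hidden characters visible in the log.
    err.write(repr(line) + "\n")


if __name__ == "__main__":
    # Sample record from case (1) of this report. 'AO DE' is treated as a
    # single field here (an assumption), matching the five listed column types.
    emit(["527500", "Meerjungfrau", 25, "AO DE", "20140704"])
```

Logging the repr of each line shows exactly where the field boundaries fall, which may help narrow down whether the truncation happens before or after the script writes its output.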


> The standard output of Python reduce script can not be interpreted correctly 
> by Hive
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-9940
>                 URL: https://issues.apache.org/jira/browse/HIVE-9940
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Eric Wang



