Eric Wang created HIVE-9940:
-------------------------------

             Summary: The standard output of Python reduce script can not be 
interpreted correctly by Hive
                 Key: HIVE-9940
                 URL: https://issues.apache.org/jira/browse/HIVE-9940
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Eric Wang


Use an HQL statement like:
FROM (
  select_statement
  ) map_output
INSERT OVERWRITE TABLE table
  REDUCE map_output.a, map_output.b
  USING 'py_script'
  AS col1, col2;
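
For reference, a minimal sketch of what a streaming reduce script such as 'py_script' might look like. The actual script is not included in this report, so the function names below (format_row, reduce_stream, main) are hypothetical. It assumes Hive's default behaviour for REDUCE ... USING, where input rows arrive on stdin as tab-separated strings and each stdout line is split on tabs into the AS columns.

```python
import sys


def format_row(fields):
    """Join fields with a single tab, the default delimiter Hive splits on."""
    return "\t".join(str(f) for f in fields)


def reduce_stream(lines):
    """Yield one output row per input row (identity reduce, for illustration only)."""
    for line in lines:
        # Strip only the trailing newline so empty trailing fields are preserved.
        fields = line.rstrip("\n").split("\t")
        yield format_row(fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write tab-delimited rows to stdout."""
    for row in reduce_stream(stdin):
        stdout.write(row + "\n")
```

When deployed as the script named in USING, the file would end with a call to main(). It is left out here so the sketch can be imported without reading stdin.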

(1) Original case
The Python script's stdout has records where the 2nd column = 'Meerjungfrau':
527500  Meerjungfrau    25      AO DE   20140704
...

Hive interprets these as:
527500  Meer    <null>  AO DE   20140704
...

stderr_log shows these as:
527500  Meerjungfrau    25      AO DE   20140704

(2) Change all 'Meerjungfrau' to 'bug' in the Python script
The Python script's stdout has records where the 2nd column = 'bug':
527500  bug     25      AO DE   20140704
...

Hive interprets these as:
527500  b       <null>  AO DE   20140704
...

stderr_log shows these as:
527500  bug     25      AO DE   20140704

(3) Move the 2nd column to the last position
The Python script's stdout has records where the last column = 'Meerjungfrau':
527500  25      AO DE   20140704        Meerjungfrau
...

Hive interprets these as:
527500  25      <null>  20140704        Meerjungfrau
...

stderr_log shows these as:
527500  25      AO DE   20140704        Meerjungfrau
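
Since the stderr log and Hive's result disagree while the printed rows look identical, it may help to log each row in an unambiguous form. The following is a sketch only, with hypothetical helper names (describe_row, emit). It writes the row to stdout unchanged and a repr() of it to stderr, so that tabs, spaces, and stray control characters can be told apart. It does not assume any particular cause for the behaviour reported here.

```python
import sys


def describe_row(line, delimiter="\t"):
    """Return a debug string with the raw row and how it splits on the delimiter."""
    raw = line.rstrip("\n")
    fields = raw.split(delimiter)
    return "raw=%r fields=%d %r" % (raw, len(fields), fields)


def emit(line, out=sys.stdout, err=sys.stderr):
    """Write the row to stdout as-is and a diagnostic copy to stderr."""
    out.write(line)
    err.write(describe_row(line) + "\n")
```

Calling emit() in place of a plain print in the reduce script would leave the data sent to Hive unchanged while making the delimiters visible in the task's stderr log.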



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
