[ https://issues.apache.org/jira/browse/HIVE-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao reassigned HIVE-51:
------------------------------
Assignee: Zheng Shao
> Generate and accept JSON as the input-output format from mappers and reducers
> -----------------------------------------------------------------------------
>
> Key: HIVE-51
> URL: https://issues.apache.org/jira/browse/HIVE-51
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Venky Iyer
> Assignee: Zheng Shao
>
> set mapred.data.format=JSON;
> ....
> MAP USING 'python filter.py'
> ....;
> would mean that filter.py receives a JSON-formatted dictionary of the
> columns instead of a tab-delimited string, for example:
> { "column1": value1, "column2": [1, 2, 3] }
> It would in turn produce JSON.
> This should be done so that the JSON is not transmitted back and forth over
> the network; it would be generated on the fly on the mapper node, and
> converted back to the standard format used (tab-delimited, I assume).
> This seems like the simplest way of encoding type information in the input
> to mappers; it would also remove the need for silly boilerplate code that
> takes a list of expected input column names, reads the input stream, splits
> each line, and builds a dictionary of {column name: value} for every record.
> Output schemas (USING '' AS ...) might also be redundant with this in place,
> but I'm not sure if that is doable.
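
For illustration only, here is a minimal Python sketch of what the description
above might look like from the script's side. It is not Hive code and nothing
in it is confirmed by the issue: the names COLUMNS, parse_tab_record,
parse_json_record and run_filter are hypothetical, and the sketch assumes one
JSON object per input line and one JSON object per output line.

```python
import json

# Hypothetical column list; with tab-delimited input a script has to
# hard-code (or be told) something like this.
COLUMNS = ["column1", "column2"]


def parse_tab_record(line, columns=COLUMNS):
    """Boilerplate style: split a tab-delimited line and pair the fields
    with the expected column names. Every value comes back as a string."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(columns, values))


def parse_json_record(line):
    """Proposed style (assumed framing: one JSON object per line).
    Column names and value types come from the record itself."""
    return json.loads(line)


def run_filter(lines, keep):
    """Yield one JSON line for every input record that keep() accepts."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_json_record(line)
        if keep(record):
            yield json.dumps(record, sort_keys=True)
```

Under these assumptions, a script such as filter.py would call
run_filter(sys.stdin, some_predicate) and print each yielded line. The list
in column2 arrives as a real list rather than a string that has to be parsed
again, which is the type information the description refers to.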
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.