[ https://issues.apache.org/jira/browse/HIVE-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang reassigned HIVE-51:
------------------------------

    Assignee:     (was: Ning Zhang)

> Generate and accept JSON as the input-output format from mappers and reducers
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-51
>                 URL: https://issues.apache.org/jira/browse/HIVE-51
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Venky Iyer
>
> set mapred.data.format=JSON;
> ....
> MAP USING 'python filter.py'
> ....;
> would mean that filter.py would receive a JSON formatted dictionary of the
> columns instead of a tab-delimited string.
> { column1: value1, column2: [1,2,3] } etc
> It would in turn produce JSON.
> This should be done so that the JSON is not transmitted back and forth over
> the network; it would be generated on the fly on the mapper node, and
> converted back to the standard format used (tab-delimited, I assume).
> This seems like the simplest way for encoding type information in the input
> to mappers; it would also remove the need for silly boilerplate code that
> took a list of expected input column names, took the input stream, split it
> up, and made a dictionary of {column name: value} on every record.
> Output schemas (USING '' AS ...) might also be redundant with this in place,
> but I'm not sure if that is doable.
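
For illustration only, here is a rough sketch of what a filter.py could look like under this proposal, next to the tab-delimited boilerplate it is meant to replace. Nothing here is an existing Hive feature: the issue does not say how records would be framed, so the sketch assumes one JSON object per line on stdin and one per line on stdout. The column names, the helper names (parse_tab_delimited, parse_json, transform, run) and the filtering rule are all hypothetical. Strict JSON would also require quoted keys, unlike the shorthand in the description.

```python
import json
import sys

# Hypothetical column names; the issue does not define a schema.
COLUMNS = ["column1", "column2"]


def parse_tab_delimited(line, columns=COLUMNS):
    """Boilerplate the issue wants to remove: split a tab-delimited record
    and pair each field with its expected column name. Every value comes
    back as a string, so type information is lost."""
    return dict(zip(columns, line.rstrip("\n").split("\t")))


def parse_json(line):
    """Under the proposed format each record is already a typed dictionary."""
    return json.loads(line)


def transform(record):
    """Hypothetical mapper logic: keep records whose column2 is non-empty."""
    return record if record.get("column2") else None


def run(lines, out=sys.stdout):
    """Read one JSON record per line (an assumption) and emit JSON records."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        result = transform(parse_json(line))
        if result is not None:
            out.write(json.dumps(result) + "\n")
```

To use this as the script named in MAP USING 'python filter.py', it would end with a call to run(sys.stdin). With JSON input, column2 arrives as a real list; with the tab-delimited version the script has to know the column names in advance and re-parse every value itself.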