[ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694543#action_12694543 ]
He Yongqiang commented on HIVE-360:
-----------------------------------

Thanks, Thusoo.
Actually, I am refactoring the code now. I have talked with Zheng about the current patch. There are some improvements:
(1) Make HiveOutputFormat an interface that extends Hadoop's OutputFormat, and add a new getRecordWriter to it. The main difference between this getRecordWriter and Hadoop OutputFormat's getRecordWriter is that the new one accepts a path parameter and creates the output file when it is called.
(2) Make HiveSequenceFileOutputFormat extend Hadoop's SequenceFileOutputFormat and implement the new HiveOutputFormat.
(3) Deprecate Hive's IgnoreKeyOutputFormat and replace it with a new IgnoreKeyOutputFormat that uses the new HiveOutputFormat.

In this way, the code will be clearer. The disadvantage is that HiveOutputFormat's signature looks like:
{code}
HiveOutputFormat extends OutputFormat<WritableComparable, Writable>
{code}
It can only use subclasses of WritableComparable as its key and subclasses of Writable as its value. I think that is OK in Hive, isn't it?

Should I cancel the patch now and resubmit one once the refactoring is done?

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch
>
>
> Currently the FileFormat support in Hive is not generalized - we do "if ... else" to support TextFileFormat and SequenceFileFormat. There is no way to support a 3rd one without changing the "if ... else" structure. We should make an interface for the FileFormat needed by Hive.
> The OutputFileFormat interface that Hive requires will contain one more method than the Hadoop OutputFileFormat - create a File with a specific name.
> Hive.g:409 (Hive.g already supports the custom file format, but DDLSemanticAnalyzer.java does not recognize it yet):
> {code}
> KW_STORED KW_AS KW_INPUTFORMAT inFmt=StringLiteral KW_OUTPUTFORMAT outFmt=StringLiteral
> {code}
> Please add the handling of TOK_TABLEFILEFORMAT here:
> DDLSemanticAnalyzer.java:223
> {code}
>       case HiveParser.TOK_TBLSEQUENCEFILE:
> ...
> {code}
> Please add the handling of custom outputFormat here by adding a new interface (and casting the user-provided file format to that interface), instead of doing "if ... else":
> FileSinkOperator.java:129-174:
> {code}
>       if(outputFormat instanceof IgnoreKeyTextOutputFormat) {
>         finalPath = new Path(Utilities.toTempPath(conf.getDirName()), Utilities.getTaskId(hconf) +
>                              Utilities.getFileExtension(jc, isCompressed));
> ...
> {code}
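
For illustration only, the idea discussed in this thread (an output-format interface whose writer factory takes the target path, so that the file sink can cast to one interface instead of branching per format) might be sketched as below. This is a minimal sketch and not the patch itself: it deliberately avoids the Hadoop classes so that it runs on its own, and every name in it (SketchRecordWriter, SketchHiveOutputFormat, SketchTextOutputFormat, writeAll) is a hypothetical stand-in rather than an actual Hive or Hadoop API.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HiveOutputFormatSketch {

    /** Hypothetical stand-in for a record writer; NOT Hadoop's RecordWriter. */
    interface SketchRecordWriter extends AutoCloseable {
        void write(String value) throws IOException;

        @Override
        void close() throws IOException;
    }

    /**
     * Hypothetical stand-in for the proposed interface. The one extra
     * capability is that the caller chooses the path of the output file.
     */
    interface SketchHiveOutputFormat {
        SketchRecordWriter getRecordWriter(Path finalPath) throws IOException;
    }

    /** A text format that ignores keys and writes one value per line. */
    static class SketchTextOutputFormat implements SketchHiveOutputFormat {
        @Override
        public SketchRecordWriter getRecordWriter(Path finalPath) throws IOException {
            // The file is created here, at call time, at the path the caller supplied.
            BufferedWriter out = Files.newBufferedWriter(finalPath, StandardCharsets.UTF_8);
            return new SketchRecordWriter() {
                @Override
                public void write(String value) throws IOException {
                    out.write(value);
                    out.newLine();
                }

                @Override
                public void close() throws IOException {
                    out.close();
                }
            };
        }
    }

    /**
     * What a file sink could do: cast the user-provided format to the
     * interface once, instead of an "if ... else" chain per concrete format.
     */
    static void writeAll(Object userFormat, Path finalPath, String... rows) throws IOException {
        if (!(userFormat instanceof SketchHiveOutputFormat)) {
            throw new IllegalArgumentException("output format must implement SketchHiveOutputFormat");
        }
        SketchHiveOutputFormat format = (SketchHiveOutputFormat) userFormat;
        try (SketchRecordWriter writer = format.getRecordWriter(finalPath)) {
            for (String row : rows) {
                writer.write(row);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("hive-sketch", ".txt");
        writeAll(new SketchTextOutputFormat(), out, "a", "b");
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

With this shape, supporting a third file format would mean adding another implementation of the interface; the sink code in writeAll would not need to change.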