[ https://issues.apache.org/jira/browse/HIVE-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110694#comment-14110694 ]
Hive QA commented on HIVE-7853:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664341/HIVE-7853.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/501/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/501/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-501/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664341

> Make OrcNewInputFormat return row number as a key
> -------------------------------------------------
>
>                 Key: HIVE-7853
>                 URL: https://issues.apache.org/jira/browse/HIVE-7853
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>         Environment: all
>            Reporter: john
>            Assignee: Navis
>              Labels: Orc
>         Attachments: HIVE-7853.1.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Key is null in map when OrcNewInputFormat is used as Input Format Class
> When using OrcNewInputFormat as input format class for my map reduce job, I
> find its key is always null in my map method. This gives me no way to get row
> number in my map method. If you compare RCFileInputFormat (for RC file), its
> key in map method returns the row number so I know which row I am processing.
> Is there any workaround for me to get the row number from my map method? Of
> course, I can count the rows myself. But that has two problems:
> #1 I have to assume the rows arrive in order;
> #2 I will get duplicated (and wrong) row numbers if a big input file is
> divided into multiple file splits (which will trigger my map method multiple
> times on different data nodes).
> At this point, I am really seeking a better way to get the row number for
> each processed row in the map method.
> Here is what I have in my map logs:
> [2014-08-06 09:39:25 DEBUG com.xxxx.hadoop.orcfile.OrcFileMap]: Mapper Input Key: (null)
> [2014-08-06 09:39:25 DEBUG com.xxxx.hadoop.orcfile.OrcFileMap]: Mapper Input Value: {Q81510000, T99760000, 699760000, 81567560000, 9667981610000, 978989898980000, Laura, laura...@gmail.com}
> My map method is:
>     protected void map(Object key, Writable value, Context context)
>             throws IOException, InterruptedException {
>         logger.debug("Mapper Input Key: " + key);
>         logger.debug("Mapper Input Value: " + value.toString());
>         .....
>     }
> The fix should be: add the following statement in the nextKeyValue() method
> and pass the result all the way up to the map() method as its key:
>     reader.getRowNumber();



--
This message was sent by Atlassian JIRA
(v6.2#6252)
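
To illustrate problem #2 from the description, here is a minimal, self-contained sketch in plain Java. It does not use the Hive or Hadoop APIs: `SplitReader`, `RowNumberDemo`, and the fixed-size splitting are hypothetical stand-ins, and `getRowNumber()` here only imitates the idea behind the call named in the issue. It compares the keys a mapper would see when it counts rows itself with the keys it would see if the reader supplied the row number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a reader over one file split. This is NOT the
// Hive OrcNewInputFormat API; it only models a reader that knows the
// absolute position of each row within the whole file.
final class SplitReader {
    private final List<String> rows; // every row in the simulated file
    private final int end;           // exclusive end row of this split
    private int next;                // absolute index of the next row

    SplitReader(List<String> rows, int start, int end) {
        this.rows = rows;
        this.next = start;
        this.end = end;
    }

    boolean hasNext() {
        return next < end;
    }

    // Absolute row number of the row that the following next() call returns.
    long getRowNumber() {
        return next;
    }

    String next() {
        return rows.get(next++);
    }
}

final class RowNumberDemo {
    // Keys a mapper would see if it counted rows itself: the counter
    // restarts at 0 for every split, so keys repeat across splits.
    static List<Long> keysByLocalCounter(List<String> rows, int splitSize) {
        List<Long> keys = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += splitSize) {
            int end = Math.min(start + splitSize, rows.size());
            SplitReader reader = new SplitReader(rows, start, end);
            long counter = 0;
            while (reader.hasNext()) {
                reader.next();
                keys.add(counter++);
            }
        }
        return keys;
    }

    // Keys a mapper would see if the reader supplied the row number as key.
    static List<Long> keysFromReader(List<String> rows, int splitSize) {
        List<Long> keys = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += splitSize) {
            int end = Math.min(start + splitSize, rows.size());
            SplitReader reader = new SplitReader(rows, start, end);
            while (reader.hasNext()) {
                keys.add(reader.getRowNumber()); // read before advancing
                reader.next();
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(keysByLocalCounter(rows, 2)); // [0, 1, 0, 1, 0]
        System.out.println(keysFromReader(rows, 2));     // [0, 1, 2, 3, 4]
    }
}
```

With five rows and a split size of two, the per-split counter repeats keys, while the reader-supplied number stays unique across splits. Whether the real reader reports file-wide row numbers for every split is something the attached patch would need to confirm; the sketch simply assumes it.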