[jira] [Commented] (HIVE-7853) Make OrcNewInputFormat return row number as a key

Hive QA (JIRA) Tue, 26 Aug 2014 06:36:10 -0700

    [ https://issues.apache.org/jira/browse/HIVE-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110694#comment-14110694 ]


Hive QA commented on HIVE-7853:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664341/HIVE-7853.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/501/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/501/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-501/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664341

> Make OrcNewInputFormat return row number as a key
> -------------------------------------------------
>
>                 Key: HIVE-7853
>                 URL: https://issues.apache.org/jira/browse/HIVE-7853
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>         Environment: all
>            Reporter: john
>            Assignee: Navis
>              Labels: Orc
>         Attachments: HIVE-7853.1.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Key is null in map when OrcNewInputFormat is used as the input format class.
> When I use OrcNewInputFormat as the input format class for my MapReduce job,
> the key passed to my map method is always null, so I have no way to get the
> row number inside map(). By comparison, with RCFileInputFormat (for RC files)
> the key passed to map() is the row number, so I know which row I am
> processing. Is there any workaround for getting the row number in my map
> method? I could count rows myself, but that has two problems: #1 I have to
> assume the rows arrive in order; #2 I will get duplicated (and wrong) row
> numbers if a large input file is divided into multiple splits, because each
> split is processed by a separate map task, possibly on a different data node.
> I am looking for a better way to get the row number of each row processed
> in the map method.
> Here is what I have in my map logs:
>       [2014-08-06 09:39:25 DEBUG com.xxxx.hadoop.orcfile.OrcFileMap]: Mapper Input Key: (null)
>       [2014-08-06 09:39:25 DEBUG com.xxxx.hadoop.orcfile.OrcFileMap]: Mapper Input Value: {Q81510000, T99760000, 699760000, 81567560000, 9667981610000, 978989898980000, Laura, [email protected]}
> My map method is:
>       protected void map(Object key, Writable value, Context context)
>                       throws IOException, InterruptedException {
>               logger.debug("Mapper Input Key: " + key);
>               logger.debug("Mapper Input Value: " + value.toString());
>               // ... (rest of the processing elided)
>       }
> The fix should be to add the following call to the nextKeyValue() method and
> pass its result all the way up to the map() method as the key:
>           reader.getRowNumber();
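A minimal sketch of the change proposed above, written in plain Java so it compiles without Hadoop or Hive on the classpath; it is not taken from the attached patch. `RowSource`, `RowNumberKeyedReader`, and `ListRowSource` are hypothetical names. `RowSource` stands in for ORC's `RecordReader`, assuming (per our reading of its Javadoc) that `getRowNumber()` returns the number of the row the following `next()` call will return, counted from the start of the file. The key is a plain `long` rather than a `LongWritable`.

```java
import java.util.List;

/**
 * Hypothetical stand-in for the parts of ORC's RecordReader used here.
 * getRowNumber() is assumed to return the file-relative number of the row
 * that the next call to next() will return.
 */
interface RowSource<V> {
    boolean hasNext();
    V next();
    long getRowNumber();
}

/**
 * Sketch of a record reader that emits the row number as its key instead of
 * null. Method names mirror Hadoop's mapreduce RecordReader; the key is a
 * plain long (not a LongWritable) so the sketch compiles on its own.
 */
class RowNumberKeyedReader<V> {
    private final RowSource<V> source;
    private long currentKey = -1;
    private V currentValue;

    RowNumberKeyedReader(RowSource<V> source) {
        this.source = source;
    }

    boolean nextKeyValue() {
        if (!source.hasNext()) {
            return false;
        }
        // Read the row number BEFORE next(): under the assumed semantics it
        // refers to the row that next() is about to return.
        currentKey = source.getRowNumber();
        currentValue = source.next();
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    V getCurrentValue() {
        return currentValue;
    }
}

/**
 * Hypothetical in-memory RowSource for trying the reader out. firstRow models
 * the rows in stripes before this split, so keys stay file-relative even when
 * a split starts mid-file.
 */
class ListRowSource<V> implements RowSource<V> {
    private final List<V> rows;
    private final long firstRow;
    private int pos = 0;

    ListRowSource(List<V> rows, long firstRow) {
        this.rows = rows;
        this.firstRow = firstRow;
    }

    public boolean hasNext() {
        return pos < rows.size();
    }

    public V next() {
        return rows.get(pos++);
    }

    public long getRowNumber() {
        return firstRow + pos;
    }
}
```

Capturing the row number before `next()` avoids an off-by-one under the assumed semantics. If the row number is file-relative rather than split-relative (modelled here by `firstRow`), keys would stay unique across splits, which would address problem #2 above. In a real reader `getRowNumber()` also throws `IOException`, which `nextKeyValue()` would need to propagate.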



--
This message was sent by Atlassian JIRA
(v6.2#6252)
