[ https://issues.apache.org/jira/browse/HIVE-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-1006:
--------------------------------

    Assignee: Dave Lerman

> getPartitionDescFromPath failing from CombineHiveInputFormat
> ------------------------------------------------------------
>
>                 Key: HIVE-1006
>                 URL: https://issues.apache.org/jira/browse/HIVE-1006
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.4.1
>            Reporter: Dave Lerman
>            Assignee: Dave Lerman
>         Attachments: hive.1006.1.patch, hive.1006.2.patch
>
>
> When HiveInputFormat.getPartitionDescFromPath is called from 
> CombineHiveInputFormat, it sometimes fails to return a matching partitionDesc, 
> which then causes an exception further down the line because the split has no 
> inputFormatClassName.
> The issue is that the path format used for the keys in pathToPartitionInfo 
> varies between stages. In the first stage, each key is the complete path as 
> returned from the table definitions (e.g. hdfs://server/path); in subsequent 
> stages, each key is the complete path, including the port, of the previous 
> stage's output (e.g. hdfs://server:8020/path).  This isn't a problem in 
> HiveInputFormat, since the directory you're looking up always uses the same 
> format as the keys. In CombineHiveInputFormat, however, we take that path, 
> look up its children in the file system to get all the block information, and 
> then use one of the returned paths to get the partition info. That returned 
> path does not include the port.  So in any stage after the first, we look for 
> a path without the port while every key in the map contains one, and no match 
> is found.
> The attached patch may not be ideal -- it doesn't fix the underlying problem 
> of inconsistent path formats in pathToPartitionInfo -- it just works around 
> it by walking through the map and looking for a matching path rather than 
> doing a hash lookup.
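
The mismatch and the map-walking workaround described above can be illustrated with a small sketch. It is hypothetical: the real code keys pathToPartitionInfo with path strings and stores partitionDesc objects, whereas here the values are plain strings, java.net.URI stands in for Hadoop's Path, and the class and method names (PartitionLookupSketch, lookupByHash, lookupByWalking, sameIgnoringPort) are invented for illustration. It also assumes the match simply ignores the port; the attached patch may compare paths differently.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of the HIVE-1006 lookup failure; not Hive code.
public class PartitionLookupSketch {

    // Direct hash lookup: succeeds only if the string matches a key exactly.
    static String lookupByHash(Map<String, String> pathToPartitionInfo, String dir) {
        return pathToPartitionInfo.get(dir);
    }

    // Workaround in the spirit of the patch description: walk every entry and
    // compare scheme, host and path, ignoring the port (an assumption).
    static String lookupByWalking(Map<String, String> pathToPartitionInfo, String dir) {
        URI target = URI.create(dir);
        for (Map.Entry<String, String> entry : pathToPartitionInfo.entrySet()) {
            if (sameIgnoringPort(URI.create(entry.getKey()), target)) {
                return entry.getValue();
            }
        }
        return null; // no match: the split would end up without an input format
    }

    static boolean sameIgnoringPort(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getHost(), b.getHost())
            && Objects.equals(a.getPath(), b.getPath());
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        // Key written for a later stage: the previous stage's output, with port.
        pathToPartitionInfo.put("hdfs://server:8020/tmp/stage1", "stage1-output");

        // Path as returned by the file-system listing: no port.
        String listed = "hdfs://server/tmp/stage1";

        System.out.println(lookupByHash(pathToPartitionInfo, listed));    // null
        System.out.println(lookupByWalking(pathToPartitionInfo, listed)); // stage1-output
    }
}
```

As the description notes, this trades the constant-time hash lookup for a linear scan of the map on every call and leaves the inconsistent key formats in place.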

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.