[ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906837#action_12906837 ]

He Yongqiang commented on HIVE-1610:
------------------------------------

Sammy, we cannot fix this issue by just removing the schema check.
If the input URI's path part is the same as one partition's path, but their
schemes differ, we should still return null.

For your case, the main problem is the port, which is included in the
partitionDesc path but not in the input path.

Would it be possible to just ignore the port? That is, is there a case where
two different instances share the same address but use different ports?
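
As a rough illustration of the "ignore the port" idea, a comparison could treat a missing port (`URI.getPort()` returns -1) as unspecified rather than as a mismatch, while still rejecting different schemes. This is only a sketch of the suggestion above; the class and method names are hypothetical, not the actual Hive patch.

```java
import java.net.URI;

// Hypothetical sketch: compare two paths, requiring matching scheme and
// host, but not treating an absent port as a conflict with an explicit one.
public class PathMatchSketch {
    public static boolean samePathIgnoringPort(URI a, URI b) {
        // Scheme must still match; returning a partition for a different
        // scheme would be wrong, as noted above.
        if (!equalsIgnoreCase(a.getScheme(), b.getScheme())) return false;
        if (!equalsIgnoreCase(a.getHost(), b.getHost())) return false;
        // URI.getPort() returns -1 when no port is present; only compare
        // ports when both sides specify one.
        if (a.getPort() != -1 && b.getPort() != -1 && a.getPort() != b.getPort()) {
            return false;
        }
        return a.getPath().equals(b.getPath());
    }

    private static boolean equalsIgnoreCase(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        URI split = URI.create("hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/dir");
        URI part  = URI.create("hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/dir");
        System.out.println(samePathIgnoringPort(split, part)); // prints "true"
    }
}
```

The open question remains whether two distinct filesystems could legitimately share a host but differ only in port; if so, this relaxation would match the wrong partition.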

> Using CombinedHiveInputFormat causes partToPartitionInfo IOException  
> ----------------------------------------------------------------------
>
>                 Key: HIVE-1610
>                 URL: https://issues.apache.org/jira/browse/HIVE-1610
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.20.2
>            Reporter: Sammy Yu
>         Attachments: 
> 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 
> 0003-HIVE-1610.patch
>
>
> I have a relatively complicated hive query using CombinedHiveInputFormat:
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true; 
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select 
> distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, 
> keywords.universal_rank, keywords.serp_type, keywords.date_indexed, 
> keywords.search_engine_type, keywords.week from keyword_serp_results keywords 
> JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, 
> min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, 
> keywords1.search_engine_type,  keywords1.week, keywords1.rank, 
> dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN 
> (select domain, keyword, search_engine_type, week, max(date_indexed) as 
> max_date_indexed from keyword_serp_results group by 
> domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = 
> dupkeywords1.keyword AND  keywords1.domain = dupkeywords1.domain AND 
> keywords1.search_engine_type = dupkeywords1.search_engine_type AND 
> keywords1.week = dupkeywords1.week AND keywords1.date_indexed = 
> dupkeywords1.max_date_indexed) dupkeywords2 group by 
> domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on 
> keywords.keyword = dupkeywords3.keyword AND  keywords.domain = 
> dupkeywords3.domain AND keywords.search_engine_type = 
> dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND 
> keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = 
> dupkeywords3.best_rank;
>  
> This query used to work fine until I updated to r991183 on trunk and started 
> getting this error:
> java.io.IOException: cannot find dir = 
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0
>  in 
> partToPartitionInfo: 
> [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
> at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> This query works if I don't change hive.input.format, i.e. if I omit:
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> I've narrowed down this issue to the commit for HIVE-1510.  If I take out the 
> changeset from r987746, everything works as before.
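
The failure mode reported above can be reduced to an exact-key lookup: the partToPartitionInfo map holds the temporary dir keyed with an explicit port (:8020), while the split reports the same dir without one, so no key matches. A minimal sketch of that mismatch (not the actual Hive code; map contents are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an exact-string lookup fails when the stored key
// carries an explicit port but the queried dir omits it, even though both
// name the same location.
public class LookupFailureSketch {
    public static void main(String[] args) {
        Map<String, String> partToPartitionInfo = new HashMap<>();
        // Key stored with the port, as in the exception's map dump.
        partToPartitionInfo.put(
            "hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/-mr-10002",
            "partitionDesc");

        // The split reports the dir without the port.
        String dir = "hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/-mr-10002";

        System.out.println(partToPartitionInfo.containsKey(dir)); // prints "false"
    }
}
```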

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.