[jira] [Commented] (HIVE-10567) partial scan for rcfile table doesn't work for dynamic partition

Thomas Friedrich (JIRA) Thu, 30 Apr 2015 18:29:07 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522619#comment-14522619 ]


Thomas Friedrich commented on HIVE-10567:
-----------------------------------------

Chaoyu, I attached a proposed patch. The problem is in the method 
getInputPathsForPartialScan in GenMapRedUtils.java. 
I added a case for DYNAMIC_PARTITION, but wasn't sure about the 
aggregationKey. In the current patch, the aggregationKey is just the table 
name, and the PartialScanMapper joins it with the task ID, which is 
different for each partition (one task per partition):
org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisher: Writing stats in it : {default.testtable/000000/={numRows=2, rawDataSize=16}}
org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisher: Writing stats in it : {default.testtable/000001/={numRows=1, rawDataSize=8}}
The output seems OK to me. 
Do you know whether the aggregationKey should be set to a different value, as 
in the STATIC_PARTITION case?
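
As a rough illustration of how the keys in the log lines above appear to be 
composed, here is a minimal standalone sketch. It is not the Hive code: 
StatsKeySketch, statsKey and statsEntry are hypothetical names, and the "/" 
separators between the table name and the task ID are only inferred from the 
logged output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone illustration only; none of these names exist in Hive.
public class StatsKeySketch {

    // Assumption: the per-task stats key is the aggregation key (here just the
    // table name) joined with the task ID, using "/" as seen in the log lines.
    public static String statsKey(String aggregationKey, String taskId) {
        return aggregationKey + "/" + taskId + "/";
    }

    // Builds a single-entry map shaped like the logged value:
    // stats key -> {numRows, rawDataSize}. LinkedHashMap keeps insertion order.
    public static Map<String, Map<String, String>> statsEntry(
            String aggregationKey, String taskId, long numRows, long rawDataSize) {
        Map<String, String> stats = new LinkedHashMap<>();
        stats.put("numRows", Long.toString(numRows));
        stats.put("rawDataSize", Long.toString(rawDataSize));

        Map<String, Map<String, String>> entry = new LinkedHashMap<>();
        entry.put(statsKey(aggregationKey, taskId), stats);
        return entry;
    }

    public static void main(String[] args) {
        // One task per partition, so each partition gets a distinct key
        // even though the aggregation key is the same table name.
        System.out.println(statsEntry("default.testtable", "000000", 2, 16));
        System.out.println(statsEntry("default.testtable", "000001", 1, 8));
    }
}
```

With the table name as the only aggregation key, the two partitions are kept 
apart solely by the task ID component of the key.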

I would like to add a unit test for this case as well, which is why I haven't 
submitted the patch yet.

> partial scan for rcfile table doesn't work for dynamic partition
> ----------------------------------------------------------------
>
>                 Key: HIVE-10567
>                 URL: https://issues.apache.org/jira/browse/HIVE-10567
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.14.0, 1.0.0
>            Reporter: Thomas Friedrich
>            Assignee: Chaoyu Tang
>            Priority: Minor
>              Labels: rcfile
>         Attachments: HIVE-10567.1.patch
>
>
> HIVE-3958 added support for partial scan for RCFile. This works fine for 
> static partitions (for example: analyze table analyze_srcpart_partial_scan 
> PARTITION(ds='2008-04-08',hr=11) compute statistics partialscan).
> For dynamic partitions, the analyze fails with an IOException 
> "java.io.IOException: No input paths specified in job":
> hive> ANALYZE TABLE testtable PARTITION(col_varchar) COMPUTE STATISTICS PARTIALSCAN;
> java.io.IOException: No input paths specified in job
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
>         at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:459)
>         at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
>         at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)