Hi Eva,
Can you open a new JIRA for this? Then we can discuss and resolve the
issue there.
I guess this is because the partition metadata is added before the data is
available.
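To make the suspected race concrete, here is a minimal sketch in plain Python against the local filesystem. It is not Hive or HDFS code, and all function and file names are hypothetical. It models one interleaving that would produce the same kind of error. Query 2 snapshots the partition's file list when it plans its splits. Query 1's overwrite then deletes those files and writes the merged output. Query 2's tasks fail later, when they open a file from the stale list.

```python
import os
import tempfile


def list_partition_files(partition_dir):
    """Snapshot the file list, as a job does when it computes input splits."""
    return sorted(os.path.join(partition_dir, n) for n in os.listdir(partition_dir))


def insert_overwrite(partition_dir, merged_name, merged_data):
    """Naive overwrite (hypothetical model): delete old files, write the merged file."""
    for name in os.listdir(partition_dir):
        os.remove(os.path.join(partition_dir, name))
    with open(os.path.join(partition_dir, merged_name), "w") as f:
        f.write(merged_data)


def read_files(paths):
    """Open each file from the earlier snapshot, as the map tasks do later."""
    contents = []
    for path in paths:
        with open(path) as f:
            contents.append(f.read())
    return contents


def simulate_race():
    """Return the path of the first file that vanished, or None if none did."""
    with tempfile.TemporaryDirectory() as partition_dir:
        # Small files that Query 1 is trying to merge.
        for i in range(3):
            with open(os.path.join(partition_dir, f"part-{i:05d}"), "w") as f:
                f.write(f"row{i}\n")

        splits = list_partition_files(partition_dir)  # Query 2 plans its job
        insert_overwrite(partition_dir, "merged-00000", "row0\nrow1\nrow2\n")  # Query 1 runs
        try:
            read_files(splits)  # Query 2's tasks start and open stale paths
        except FileNotFoundError as exc:
            return exc.filename
    return None


if __name__ == "__main__":
    print("missing:", simulate_race())
```

Running it prints the path of part-00000 inside the temporary directory, the first file in the stale list that no longer exists.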
Thanks
Yongqiang
On 09-9-9 1:18 PM, "Eva Tse" wrote:
>
> We are planning to enable ad-hoc querying on our Hive warehouse. While testing
> some concurrent queries, we found the following issue:
>
> Query 1 – ‘insert overwrite table yyy .... partition (dateint = xxx)
> select ... from yyy where dateint = xxx’. This merges the small files
> within one partition of table yyy.
> Query 2 – a select on the same table, joined with another table.
>
> We found that query 2 fails with the following exception in multiple map
> tasks:
> java.io.FileNotFoundException: File does not exist: hdfs://ip-10-251-98-80.ec2.internal:9000/user/hive/dataeng/warehouse/nccp_session_facts/dateint=20090908/hour=9/sessionsFacts_P20090909T021823L20090908T09-r-00006
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:457)
>     at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:671)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:236)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Is this expected? If so, is there a JIRA for it, or is a fix planned? We
> are trying to come up with a workaround but haven’t found a good one, since
> swapping the files should ideally be handled inside Hive.
>
> Please let us know your feedback.
>
> Thanks,
> Eva.
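On the workaround question: one pattern avoids deleting files out from under a running query. Write the merged output into a new directory, switch readers to it atomically, and remove the old directory only once no query can still be reading it. The sketch below models that in plain Python on the local filesystem. The CURRENT pointer file, the v1/v2 directory layout, and every function name are hypothetical, and this is not a description of what Hive does.

```python
import os
import tempfile


def write_version(table_dir, version, files):
    """Write a complete new version of the partition into its own directory."""
    version_dir = os.path.join(table_dir, f"v{version}")
    os.makedirs(version_dir)
    for name, data in files.items():
        with open(os.path.join(version_dir, name), "w") as f:
            f.write(data)
    return version_dir


def publish(table_dir, version_dir):
    """Point readers at version_dir via the hypothetical CURRENT file (os.replace is atomic on POSIX)."""
    tmp = os.path.join(table_dir, "CURRENT.tmp")
    with open(tmp, "w") as f:
        f.write(version_dir)
    os.replace(tmp, os.path.join(table_dir, "CURRENT"))


def snapshot(table_dir):
    """Resolve the current version once and list its files (split planning)."""
    with open(os.path.join(table_dir, "CURRENT")) as f:
        version_dir = f.read()
    return [os.path.join(version_dir, n) for n in sorted(os.listdir(version_dir))]


def read_files(paths):
    """Read every file in a previously taken snapshot."""
    contents = []
    for path in paths:
        with open(path) as f:
            contents.append(f.read())
    return contents


def simulate_swap():
    with tempfile.TemporaryDirectory() as table_dir:
        old = write_version(table_dir, 1, {f"part-{i:05d}": f"row{i}\n" for i in range(3)})
        publish(table_dir, old)

        splits = snapshot(table_dir)  # Query 2 plans against v1
        new = write_version(table_dir, 2, {"merged-00000": "row0\nrow1\nrow2\n"})
        publish(table_dir, new)  # Query 1 swaps in v2; v1 is left in place

        old_rows = read_files(splits)  # Query 2 still reads v1 successfully
        new_rows = read_files(snapshot(table_dir))  # later queries see v2
        # Deleting v1 is deferred until no reader can still hold it (not modeled here).
        return old_rows, new_rows


if __name__ == "__main__":
    print(simulate_swap())
```

The key design choice is that the writer never mutates a directory a reader may have listed. Readers resolve the pointer once, and cleanup of old versions is a separate, later step.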