[
https://issues.apache.org/jira/browse/HIVE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiaobing Zhou updated HIVE-8497:
--------------------------------
Description:
Run the test
{noformat}
mvn -Phadoop-2 test -Dtest=TestCliDriver -Dqfile=alter_merge_stats_orc.q
{noformat}
to reproduce the issue. In short, this query file performs three data loads,
which generate three base ORC files.
ANALYZE TABLE...COMPUTE STATISTICS NOSCAN executes StatsNoJobTask to gather
stats. That task keeps a file handle open, so the base file cannot be cleaned
up. As a result, after ALTER TABLE..CONCATENATE runs, follow-up queries read
both the stale base file and the merged file.
was:
Run the test
{noformat}
mvn -Phadoop-2 test -Dtest=TestCliDriver -Dqfile=alter_merge_stats_orc.q
{noformat}
to reproduce the issue. In short, this query file performs three data loads,
which generate three base ORC files.
ANALYZE TABLE...COMPUTE STATISTICS NOSCAN then executes;
ALTER TABLE CONCATENATE tries to merge the ORC pieces into a single file, which
is the final file to be queried.
The output in
\hive\itests\qtest\target\qfile-results\clientpositive\alter_merge_2_orc.q.out
shows the number of records as 600, which is wrong; 610 is expected.
Because OrcFileMergeOperator closes only the last ORC file, the 1st and 2nd ORC
files remain in the table directory: deleting an unclosed file fails during
old-data cleanup when MoveTask tries to copy the merged ORC file from the
scratch dir to the table dir. Eventually the query reads the old data (the 1st
and 2nd ORC files).
> StatsNoJobTask doesn't close RecordReader, FSDataInputStream of which keeps
> open to prevent stale data clean
> ------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-8497
> URL: https://issues.apache.org/jira/browse/HIVE-8497
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Environment: Windows
> Reporter: Xiaobing Zhou
> Assignee: Xiaobing Zhou
> Priority: Critical
>
> Run the test
> {noformat}
> mvn -Phadoop-2 test -Dtest=TestCliDriver -Dqfile=alter_merge_stats_orc.q
> {noformat}
> to reproduce the issue. In short, this query file performs three data loads,
> which generate three base ORC files.
> ANALYZE TABLE...COMPUTE STATISTICS NOSCAN executes StatsNoJobTask to gather
> stats. That task keeps a file handle open, so the base file cannot be cleaned
> up. As a result, after ALTER TABLE..CONCATENATE runs, follow-up queries read
> both the stale base file and the merged file.
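
The failure mode described above (a reader left open, so a later cleanup of the
old file cannot proceed) can be illustrated with a minimal, self-contained
sketch. It does not use Hive's or Hadoop's API: the class CloseBeforeCleanup
and the methods collectStats and cleanStaleFile are hypothetical stand-ins for
a stats task and a cleanup step, and a plain java.nio InputStream stands in for
the RecordReader and its FSDataInputStream. It only shows the general pattern
of releasing the handle before the file is cleaned up, not the actual fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseBeforeCleanup {

    /** Hypothetical stand-in for a stats task: counts bytes, then releases the handle. */
    static long collectStats(Path file) throws IOException {
        // try-with-resources closes the stream even if reading throws,
        // so no open handle outlives this method.
        try (InputStream in = Files.newInputStream(file)) {
            long bytes = 0;
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n;
            }
            return bytes;
        }
    }

    /** Hypothetical stand-in for the cleanup step that removes an old base file. */
    static boolean cleanStaleFile(Path file) throws IOException {
        // Returns true if the file existed and was deleted.
        return Files.deleteIfExists(file);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file plays the role of one base file (assumption for the demo).
        Path base = Files.createTempFile("base_", ".orc");
        Files.write(base, "sample rows".getBytes(StandardCharsets.UTF_8));

        long size = collectStats(base);         // handle is closed on return
        boolean deleted = cleanStaleFile(base); // cleanup runs with no handle open

        System.out.println("bytes read: " + size);
        System.out.println("deleted: " + deleted + ", still exists: " + Files.exists(base));
    }
}
```

Scoping the stream to the method that reads it keeps the handle's lifetime
independent of any later step, which is the property the description says is
missing when the reader is never closed.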