[
https://issues.apache.org/jira/browse/HIVE-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-18429:
----------------------------------
Description:
Suppose we start with empty delta_8_8 and delta_9_9 and compaction runs.
It currently produces an MR job with 0 splits, so
{{CompactorMR.TMP_LOCATION}} never gets created. This causes
{{CompactorOutputCommitter.commitJob()}} to fail when it calls
{{FileStatus[] contents = fs.listStatus(tmpLocation);}} since tmpLocation
doesn't exist.
If the compactor fails to produce delta_8_9 here, it cannot do any further
compaction until a new delta with data is created.
If the number of empty deltas is greater than
HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, compaction will not be able to
proceed at all.
It should produce a delta_8_9 in this case even if it's empty.
The error (in the log of the standalone metastore process) would look like this:
{noformat}
2017-12-27 17:19:28,850 ERROR CommitterEvent Processor #1 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not commit job
java.io.FileNotFoundException: File hdfs://OTCHaaS/apps/hive/warehouse/momi.db/sensor_data/babyid=5911806ebf69640100004257/_tmp_b4c5a3f3-44e5-4d45-86af-5b773bf0fc96 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992)
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:785)
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
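To illustrate the behavior requested above, here is a minimal sketch of a commit step that tolerates a missing temp location. It is not the actual patch for this issue: it uses java.nio.file in place of the Hadoop FileSystem API so that it runs standalone, and the class name EmptyCompactionCommit and the helpers deltaName and commit are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyCompactionCommit {

    /** Name of the compacted delta covering [minTxn, maxTxn], in the delta_8_9 style used above. */
    static String deltaName(long minTxn, long maxTxn) {
        return "delta_" + minTxn + "_" + maxTxn;
    }

    /**
     * Hypothetical commit step: always produces the final delta directory,
     * even when the job had 0 splits and tmpLocation was never created.
     */
    static Path commit(Path tmpLocation, Path partitionDir, long minTxn, long maxTxn)
            throws IOException {
        Path finalDelta = partitionDir.resolve(deltaName(minTxn, maxTxn));
        Files.createDirectories(finalDelta);

        // Guard: with 0 splits no task ever wrote to tmpLocation, so listing it would fail.
        if (!Files.exists(tmpLocation)) {
            return finalDelta; // empty delta, but compaction can move on
        }

        // Normal path: move everything the tasks wrote into the final delta.
        try (DirectoryStream<Path> contents = Files.newDirectoryStream(tmpLocation)) {
            for (Path p : contents) {
                Files.move(p, finalDelta.resolve(p.getFileName()));
            }
        }
        Files.delete(tmpLocation);
        return finalDelta;
    }

    public static void main(String[] args) throws IOException {
        Path partition = Files.createTempDirectory("partition");
        Path missingTmp = partition.resolve("_tmp_does_not_exist");
        Path delta = commit(missingTmp, partition, 8, 9);
        System.out.println(delta.getFileName() + " exists: " + Files.isDirectory(delta));
    }
}
```

In this sketch the existence check happens before the directory listing, so an empty delta_8_9 is still created and later compactions are not blocked by the empty inputs.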
was:
Suppose we start with empty delta_8_8 and delta_9_9 and compaction runs.
It currently produces an MR job with 0 splits, so
{{CompactorMR.TMP_LOCATION}} never gets created. This causes
{{CompactorOutputCommitter.commitJob()}} to fail when it calls
{{FileStatus[] contents = fs.listStatus(tmpLocation);}} since tmpLocation
doesn't exist.
If the compactor fails to produce delta_8_9 here, it cannot do any further
compaction until a new delta with data is created.
If the number of empty deltas is greater than
HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, compaction will not be able to
proceed at all.
It should produce a delta_8_9 in this case even if it's empty.
The error (in the log of the standalone metastore process) would look like this:
{noformat}
2018-01-10T13:27:10,521 WARN [Thread-209] mapred.LocalJobRunner: job_local44610510_0003
java.io.FileNotFoundException: File file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1515619503884/warehouse/t/_tmp_60ce7a11-d798-474f-b223-7d0acdb6dd5c does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:464) ~[hadoop-common-3.0.0-beta1.jar:?]
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1853) ~[hadoop-common-3.0.0-beta1.jar:?]
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1895) ~[hadoop-common-3.0.0-beta1.jar:?]
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678) ~[hadoop-common-3.0.0-beta1.jar:?]
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:919) ~[classes/:?]
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291) ~[hadoop-mapreduce-client-core-3.0.0-beta1.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:567) [hadoop-mapreduce-client-common-3.0.0-beta1.jar:?]
2018-01-10T13:27:10,522 ERROR [main] compactor.Worker: Caught exception while trying to compact id:1,dbname:default,tableName:t,partName:null,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. Marking failed to avoid repeated failures, java.io.IOException: Major
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:346)
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:291)
    at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:167)
{noformat}
> Compaction should handle a case when it produces no output
> ----------------------------------------------------------
>
> Key: HIVE-18429
> URL: https://issues.apache.org/jira/browse/HIVE-18429
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman