[
https://issues.apache.org/jira/browse/HIVE-29601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marta Kuczora updated HIVE-29601:
---------------------------------
Description:
When the cleaner selects the base directories, all bases are validated by the
AcidUtils.isValidBase method.
{code:java}
private static boolean isValidBase(ParsedBaseLight parsedBase, ValidWriteIdList writeIdList,
    FileSystem fs, HdfsDirSnapshot dirSnapshot) throws IOException {
  boolean isValidBase;
  if (dirSnapshot != null && dirSnapshot.isValidBase() != null) {
    isValidBase = dirSnapshot.isValidBase();
  } else {
    if (parsedBase.getWriteId() == Long.MIN_VALUE) {
      // such base is created by 1st compaction in case of non-acid to acid table conversion
      // By definition there are no open txns with id < 1.
      isValidBase = true;
    } else if (writeIdList.getMinOpenWriteId() != null
        && parsedBase.getWriteId() <= writeIdList.getMinOpenWriteId()) {
      isValidBase = true;
    } else if (isCompactedBase(parsedBase, fs, dirSnapshot)) {
      isValidBase = writeIdList.isValidBase(parsedBase.getWriteId());
    } else {
      // if here, it's a result of IOW
      isValidBase = writeIdList.isWriteIdValid(parsedBase.getWriteId());
    }
    if (dirSnapshot != null) {
      dirSnapshot.setIsValidBase(isValidBase);
    }
  }
  return isValidBase;
}
{code}
The following condition doesn't consider the cleaner's highWaterMark:
{code:java}
else if (writeIdList.getMinOpenWriteId() != null
    && parsedBase.getWriteId() <= writeIdList.getMinOpenWriteId()) {
  isValidBase = true;
}
{code}
So if minOpenWriteId is set and is greater than the highWaterMark, base
directories with a writeId above the highWaterMark are considered valid. This
can lead to cases where the cleaner deletes bases above its own highWaterMark.
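To make the effect of the condition concrete, here is a minimal, self-contained sketch. It is not Hive code: the class BaseShortcutSketch and both methods are hypothetical, the highWaterMark value of 3 is an assumed value for illustration, and the bounded variant is only one conceivable guard, not necessarily the fix chosen for this issue.

```java
import java.util.OptionalLong;

/** Hypothetical model of the shortcut condition; this is not Hive's AcidUtils. */
final class BaseShortcutSketch {

    /** Mirrors the condition quoted above: only minOpenWriteId is consulted. */
    static boolean shortcutAcceptsBase(long baseWriteId, OptionalLong minOpenWriteId) {
        return minOpenWriteId.isPresent() && baseWriteId <= minOpenWriteId.getAsLong();
    }

    /** Assumed stricter variant: the base must also not exceed the highWaterMark. */
    static boolean boundedShortcutAcceptsBase(long baseWriteId, OptionalLong minOpenWriteId,
                                              long highWaterMark) {
        return baseWriteId <= highWaterMark && shortcutAcceptsBase(baseWriteId, minOpenWriteId);
    }

    public static void main(String[] args) {
        OptionalLong minOpenWriteId = OptionalLong.of(10); // an open write with writeId 10
        long highWaterMark = 3;                            // assumed value for illustration
        for (long base : new long[] {3, 6, 9}) {
            System.out.printf("base writeId=%d shortcut=%b bounded=%b%n", base,
                shortcutAcceptsBase(base, minOpenWriteId),
                boundedShortcutAcceptsBase(base, minOpenWriteId, highWaterMark));
        }
    }
}
```

With these assumed values the unbounded condition accepts the bases with writeId 6 and 9 even though both are above the highWaterMark, while the bounded variant accepts only the base with writeId 3.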
This issue can lead to data loss as well. If the base directory with the highest
writeId is the result of an aborted insert-overwrite, and there is an open
write transaction with a higher writeId, the writeId <= minOpenWriteId check
will be true, so that base will be selected as the best valid base. At this
point in the code, the non-compacted base directories have not yet been checked
for whether they are aborted. That check happens only later, in the
writeIdList.isWriteIdValid call, which the shortcut bypasses. If this base
directory is selected as the best base, the cleaner can clean up delta
directories, and later this base will be cleaned up as well, so data is lost.
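The branch ordering described here can be sketched as follows. This is a simplified stand-in, not the Hive implementation: the class AbortedBaseSketch, the abortedWriteIds set and the reduced two-branch logic are all assumptions made for illustration, and the compacted-base branch is left out.

```java
import java.util.Set;

/** Hypothetical, simplified model of the branch order for a non-compacted base. */
final class AbortedBaseSketch {

    /** Stand-in for the final branch: a writeId is valid when it is not aborted. */
    static boolean isWriteIdValid(long writeId, Set<Long> abortedWriteIds) {
        return !abortedWriteIds.contains(writeId);
    }

    /** The shortcut is evaluated first, so the aborted check may never run. */
    static boolean isValidBase(long baseWriteId, Long minOpenWriteId, Set<Long> abortedWriteIds) {
        if (minOpenWriteId != null && baseWriteId <= minOpenWriteId) {
            return true; // shortcut taken: isWriteIdValid below is not reached
        }
        return isWriteIdValid(baseWriteId, abortedWriteIds);
    }

    public static void main(String[] args) {
        Set<Long> aborted = Set.of(10L); // assumed: writeId 10 is an aborted insert overwrite
        // An open write with writeId 11 exists, so the shortcut accepts the aborted base.
        System.out.println(isValidBase(10, 11L, aborted));
        // No open write: the aborted check runs and rejects the base.
        System.out.println(isValidBase(10, null, aborted));
    }
}
```

In this model the aborted base is accepted only while an open write with a higher writeId exists, which matches the scenario described above.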
Example:
* insert 3 deltas and run major compaction (id=1)
* insert 3 deltas and run major compaction (id=2)
* an insert is started, but it stays open while the cleaner runs for the first time
* at this point we have the following directories:
** base_3_v0000004
** base_6_v0000008
** base_9_v0000012
** delta_0000001_0000001
** delta_0000002_0000002
** delta_0000003_0000003
** delta_0000004_0000004
** delta_0000005_0000005
** delta_0000006_0000006
** delta_0000007_0000007
** delta_0000008_0000008
** delta_0000009_0000009
** delta_0000010_0000010
* the cleaner runs for the first time
It picks compaction (id=1) from the queue. The cleaner should delete only the
directories made obsolete by that compaction. In this example these are only
delta_0000001_0000001, delta_0000002_0000002 and delta_0000003_0000003.
However, minOpenWriteId=10 at this point, so because of the check in the
isValidBase method, base_9_v0000012 is selected as the latest valid base, and
base_3_v0000004 and base_6_v0000008 are deleted.
* the open insert is committed, so minOpenWriteId is no longer set
* the cleaner runs for the second compaction
It tries to find a base below its highWaterMark (that would be
base_6_v0000008), but since that base has already been deleted, the cleaner
fails with an ACID_NOT_ENOUGH_HISTORY error.
If there is an aborted insert overwrite between the last compaction and the
open insert, its base directory is selected as the latest valid base, and
delta_0000001_0000001, delta_0000002_0000002 and delta_0000003_0000003, as well
as the other base directories, are deleted. As a result, the data in
delta_0000001_0000001, delta_0000002_0000002 and delta_0000003_0000003 is lost.
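The first cleaner run of the example can be replayed with a small sketch. It is an illustration only: the class CleanerExampleSketch and its methods are hypothetical, base selection is reduced to "highest writeId not exceeding a limit", and the highWaterMark of 3 for compaction (id=1) is an assumed value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical replay of the example's first cleaner run; not Hive code. */
final class CleanerExampleSketch {

    /** Extracts the writeId from a name such as "base_9_v0000012". */
    static long baseWriteId(String dirName) {
        return Long.parseLong(dirName.split("_")[1]);
    }

    /** Picks the base with the highest writeId not exceeding the limit, or null if none. */
    static String bestBase(List<String> bases, long limit) {
        String best = null;
        for (String base : bases) {
            long id = baseWriteId(base);
            if (id <= limit && (best == null || id > baseWriteId(best))) {
                best = base;
            }
        }
        return best;
    }

    /** Bases with a writeId lower than the chosen best base are treated as obsolete. */
    static List<String> obsoleteBases(List<String> bases, String best) {
        List<String> obsolete = new ArrayList<>();
        for (String base : bases) {
            if (best != null && baseWriteId(base) < baseWriteId(best)) {
                obsolete.add(base);
            }
        }
        return obsolete;
    }

    public static void main(String[] args) {
        List<String> bases = List.of("base_3_v0000004", "base_6_v0000008", "base_9_v0000012");
        long minOpenWriteId = 10; // the insert that is still open
        long highWaterMark = 3;   // assumed highWaterMark of compaction (id=1)

        String viaShortcut = bestBase(bases, minOpenWriteId);
        System.out.println("limit=minOpenWriteId: best=" + viaShortcut
            + " obsolete=" + obsoleteBases(bases, viaShortcut));

        String viaWatermark = bestBase(bases, highWaterMark);
        System.out.println("limit=highWaterMark: best=" + viaWatermark
            + " obsolete=" + obsoleteBases(bases, viaWatermark));
    }
}
```

When the limit is minOpenWriteId, base_9_v0000012 is chosen and both older bases become obsolete, as in the example. When the limit is the assumed highWaterMark, base_3_v0000004 is chosen and no base is obsolete.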
The shortcut that checks only against minOpenWriteId was added in this Jira:
https://issues.apache.org/jira/browse/HIVE-22754
was:
When the cleaner selects the base directories, all bases are validated by the
AcidUtils.isValidBase method.
{code:java}
private static boolean isValidBase(ParsedBaseLight parsedBase, ValidWriteIdList
writeIdList, FileSystem fs,
HdfsDirSnapshot dirSnapshot) throws IOException {
boolean isValidBase;
if (dirSnapshot != null && dirSnapshot.isValidBase() != null) {
isValidBase = dirSnapshot.isValidBase();
} else {
if (parsedBase.getWriteId() == Long.MIN_VALUE) {
//such base is created by 1st compaction in case of non-acid to acid
table conversion
//By definition there are no open txns with id < 1.
isValidBase = true;
} else if (writeIdList.getMinOpenWriteId() != null &&
parsedBase.getWriteId() <= writeIdList
.getMinOpenWriteId()) {
isValidBase = true;
} else if (isCompactedBase(parsedBase, fs, dirSnapshot)) {
isValidBase = writeIdList.isValidBase(parsedBase.getWriteId());
} else {
// if here, it's a result of IOW
isValidBase = writeIdList.isWriteIdValid(parsedBase.getWriteId());
}
if (dirSnapshot != null) {
dirSnapshot.setIsValidBase(isValidBase);
}
}
return isValidBase;
} {code}
The following condition doesn't consider the cleaner's highWaterMark
{code:java}
else if (writeIdList.getMinOpenWriteId() != null && parsedBase.getWriteId() <=
writeIdList.getMinOpenWriteId()) { isValidBase = true;
} {code}
So if the minOpenWriteId is set and greater than the highWaterMark, base
directories with writeId above the highWaterMark are considered as valid. This
can lead into use cases when the cleaner deletes bases above the cleaner's
highWaterMark.
This issue can lead to dataloss as well. If the base directory with the highest
writeId is the result of an aborted insert-overwrite, and there is an open
write transaction with higher writeId, the writeId <= minOpenWriteId will be
true, so that base will be selected as the best valid delta. At this point in
the code, the non-compacted base directories are checked if they are aborted or
not. It will happen later in the writeIdList.isWriteIdValid call. If this base
directory is selected as best base, the cleaner can clean up delta directories,
but later this base will be cleaned up as well. So we lost data.
Example 1:
* insert 3 deltas and run major compaction (id=1)
* insert 3 deltas and run major compaction (id=2)
* an insert is started but it will stay open when the first cleaner is running
* at this point we have the following directories:
** base_3_v0000004
** base_6_v0000008
** base_9_v0000012
** delta_0000001_0000001
** delta_0000002_0000002
** delta_0000003_0000003
** delta_0000004_0000004
** delta_0000005_0000005
** delta_0000006_0000006
** delta_0000007_0000007
** delta_0000008_0000008
** delta_0000009_0000009
** delta_0000010_0000010
* cleaner runs the first time
It will pick compaction (id=1) from the queue. The cleaner should delete only
the directories which are made obsolete by that compaction. In this example it
is only delta_0000001_0000001, delta_0000002_0000002 and delta_0000003_0000003.
The minOpenWriteId=10 at this point and because of the check in the isValidBase
method, base_9_v0000012 will be selected as the latest valid delta and
base_3_v0000004 and base_6_v0000008 will be deleted.
* the open insert is committed, so minOpenWriteId won't be set any more
* cleaner runs for the second compaction
it tries to find a base below its highWaterMark (it would be base_6_v0000008),
but since it is deleted, the cleaner will fail with
ACID_NOT_ENOUGH_HISTORY error.
If there is an aborted insert overwrite between the last compaction and the
open insert, its base directory would be selected as latest valid base and
delta_0000001_0000001, delta_0000002_0000002 and delta_0000003_0000003 and the
other base directories will be deleted. By this the data in
delta_0000001_0000001, delta_0000002_0000002 and delta_0000003_0000003 will be
lost.
> ACID: Cleaner finds base directories valid with writeId above the cleaner
> highWaterMark
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-29601
> URL: https://issues.apache.org/jira/browse/HIVE-29601
> Project: Hive
> Issue Type: Task
> Affects Versions: 4.2.0
> Reporter: Marta Kuczora
> Assignee: Marta Kuczora
> Priority: Major
> Fix For: 4.3.0
>
>