[
https://issues.apache.org/jira/browse/IMPALA-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489122#comment-16489122
]
Gabor Kaszab edited comment on IMPALA-6119 at 5/24/18 2:42 PM:
---------------------------------------------------------------
I identified the issue to be located somewhere else. What I found is that when an "insert into test partition(b=1) values (2);" is invoked, only the b=1 partition is reloaded. However, the b=2 partition should also be reloaded so that it becomes aware of the new file created by the insert.
In updatePartitionsFromHms():
{code:java}
if (loadPartitionFileMetadata) {
  if (partitionsToUpdate != null) {
    // Only reload file metadata of partitions specified in 'partitionsToUpdate'
    Preconditions.checkState(partitionsToUpdateFileMdByPath.isEmpty());
    partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate);
  }
  loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
}
{code}
getPartitionsByPath() in this case receives the b=1 partition and returns its
path and the partition itself. As a result, loadMetadataAndDiskIds() is called
only for the b=1 partition.
I experimented with modifying getPartitionsByPath() to find all the partitions
that point to the same location as the ones received as a parameter, and
apparently that fixes this issue. However, it might not be the most efficient
solution.
+My general fix proposal is then the following:+
- When a particular partition is being reloaded, find all the other partitions
that have the same 'location' and reload them as well. Finding these partitions
in an efficient way is not that straightforward, though.
+2 proposals for finding the partitions with the same location:+
1) When a set of partitions is received in updatePartitionsFromHms(), go
through this set and compare each partition from the set with all the
partitions in the table to check which ones have the same 'location'. When a
big insert affects a number of partitions and, at the same time, the total
number of partitions is high, this operation can be computationally heavy, and
it has to be done for every insert even if no partitions point to the same
location.
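A rough sketch of this linear scan (standalone Java; the 'Partition' class and method name are simplified stand-ins for illustration, not Impala's actual HdfsPartition API):

```java
import java.util.*;

// Simplified model of a partition: only the fields relevant here.
class Partition {
  final String name;
  final String location;
  Partition(String name, String location) { this.name = name; this.location = location; }
}

public class SameLocationScan {
  // Proposal 1: O(|toReload| * |allPartitions|) scan. For each partition
  // being reloaded, collect every table partition sharing its location.
  static Set<Partition> expandToSameLocation(
      Collection<Partition> toReload, Collection<Partition> allPartitions) {
    Set<Partition> result = new HashSet<>(toReload);
    for (Partition p : toReload) {
      for (Partition q : allPartitions) {
        if (q.location.equals(p.location)) result.add(q);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Partition b1 = new Partition("b=1", "/test-warehouse/test/b=1");
    Partition b2 = new Partition("b=2", "/test-warehouse/test/b=1");
    Partition b3 = new Partition("b=3", "/test-warehouse/test/b=3");
    // Reloading b=1 must also pick up b=2, which shares its directory.
    Set<Partition> expanded =
        expandToSameLocation(List.of(b1), List.of(b1, b2, b3));
    System.out.println(expanded.size()); // b=1 and b=2, but not b=3
  }
}
```

The nested loop is what makes this proposal expensive when both the insert and the table are large.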
2) Similarly to HdfsTable::partitionMap_, we can keep track of a (path -> set
of partitions) mapping. This way, when a set of partitions is received in
updatePartitionsFromHms(), it's enough to go through the received list and find
the partitions pointing to the same location using this mapping in constant
time. One downside is that everywhere partitionMap_ is changed, this mapping
has to be maintained as well. Still, this seems the better approach to me.
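A minimal sketch of such an index (hypothetical class and method names, not Impala's actual catalog code), illustrating both the constant-time lookup and the maintenance burden:

```java
import java.util.*;

// Proposal 2: a (path -> set of partition names) index kept next to the
// existing partition map, so all partitions sharing a location are found
// in constant time per lookup.
public class PartitionLocationIndex {
  private final Map<String, Set<String>> partitionsByPath = new HashMap<>();

  // Must be called from every code path that adds a partition,
  // mirroring updates to the partition map.
  public void addPartition(String partitionName, String path) {
    partitionsByPath.computeIfAbsent(path, k -> new HashSet<>()).add(partitionName);
  }

  // Must likewise be called whenever a partition is dropped.
  public void removePartition(String partitionName, String path) {
    Set<String> names = partitionsByPath.get(path);
    if (names != null) {
      names.remove(partitionName);
      if (names.isEmpty()) partitionsByPath.remove(path);
    }
  }

  // O(1) lookup of all partitions located at 'path'.
  public Set<String> partitionsAt(String path) {
    return partitionsByPath.getOrDefault(path, Set.of());
  }
}
```

The addPartition()/removePartition() pair is the maintenance cost mentioned above: the index is only correct if every mutation of the partition map updates it too.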
> Inconsistent file metadata updates when multiple partitions point to the same
> path
> ----------------------------------------------------------------------------------
>
> Key: IMPALA-6119
> URL: https://issues.apache.org/jira/browse/IMPALA-6119
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
> Reporter: bharath v
> Assignee: Gabor Kaszab
> Priority: Critical
> Labels: correctness, ramp-up
>
> The following steps can give inconsistent results.
> {noformat}
> // Create a partitioned table
> create table test(a int) partitioned by (b int);
> // Create two partitions b=1/b=2 mapped to the same HDFS location.
> insert into test partition(b=1) values (1);
> alter table test add partition (b=2) location
> 'hdfs://localhost:20500/test-warehouse/test/b=1/'
> [localhost:21000] > show partitions test;
> Query: show partitions test
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                       |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | 1     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | 2     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | Total | -1    | 2      | 4B   | 0B           |                   |        |                   |                                                |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> // Insert new data into one of the partitions
> insert into test partition(b=1) values (2);
> // Newly added file is reflected only in the added partition files.
> show files in test;
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                                               | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> invalidate metadata test;
> show files in test;
> // After invalidation, the newly added file now shows up in both the
> partitions.
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                                               | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=2       |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> {noformat}
> So, depending on whether the user invalidates the table, they can see
> different results. The bug is in the following code.
> {noformat}
> private FileMetadataLoadStats resetAndLoadFileMetadata(
>     Path partDir, List<HdfsPartition> partitions) throws IOException {
>   FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
>   ....
>   for (HdfsPartition partition: partitions)
>     partition.setFileDescriptors(newFileDescs); <======
> {noformat}
> We update the added file metadata only for the targeted partition (in a
> copy-on-write way). Instead, we should update the source descriptors so that
> the change is reflected in the other partitions too.
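A minimal, self-contained illustration of the stale-descriptor behavior the quoted report describes (hypothetical class, not Impala code): two partitions point at the same directory, but each holds its own file-descriptor list, so refreshing only the inserted-into partition leaves the other stale.

```java
import java.util.*;

// Toy model: each partition keeps its own list of file paths for the
// directory it points at. An insert that refreshes only one partition's
// list leaves the co-located partition out of date until a full reload.
public class StaleDescriptorsDemo {
  static Map<String, List<String>> fileDescriptors = new HashMap<>();

  public static void main(String[] args) {
    String dir = "/test-warehouse/test/b=1";
    // Both partitions initially see the same single file in 'dir'.
    fileDescriptors.put("b=1", new ArrayList<>(List.of(dir + "/file0")));
    fileDescriptors.put("b=2", new ArrayList<>(List.of(dir + "/file0")));

    // An insert into b=1 adds a file but only refreshes b=1's list.
    fileDescriptors.get("b=1").add(dir + "/file1");

    System.out.println(fileDescriptors.get("b=1").size()); // 2
    System.out.println(fileDescriptors.get("b=2").size()); // 1: stale view
  }
}
```

This mirrors the `show files` output above: the new file appears under b=1 only, until an invalidation rebuilds both lists from the directory.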
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)