[ 
https://issues.apache.org/jira/browse/IMPALA-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489122#comment-16489122
 ] 

Gabor Kaszab edited comment on IMPALA-6119 at 5/24/18 2:42 PM:
---------------------------------------------------------------

I identified the issue to be located elsewhere. What I found is that when an 
"insert into test partition(b=1) values (2);" is invoked, only the b=1 
partition is reloaded. However, the b=2 partition should also be reloaded so 
that it becomes aware of the new file created by the insert.

In updatePartitionsFromHms()
{code:java}
if (loadPartitionFileMetadata) {
  if (partitionsToUpdate != null) {
    // Only reload file metadata of partitions specified in 'partitionsToUpdate'
    Preconditions.checkState(partitionsToUpdateFileMdByPath.isEmpty());
    partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate);
  }
  loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
}
{code}
getPartitionsByPath() in this case receives the b=1 partition and returns its 
path and the partition itself. As a result, loadMetadataAndDiskIds() is called 
only for the b=1 partition.

I experimented with modifying getPartitionsByPath() to find all the partitions 
that point to the same location as the ones received as a parameter, and that 
apparently fixes this issue. However, it might not be the optimal solution.

+My general fix proposal is then the following:+
 - When a particular partition is being reloaded, find all the other 
partitions that have the same 'location' and reload them as well. Finding 
these partitions in an optimal way is not that straightforward, though.

+2 proposals for finding the partitions with the same location:+

1) When a set of partitions is received in updatePartitionsFromHms(), go 
through this set and compare each partition from the set with all the 
partitions in the table to check which ones have the same 'location'. When a 
big insert affects a number of partitions and, at the same time, the total 
number of partitions is high, this operation can be computationally heavy, and 
it has to be done for each insert even if no partitions point to the same 
location.
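A minimal sketch of this linear-scan variant. Types are simplified (a 
partition is represented by its name and a location string); the class and 
method names are illustrative, not the actual catalog API:

{code:java}
import java.util.*;

// Proposal 1 sketch: for every updated partition, scan all table partitions
// and collect the ones sharing its 'location'. Cost is
// O(|updates| * |allPartitions|) per insert.
public class SameLocationScan {
  public static Set<String> partitionsToReload(
      Map<String, String> allPartitions,    // partition name -> location
      Collection<String> updatedPartitions) {
    Set<String> result = new HashSet<>();
    for (String updated : updatedPartitions) {
      String loc = allPartitions.get(updated);
      if (loc == null) continue;
      for (Map.Entry<String, String> e : allPartitions.entrySet()) {
        // Any partition sharing the updated partition's location must be
        // reloaded too, otherwise its file metadata goes stale.
        if (loc.equals(e.getValue())) result.add(e.getKey());
      }
    }
    return result;
  }
}
{code}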

2) Similarly to HdfsTable::partitionMap_, we can keep track of a (path -> set 
of partitions) mapping. This way, when a set of partitions is received in 
updatePartitionsFromHms(), it's enough to go through the received list and 
find the partitions pointing to the same location using this mapping in 
constant time per lookup. One downside of this is that everywhere 
partitionMap_ is changed, this mapping has to be maintained as well. Still, 
this seems the better approach to me.
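A minimal sketch of such an auxiliary index, with simplified types and 
hypothetical names (the real code would key HdfsPartition objects and live 
next to partitionMap_):

{code:java}
import java.util.*;

// Proposal 2 sketch: maintain a (location -> set of partition names) index
// alongside partitionMap_, so all partitions sharing a path are found in
// O(1) per lookup. Every code path that mutates partitionMap_ must also
// update this index.
public class PartitionsByLocation {
  private final Map<String, Set<String>> partitionsByPath = new HashMap<>();

  // Must be called wherever a partition is added to partitionMap_.
  public void addPartition(String name, String location) {
    partitionsByPath.computeIfAbsent(location, k -> new HashSet<>()).add(name);
  }

  // Must be called wherever a partition is dropped from partitionMap_.
  public void removePartition(String name, String location) {
    Set<String> s = partitionsByPath.get(location);
    if (s != null) {
      s.remove(name);
      if (s.isEmpty()) partitionsByPath.remove(location);
    }
  }

  // All partitions that must be reloaded when files under 'location' change.
  public Set<String> partitionsSharingLocation(String location) {
    return partitionsByPath.getOrDefault(location, Collections.emptySet());
  }
}
{code}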



> Inconsistent file metadata updates when multiple partitions point to the same 
> path
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-6119
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6119
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: bharath v
>            Assignee: Gabor Kaszab
>            Priority: Critical
>              Labels: correctness, ramp-up
>
> Following steps can give inconsistent results.
> {noformat}
> // Create a partitioned table
> create table test(a int) partitioned by (b int);
> // Create two partitions b=1/b=2 mapped to the same HDFS location.
> insert into test partition(b=1) values (1);
> alter table test add partition (b=2) location 
> 'hdfs://localhost:20500/test-warehouse/test/b=1/' 
> [localhost:21000] > show partitions test;
> Query: show partitions test
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | 
> Incremental stats | Location                                       |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | 1     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | 
> false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | 2     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | 
> false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | Total | -1    | 2      | 4B   | 0B           |                   |        | 
>                   |                                                |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> // Insert new data into one of the partitions
> insert into test partition(b=1) values (2);
> // Newly added file is reflected only in the added partition files. 
> show files in test;
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                        
>                        | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0.
>  | 2B   | b=1       |
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0.
>  | 2B   | b=1       |
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0.
>  | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> invalidate metadata test;
>  show files in test;
> // After invalidation, the newly added file now shows up in both the 
> partitions.
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path                                                                        
>                        | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0.
>  | 2B   | b=1       |
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0.
>  | 2B   | b=1       |
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0.
>  | 2B   | b=2       |
> | 
> hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0.
>  | 2B   | b=2       |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> {noformat}
> So, depending on whether the user invalidates the table, they can see 
> different results. The bug is in the following code.
> {noformat}
> private FileMetadataLoadStats resetAndLoadFileMetadata(
>       Path partDir, List<HdfsPartition> partitions) throws IOException {
>     FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
> ....
> ....
> ....
>  for (HdfsPartition partition: partitions) 
> partition.setFileDescriptors(newFileDescs);  <======
> {noformat}
> We only update the added file metadata for the new partition (copy-on-write 
> way). Instead we should update the source descriptors so that it is reflected 
> in the other partitions too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
