xiarixiaoyao commented on pull request #4405:
URL: https://github.com/apache/hudi/pull/4405#issuecomment-1004680987
@nsivabalan
With the current logic this problem rarely occurs, because we decide whether
a partition needs to be altered by checking whether its path has changed.
It is uncommon for a Hudi table to change its partition paths, although a
partition path can be modified through the ALTER PARTITION syntax.
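To illustrate the path comparison described above, here is a minimal sketch. It is not the actual Hudi code: the class `PartitionDiffSketch`, the `diff` method, and the `EventType` enum are hypothetical names, and plain maps (partition value to path) stand in for the metastore and storage listings.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: classify partitions as ADD or UPDATE by comparing paths. */
final class PartitionDiffSketch {
    enum EventType { ADD, UPDATE }

    /**
     * metastorePaths: partition value -> path currently registered in the metastore.
     * storagePaths:   partition value -> path found on storage.
     * Returns only the partitions that need an action; unchanged ones are skipped.
     */
    static Map<String, EventType> diff(Map<String, String> metastorePaths,
                                       Map<String, String> storagePaths) {
        Map<String, EventType> events = new TreeMap<>();
        for (Map.Entry<String, String> entry : storagePaths.entrySet()) {
            String registeredPath = metastorePaths.get(entry.getKey());
            if (registeredPath == null) {
                // Partition is unknown to the metastore: it has to be added.
                events.put(entry.getKey(), EventType.ADD);
            } else if (!registeredPath.equals(entry.getValue())) {
                // Same partition, different path: it has to be altered.
                events.put(entry.getKey(), EventType.UPDATE);
            }
        }
        return events;
    }
}
```

Under this scheme an UPDATE is produced only when the path of an already registered partition changes, which is why the alter code path is seldom exercised.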
The problem is easy to reproduce in the unit tests: add the following code
after line 146 in TestHiveSyncTool:

    String testP = Arrays.stream(hiveClient.scanTablePartitions(hiveSyncConfig.tableName).get(0).getValues().get(0).split("-")).collect(Collectors.joining("/"));
    hiveClient.updatePartitionsToTable(hiveSyncConfig.tableName, Arrays.asList(testP));

BTW,
when we sync altered partitions, we should also set "numFiles" and
"totalSize" on the altered partition.
Because hive.stats.autogather=true by default, Hive tries to calculate the
partition stats ("numFiles" and "totalSize") automatically:
1) For an add-partition operation: when new partitions are synced to Hive,
Hive calls updatePartitionStatsFast to update the stats of every new partition.
2) For an alter-partition operation: the Hive metastore first looks up the old
partition that is to be altered, then tries to update the partition stats by
comparing the stats of the old partition with those of our altered partition.
However, the old partition has stats while our altered partition has none (we
did not specify them), so the error occurs.
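A rough sketch of the suggestion above, assuming the partition parameters are a plain string map. The class `PartitionStatsSketch` and the method `withBasicStats` are hypothetical names, not part of Hudi or Hive, and how the file count and total size are obtained is left open here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: attach basic stats to an altered partition's parameters. */
final class PartitionStatsSketch {
    // Parameter keys named in the comment above.
    static final String NUM_FILES = "numFiles";
    static final String TOTAL_SIZE = "totalSize";

    /**
     * Returns a copy of the altered partition's parameters with "numFiles" and
     * "totalSize" set, so the parameters are no longer missing these stats.
     * The caller supplies the values; computing them is out of scope for this sketch.
     */
    static Map<String, String> withBasicStats(Map<String, String> alteredParams,
                                              long numFiles, long totalSize) {
        Map<String, String> result = new HashMap<>(alteredParams);
        result.put(NUM_FILES, Long.toString(numFiles));
        result.put(TOTAL_SIZE, Long.toString(totalSize));
        return result;
    }
}
```

The input map is copied rather than modified, so the caller can decide when to hand the enriched parameters to the alter call.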