Hi Team,
While exploring the HUDI source code, I came across this PR:
https://github.com/apache/incubator-hudi/pull/1073
As part of that PR, generation of delete records was added to
HoodieTestDataGenerator. Within that class, the existingKeys Map tracks the
current keys, and the PR added the following code to delete from the Map:
existingKeys.remove(kp);
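This removes by value rather than by key. The map is declared as
private final Map<Integer, KeyPartition> existingKeys;
and Map.remove(Object) looks up a key, so passing a KeyPartition never
matches any Integer key and nothing is actually removed.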
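To make the behaviour concrete, here is a small standalone sketch (not HUDI code). KeyPartition below is a hypothetical stand-in for the real class, and removeAsInPr / removeByValue are illustrative helper names. It shows that remove(kp) on a Map<Integer, KeyPartition> returns null and leaves the map unchanged, while removing through values() deletes the matching entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ExistingKeysRemovalDemo {

  // Hypothetical stand-in for HoodieTestDataGenerator's KeyPartition.
  static final class KeyPartition {
    final String recordKey;
    final String partitionPath;

    KeyPartition(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof KeyPartition)) return false;
      KeyPartition other = (KeyPartition) o;
      return recordKey.equals(other.recordKey) && partitionPath.equals(other.partitionPath);
    }

    @Override
    public int hashCode() {
      return Objects.hash(recordKey, partitionPath);
    }
  }

  // Mirrors the call from the PR: Map.remove(Object) expects a key, so a
  // KeyPartition argument never equals an Integer key and nothing is removed.
  static KeyPartition removeAsInPr(Map<Integer, KeyPartition> existingKeys, KeyPartition kp) {
    return existingKeys.remove(kp);
  }

  // One possible fix: remove the first entry whose value equals kp.
  // values().remove scans the map, so this is O(n) per call.
  static boolean removeByValue(Map<Integer, KeyPartition> existingKeys, KeyPartition kp) {
    return existingKeys.values().remove(kp);
  }

  public static void main(String[] args) {
    Map<Integer, KeyPartition> existingKeys = new HashMap<>();
    KeyPartition kp = new KeyPartition("key-0", "2016/03/15");
    existingKeys.put(0, kp);

    System.out.println(removeAsInPr(existingKeys, kp));  // null
    System.out.println(existingKeys.size());             // 1
    System.out.println(removeByValue(existingKeys, kp)); // true
    System.out.println(existingKeys.size());             // 0
  }
}
```

If the index of the entry is already known where the delete record is generated, calling existingKeys.remove(index) with that Integer key avoids the linear scan.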
I tried fixing this, but the fix leads to unit test failures in
TestHoodieDeltaStreamer, in testUpsertsCOWContinuousMode. The failing code
is this check (in bold):
TestHelpers.waitTillCondition((r) -> {
  if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
    TestHelpers.assertAtleastNDeltaCommits(5, tableBasePath, dfs);
    TestHelpers.assertAtleastNCompactionCommits(2, tableBasePath, dfs);
  } else {
    TestHelpers.assertAtleastNCompactionCommits(5, tableBasePath, dfs);
  }
  *TestHelpers.assertRecordCount(totalRecords + 200, tableBasePath + "/*/*.parquet", sqlContext);*
  *TestHelpers.assertDistanceCount(totalRecords + 200, tableBasePath + "/*/*.parquet", sqlContext);*
  return true;
I don't understand why +200 was added to the checks above. Is this related
to existingKeys.remove() not actually removing the records from the Map?
I have also left these comments on the PR itself, where they are easier to read.
Thanks
Prashant