aswinshakil commented on code in PR #5175: URL: https://github.com/apache/ozone/pull/5175#discussion_r1309500740
########## hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/SnapshotInfo.java: ########## @@ -586,15 +688,24 @@ public boolean equals(Object o) { Objects.equals( globalPreviousSnapshotId, that.globalPreviousSnapshotId) && snapshotPath.equals(that.snapshotPath) && - checkpointDir.equals(that.checkpointDir); + checkpointDir.equals(that.checkpointDir) && + deepClean == that.deepClean && Review Comment: I was wondering if we need to add `deepClean` and `sstFiltered` for `equals` as these can change depending on the moment if it was deep cleaned or sst filtered. I guess it should be fine. ########## hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/QuotaUtil.java: ########## @@ -55,8 +60,36 @@ public static long getReplicatedSize( + partialFirstChunk * rc.getParity(); return dataSize + replicationOverhead; } else { + LOG.warn("Unknown replication type '{}'. Returning original data size.", + repConfig.getReplicationType()); return dataSize; } } + /** + * Get an estimated data size (before replication) from the replicated size. + * An (inaccurate) reverse of getReplicatedSize(). + * @param replicatedSize size after replication. + * @param repConfig The replicationConfig used to store the data. + * @return Data size before replication. + */ + public static long getDataSize(long replicatedSize, + ReplicationConfig repConfig) { + if (repConfig.getReplicationType() == RATIS) { + final int ratisReplicationFactor = ((RatisReplicationConfig) repConfig) + .getReplicationFactor().getNumber(); + // May not be divisible. But it's fine to ignore remainder in our use case + return replicatedSize / ratisReplicationFactor; + } else if (repConfig.getReplicationType() == EC) { + ECReplicationConfig rc = (ECReplicationConfig) repConfig; + // In the case of EC, we won't know if keys have partial chunks or not, + // so we assume no partial chunks as an estimate. + return replicatedSize * rc.getData() / rc.getRequiredNodes(); Review Comment: For a rough estimate, This should be fine because only the last strip will be partial. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotCreateRequest.java: ########## @@ -160,6 +166,22 @@ public OMClientResponse validateAndUpdateCache(OzoneManager ozoneManager, .getLatestSequenceNumber(); snapshotInfo.setDbTxSequenceNumber(dbLatestSequenceNumber); + // Snapshot referenced size should be bucket's used bytes + OmBucketInfo omBucketInfo = + getBucketInfo(omMetadataManager, volumeName, bucketName); + snapshotInfo.setReferencedReplicatedSize(omBucketInfo.getUsedBytes()); + + // Snapshot referenced size in this case is an *estimate* inferred from + // the bucket default replication policy right now. + // This may well not be the actual sum of all key data sizes in this + // bucket because each key can have its own replication policy, + // depending on the choice of the client at the time of writing that key. + // And we will NOT do an O(n) walk over the keyTable (fileTable) here + // because it is a design goal of CreateSnapshot to be an O(1) operation. + // TODO: Assign actual data size once we have the pre-replicated key size Review Comment: ```suggestion // TODO: [SNAPSHOT] Assign actual data size once we have the pre-replicated key size ``` For easier grep. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java: ########## @@ -728,6 +728,8 @@ protected void initializeOmTables(boolean addCacheMetrics) String.class, OmDBTenantState.class); checkTableStatus(tenantStateTable, TENANT_STATE_TABLE, addCacheMetrics); + // TODO: Consider FULL_CACHE for snapshotInfoTable since exclusiveSize in Review Comment: ```suggestion // TODO: [SNAPSHOT] Consider FULL_CACHE for snapshotInfoTable since exclusiveSize in ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org