manojpec commented on a change in pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#discussion_r785173448
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java
##########
@@ -45,36 +57,64 @@
import static org.apache.hudi.metadata.HoodieTableMetadata.RECORDKEY_PARTITION_LIST;
/**
- * This is a payload which saves information about a single entry in the Metadata Table.
- *
- * The type of the entry is determined by the "type" saved within the record. The following types of entries are saved:
- *
- * 1. List of partitions: There is a single such record
- * key="__all_partitions__"
- *
- * 2. List of files in a Partition: There is one such record for each partition
- * key=Partition name
- *
- * During compaction on the table, the deletions are merged with additions and hence pruned.
- *
- * Metadata Table records are saved with the schema defined in HoodieMetadata.avsc. This class encapsulates the
- * HoodieMetadataRecord for ease of operations.
+ * MetadataTable records are persisted with the schema defined in HoodieMetadata.avsc.
+ * This class represents the payload for the MetadataTable.
+ * <p>
+ * This single metadata payload is shared by all the partitions under the metadata table.
+ * The partition specific records are determined by the field "type" saved within the record.
+ * The following types are supported:
+ * <p>
+ * METADATA_TYPE_PARTITION_LIST (1):
+ * -- List of all partitions. There is a single such record
+ * -- key = @{@link HoodieTableMetadata.RECORDKEY_PARTITION_LIST}
+ * <p>
+ * METADATA_TYPE_FILE_LIST (2):
+ * -- List of all files in a partition. There is one such record for each partition
+ * -- key = partition name
+ * <p>
+ * METADATA_TYPE_COLUMN_STATS (3):
+ * -- This is an index for column stats in the table
+ * <p>
+ * METADATA_TYPE_BLOOM_FILTER (4):
+ * -- This is an index for base file bloom filters. This is a map of FileID to its BloomFilter byte[].
+ * <p>
+ * During compaction on the table, the deletions are merged with additions and hence records are pruned.
*/
public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadataPayload> {
+ // Type of the record. This can be an enum in the schema but Avro1.8
+ // has a bug - https://issues.apache.org/jira/browse/AVRO-1810
+ protected static final int METADATA_TYPE_PARTITION_LIST = 1;
+ protected static final int METADATA_TYPE_FILE_LIST = 2;
+ protected static final int METADATA_TYPE_COLUMN_STATS = 3;
+ protected static final int METADATA_TYPE_BLOOM_FILTER = 4;
+
// HoodieMetadata schema field ids
public static final String SCHEMA_FIELD_ID_KEY = "key";
public static final String SCHEMA_FIELD_ID_TYPE = "type";
- public static final String SCHEMA_FIELD_ID_METADATA = "filesystemMetadata";
+ public static final String SCHEMA_FIELD_ID_FILESYSTEM = "filesystemMetadata";
+ private static final String SCHEMA_FIELD_ID_COLUMN_STATS = "ColumnStatsMetadata";
+ private static final String SCHEMA_FIELD_ID_BLOOM_FILTER = "BloomFilterMetadata";
- // Type of the record
- // This can be an enum in the schema but Avro 1.8 has a bug - https://issues.apache.org/jira/browse/AVRO-1810
- private static final int PARTITION_LIST = 1;
- private static final int FILE_LIST = 2;
+ // HoodieMetadata bloom filter payload field ids
+ private static final String BLOOM_FILTER_FIELD_TYPE = "type";
+ private static final String BLOOM_FILTER_FIELD_TIMESTAMP = "timestamp";
+ private static final String BLOOM_FILTER_FIELD_BLOOM_FILTER = "bloomFilter";
+ private static final String BLOOM_FILTER_FIELD_IS_DELETED = "isDeleted";
+ private static final String BLOOM_FILTER_FIELD_RESERVED = "reserved";
+
+ // HoodieMetadata column stats payload field ids
+ private static final String COLUMN_STATS_FIELD_MIN_VALUE = "minValue";
+ private static final String COLUMN_STATS_FIELD_MAX_VALUE = "maxValue";
+ private static final String COLUMN_STATS_FIELD_NULL_COUNT = "nullCount";
+ private static final String COLUMN_STATS_FIELD_IS_DELETED = "isDeleted";
Review comment:
fixed.
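
For readers skimming the hunk, here is a minimal sketch of how the integer "type" tag could select which schema field carries a record's payload. The constant values and field names are copied from the diff above. The class `MetadataTypeSketch` and the method `payloadField` are hypothetical and not part of the pull request. The mapping of both list types to "filesystemMetadata" is an assumption based on the Javadoc, not something this hunk confirms.

```java
import java.util.Optional;

// Hypothetical illustration only; not code from the pull request.
final class MetadataTypeSketch {
  // Record type tags, values as shown in the hunk.
  static final int METADATA_TYPE_PARTITION_LIST = 1;
  static final int METADATA_TYPE_FILE_LIST = 2;
  static final int METADATA_TYPE_COLUMN_STATS = 3;
  static final int METADATA_TYPE_BLOOM_FILTER = 4;

  // Returns the schema field assumed to hold the payload for a type tag,
  // or an empty Optional for an unknown tag.
  static Optional<String> payloadField(int type) {
    switch (type) {
      case METADATA_TYPE_PARTITION_LIST:
      case METADATA_TYPE_FILE_LIST:
        // Assumption: both list types share the filesystem metadata field.
        return Optional.of("filesystemMetadata");
      case METADATA_TYPE_COLUMN_STATS:
        return Optional.of("ColumnStatsMetadata");
      case METADATA_TYPE_BLOOM_FILTER:
        return Optional.of("BloomFilterMetadata");
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // Print the assumed field for each known tag plus one unknown tag.
    for (int type = 1; type <= 5; type++) {
      System.out.println(type + " -> " + payloadField(type).orElse("<unknown type>"));
    }
  }
}
```

Keeping the tag as a plain int rather than an Avro enum follows the comment in the diff that cites AVRO-1810.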