codope commented on code in PR #9311:
URL: https://github.com/apache/hudi/pull/9311#discussion_r1278796387
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -727,6 +728,11 @@ public class HoodieWriteConfig extends HoodieConfig {
*/
public static final String SPARK_SQL_MERGE_INTO_PREPPED_KEY =
"_hoodie.spark.sql.merge.into.prepped";
+ /**
+ * An internal config referring to fileID encoding. 0 refers to UUID based
encoding and 1 refers to raw string format(random string).
+ */
+ public static final String WRITES_FILEID_ENCODING =
"_hoodie.writes.fileid.encoding";
Review Comment:
how about `_hoodie.record.index.fileid.encoding`? This is going to be used
on. the read path too. Also, can we make it clear in the javadoc that it is for
record index.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -166,11 +166,15 @@ public class HoodieMetadataPayload implements
HoodieRecordPayload<HoodieMetadata
/**
* HoodieMetadata record index payload field ids
*/
- public static final String RECORD_INDEX_FIELD_PARTITION = "partition";
+ public static final String RECORD_INDEX_FIELD_PARTITION = "partitionName";
public static final String RECORD_INDEX_FIELD_FILEID_HIGH_BITS =
"fileIdHighBits";
public static final String RECORD_INDEX_FIELD_FILEID_LOW_BITS =
"fileIdLowBits";
public static final String RECORD_INDEX_FIELD_FILE_INDEX = "fileIndex";
public static final String RECORD_INDEX_FIELD_INSTANT_TIME = "instantTime";
+ public static final String RECORD_INDEX_FIELD_FILEID = "fileId";
+ public static final String RECORD_INDEX_FIELD_FILEID_ENCODING =
"fileIdEncoding";
+ public static final int RECORD_INDEX_FIELD_FILEID_ENCODING_UUID = 0;
+ public static final int RECORD_INDEX_FIELD_FILEID_ENCODING_RAW_STRING = 1;
Review Comment:
Can someone switch from 0 to 1? Let's say the record key is still uuid but i
want to store in string format after a few commits. Should we handle this in
metadata payload merge?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -166,11 +166,15 @@ public class HoodieMetadataPayload implements
HoodieRecordPayload<HoodieMetadata
/**
* HoodieMetadata record index payload field ids
*/
- public static final String RECORD_INDEX_FIELD_PARTITION = "partition";
+ public static final String RECORD_INDEX_FIELD_PARTITION = "partitionName";
public static final String RECORD_INDEX_FIELD_FILEID_HIGH_BITS =
"fileIdHighBits";
public static final String RECORD_INDEX_FIELD_FILEID_LOW_BITS =
"fileIdLowBits";
public static final String RECORD_INDEX_FIELD_FILE_INDEX = "fileIndex";
public static final String RECORD_INDEX_FIELD_INSTANT_TIME = "instantTime";
+ public static final String RECORD_INDEX_FIELD_FILEID = "fileId";
+ public static final String RECORD_INDEX_FIELD_FILEID_ENCODING =
"fileIdEncoding";
+ public static final int RECORD_INDEX_FIELD_FILEID_ENCODING_UUID = 0;
+ public static final int RECORD_INDEX_FIELD_FILEID_ENCODING_RAW_STRING = 1;
Review Comment:
Maybe we don't need as we are always picking the latest payload. But, should
we even allow such switching? Maybe add a validation?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
Review Comment:
Can we add some unit tests for paylad changes?
##########
hudi-common/src/main/avro/HoodieMetadata.avsc:
##########
Review Comment:
Aren't we considering rowId?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -727,38 +733,55 @@ private static HoodieMetadataColumnStats
mergeColumnStatsRecords(HoodieMetadataC
* @param instantTime instantTime when the record was added
*/
public static HoodieRecord<HoodieMetadataPayload>
createRecordIndexUpdate(String recordKey, String partition,
-
String fileId, String instantTime) {
+
String fileId, String instantTime, int fileIdEncoding) {
+
HoodieKey key = new HoodieKey(recordKey,
MetadataPartitionType.RECORD_INDEX.getPartitionPath());
- // Data file names have a -D suffix to denote the index (D = integer) of
the file written
- // In older HUID versions the file index was missing
- final UUID uuid;
- final int fileIndex;
+ long instantTimeMillis = -1;
try {
- if (fileId.length() == 36) {
- uuid = UUID.fromString(fileId);
- fileIndex = RECORD_INDEX_MISSING_FILEINDEX_FALLBACK;
- } else {
- final int index = fileId.lastIndexOf("-");
- uuid = UUID.fromString(fileId.substring(0, index));
- fileIndex = Integer.parseInt(fileId.substring(index + 1));
- }
+ instantTimeMillis =
HoodieActiveTimeline.parseDateFromInstantTime(instantTime).getTime();
} catch (Exception e) {
- throw new HoodieMetadataException(String.format("Invalid UUID or index:
fileID=%s, partition=%s, instantTIme=%s",
- fileId, partition, instantTime), e);
+ throw new HoodieMetadataException("Failed to create metadata payload for
record index.", e);
Review Comment:
minor: let's be more concrete about what exactly the failure is since we
know this came from instant parsing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]