the-other-tim-brown commented on code in PR #9367:
URL: https://github.com/apache/hudi/pull/9367#discussion_r1284890274
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -324,33 +325,45 @@ public static HoodieRecord<HoodieMetadataPayload> createPartitionListRecord(List
* @param partition The name of the partition
* @param filesAdded Mapping of files to their sizes for files which have been added to this partition
* @param filesDeleted List of files which have been deleted from this partition
+ * @param instantTime Commit time of the commit responsible for adding and/or deleting these files, will be empty during bootstrapping of the metadata table
*/
public static HoodieRecord<HoodieMetadataPayload> createPartitionFilesRecord(String partition,
- Option<Map<String, Long>> filesAdded,
- Option<List<String>> filesDeleted) {
- Map<String, HoodieMetadataFileInfo> fileInfo = new HashMap<>();
- filesAdded.ifPresent(filesMap ->
- fileInfo.putAll(
- filesMap.entrySet().stream().collect(
- Collectors.toMap(Map.Entry::getKey, (entry) -> {
- long fileSize = entry.getValue();
- // Assert that the file-size of the file being added is positive, since Hudi
- // should not be creating empty files
- checkState(fileSize > 0);
- return new HoodieMetadataFileInfo(fileSize, false);
- })))
- );
- filesDeleted.ifPresent(filesList ->
- fileInfo.putAll(
- filesList.stream().collect(
- Collectors.toMap(Function.identity(), (ignored) -> new HoodieMetadataFileInfo(0L, true))))
- );
+ Map<String, Long> filesAdded,
+ List<String> filesDeleted,
+ Option<String> instantTime) {
+ int size = filesAdded.size() + filesDeleted.size();
+ Map<String, HoodieMetadataFileInfo> fileInfo = new HashMap<>(size, 1);
+ filesAdded.forEach((fileName, fileSize) -> {
+ // Assert that the file-size of the file being added is positive, since Hudi
+ // should not be creating empty files
+ checkState(fileSize > 0);
+ fileInfo.put(handleFileName(fileName, instantTime), new HoodieMetadataFileInfo(fileSize, false));
+ });
+
+ filesDeleted.forEach(fileName -> fileInfo.put(handleFileName(fileName, instantTime), DELETE_FILE_METADATA));
HoodieKey key = new HoodieKey(partition, MetadataPartitionType.FILES.getPartitionPath());
HoodieMetadataPayload payload = new HoodieMetadataPayload(key.getRecordKey(), METADATA_TYPE_FILE_LIST, fileInfo);
return new HoodieAvroRecord<>(key, payload);
}
+ /**
+ * In the case where a file was created by something other than a Hudi writer, the file name will not contain the commit time. We will prefix the file name with hudiext_[commitTime] before storing
+ * in the metadata table. The constructor for {@link org.apache.hudi.common.model.HoodieBaseFile} will properly handle this prefix.
+ * @param fileName incoming file name
+ * @param commitTime time of the commit (will be empty during bootstrap operations)
+ * @return file name with commit time prefix if the input file name does not contain the commit time, otherwise returns the original input
+ */
+ private static String handleFileName(String fileName, Option<String> commitTime) {
Review Comment:
If we want to avoid this call, we could send the already-prefixed file name
through the WriteStatus instead, so that fileName is a pass-through at this point
in the code. @vinothchandar and @nsivabalan, what are your thoughts?
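
To make the trade-off concrete, here is a rough, self-contained sketch of the prefixing rule described in the Javadoc above, applied once on the writer side so that downstream code can treat the name as a pass-through. Everything in it is an assumption for illustration: the class and method names are hypothetical, `java.util.Optional` stands in for Hudi's `Option`, the `contains` check and the `_` separator after the commit time are guesses (the diff only says `hudiext_[commitTime]`), and it does not touch the real WriteStatus API.

```java
import java.util.Optional;

// Hypothetical helper, not part of Hudi: illustrates prefixing a file name
// once, upstream, so later stages can store it unchanged.
public class ExternalFileNameSketch {
    // Assumed prefix; the Javadoc in the diff only mentions "hudiext_[commitTime]".
    static final String EXTERNAL_PREFIX = "hudiext_";

    // Returns the name unchanged when there is no commit time (bootstrap) or
    // when the name already contains the commit time; otherwise prefixes it.
    // The "contains" test and the "_" separator are assumptions.
    static String prefixIfExternal(String fileName, Optional<String> commitTime) {
        if (!commitTime.isPresent() || fileName.contains(commitTime.get())) {
            return fileName;
        }
        return EXTERNAL_PREFIX + commitTime.get() + "_" + fileName;
    }

    public static void main(String[] args) {
        Optional<String> commit = Optional.of("20230804120000");
        // Name without the commit time, as an external writer might produce it.
        System.out.println(prefixIfExternal("part-0001.parquet", commit));
        // Name that already carries the commit time is left as-is.
        System.out.println(prefixIfExternal("f1_1-0-1_20230804120000.parquet", commit));
        // No commit time available (bootstrap case): also left as-is.
        System.out.println(prefixIfExternal("part-0001.parquet", Optional.empty()));
    }
}
```

If the writer applied something like this before reporting the file, the metadata payload could store the reported name directly and would not need to know about the prefix at all.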