JingsongLi commented on code in PR #316:
URL: https://github.com/apache/flink-table-store/pull/316#discussion_r995324401
##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/MergeTreeWriter.java:
##########
@@ -138,46 +133,31 @@ public void flushMemory() throws Exception {
trySyncLatestCompaction(true);
}
- // write changelog file
- List<String> extraFiles = new ArrayList<>();
- if (changelogProducer == ChangelogProducer.INPUT) {
- SingleFileWriter<KeyValue, Void> writer = writerFactory.createChangelogFileWriter();
- writer.write(memTable.rawIterator());
- writer.close();
- extraFiles.add(writer.path().getName());
- }
-
// write lsm level 0 file
- try {
- Iterator<KeyValue> iterator = memTable.mergeIterator(keyComparator, mergeFunction);
- KeyValueDataFileWriter writer = writerFactory.createLevel0Writer();
- writer.write(iterator);
- writer.close();
-
- // In theory, this fileMeta should contain statistics from both lsm file extra file.
- // However for level 0 files, as we do not drop DELETE records, keys appear in one
- // file will also appear in the other. So we just need to use statistics from one of
- // them.
- //
- // For value count merge function, it is possible that we have changelog first
- // adding one record then remove one record, but after merging this record will not
- // appear in lsm file. This is OK because we can also skip this changelog.
- DataFileMeta fileMeta = writer.result();
- if (fileMeta == null) {
- for (String extraFile : extraFiles) {
- writerFactory.deleteFile(extraFile);
+ Iterator<KeyValue> iterator = memTable.mergeIterator(keyComparator, mergeFunction);
Review Comment:
Writing the level 0 data first sorts the data in the write buffer, so the changelog
no longer follows the input order.
That may not be a bad thing, because with
https://github.com/apache/flink-table-store/pull/315 it is impossible to
maintain the input order anyway.
It would be better to add a comment here noting this ordering.
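A minimal sketch of the ordering effect described in the comment, assuming a write buffer whose merge iterator sorts the buffer in place, so a later raw read sees sorted rows. `ToyWriteBuffer`, `Kv`, and `keys` are hypothetical illustrations, not the table store `MemTable` API, and merging is simplified to "latest sequence number wins" per key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class WriteBufferOrderDemo {

    // Hypothetical record: key, sequence number assigned at insert time, value.
    record Kv(String key, long seq, int value) {}

    // Hypothetical stand-in for the write buffer; NOT the real MemTable API.
    static final class ToyWriteBuffer {
        private final List<Kv> rows = new ArrayList<>();
        private long nextSeq = 0;

        void put(String key, int value) {
            rows.add(new Kv(key, nextSeq++, value));
        }

        // Returns rows in whatever order the buffer currently holds them.
        Iterator<Kv> rawIterator() {
            return new ArrayList<>(rows).iterator();
        }

        // Sorts the buffer IN PLACE by (key, seq), then keeps the latest row per key.
        Iterator<Kv> mergeIterator() {
            rows.sort(Comparator.comparing(Kv::key).thenComparingLong(Kv::seq));
            List<Kv> merged = new ArrayList<>();
            for (Kv kv : rows) {
                int last = merged.size() - 1;
                if (last >= 0 && merged.get(last).key().equals(kv.key())) {
                    merged.set(last, kv); // later sequence number wins
                } else {
                    merged.add(kv);
                }
            }
            return merged.iterator();
        }
    }

    // Renders each row as "key#seq" so the order is easy to compare.
    static List<String> keys(Iterator<Kv> it) {
        List<String> out = new ArrayList<>();
        it.forEachRemaining(kv -> out.add(kv.key() + "#" + kv.seq()));
        return out;
    }

    public static void main(String[] args) {
        // Old order: changelog written before the level 0 file.
        ToyWriteBuffer a = new ToyWriteBuffer();
        a.put("c", 1); a.put("a", 2); a.put("c", 3); a.put("b", 4);
        List<String> changelogFirst = keys(a.rawIterator());
        keys(a.mergeIterator());

        // New order: level 0 file written first, then changelog.
        ToyWriteBuffer b = new ToyWriteBuffer();
        b.put("c", 1); b.put("a", 2); b.put("c", 3); b.put("b", 4);
        List<String> level0 = keys(b.mergeIterator());
        List<String> changelogAfter = keys(b.rawIterator());

        System.out.println("changelog (written first): " + changelogFirst);
        System.out.println("level 0 file:              " + level0);
        System.out.println("changelog (written after): " + changelogAfter);
    }
}
```

With the four puts above, the changelog read before merging keeps input order `[c#0, a#1, c#2, b#3]`, while the one read after merging comes out key-sorted as `[a#1, b#3, c#0, c#2]`. The level 0 file is `[a#1, b#3, c#2]` either way; only the changelog order depends on which file is written first.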