JingsongLi commented on code in PR #6274:
URL: https://github.com/apache/paimon/pull/6274#discussion_r2367315289
##########
paimon-python/pypaimon/read/table_scan.py:
##########
@@ -185,38 +248,59 @@ def _filter_by_stats(self, file_entry: ManifestEntry) -> bool:
})
def _create_append_only_splits(self, file_entries: List[ManifestEntry]) -> List['Split']:
- if not file_entries:
- return []
+ if self.idx_of_this_subtask is not None:
+ # Sort by file creation time to ensure consistent sharding
+ file_entries.sort(key=lambda x: x.file.creation_time)
- data_files: List[DataFileMeta] = [e.file for e in file_entries]
+ partitioned_split = defaultdict(list)
Review Comment:
Please shard the file entries before splitting them.
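A rough sketch of the ordering I have in mind, not a drop-in patch. `FileEntry`, `shard_entries`, `group_by_partition`, and the `number_of_subtasks` parameter are hypothetical names used only for illustration; they are not the pypaimon API. The round-robin slice is just one possible sharding rule. The point is only that each subtask narrows the entry list first and builds splits from what remains.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FileEntry:
    """Hypothetical stand-in for a manifest entry; not the pypaimon class."""
    partition: Tuple
    file_name: str
    creation_time: int


def shard_entries(entries: List[FileEntry],
                  idx_of_this_subtask: Optional[int],
                  number_of_subtasks: int) -> List[FileEntry]:
    """Step 1: keep only the entries that belong to this subtask."""
    if idx_of_this_subtask is None:
        # No sharding requested: every entry is kept, order unchanged.
        return list(entries)
    # Sort on a deterministic key so every subtask sees the same order.
    # The file name breaks ties between equal creation times (an assumption).
    ordered = sorted(entries, key=lambda e: (e.creation_time, e.file_name))
    # Round-robin assignment: subtask i takes entries i, i + n, i + 2n, ...
    return ordered[idx_of_this_subtask::number_of_subtasks]


def group_by_partition(entries: List[FileEntry]) -> Dict[Tuple, List[FileEntry]]:
    """Step 2: group the already-sharded entries; splits are built per group."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.partition].append(entry)
    return dict(grouped)
```

With this ordering, the splitting step only ever sees the files owned by the current subtask, so it does not need to know about sharding at all.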