viktorsomogyi commented on PR #13421: URL: https://github.com/apache/kafka/pull/13421#issuecomment-1582838059
I have some context on the replica fetcher area (mostly from reading and debugging), so I hope I can help. First, since the conversation is a bit long, let me summarize what I understand:

- The problem is that disk A reaches its capacity limit
- The solution is to move partition X-1 to disk B
- During the reassignment, log cleaning is disabled on X-1 (which can therefore fill disk A)
- The reassignment of X-1 fails; the failed copy is left on B, and X-1 on A keeps growing

Is this correct? If it is, we may need to separate the deletion and compaction cases. I think resuming deletion is safe, but resuming compaction might not be, since compaction alters the log. If an operator somehow resumes B and lets replication continue, then the history of X-1 on A and on B might differ (I'm still working on a local test case that reproduces this). What do you think?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org