kfaraz opened a new pull request, #15459:
URL: https://github.com/apache/druid/pull/15459
### Background
The segment allocation algorithm reuses an already allocated pending segment
if the new allocation request is made with the same parameters:
- datasource
- sequence name
- interval
- value of `skipSegmentLineageCheck` (`false` for batch append, `true`
for streaming append)
- previous segment id (used only when `skipSegmentLineageCheck = false`)
These parameters thus uniquely identify a pending segment (enforced
by the UNIQUE constraint on the `sequence_name_prev_id_sha1` column in
the `druid_pendingSegments` metadata table).
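For illustration, here is a minimal sketch of how these identifying parameters might be collapsed into a single hash key playing the role of the `sequence_name_prev_id_sha1` column. This is a hypothetical Python sketch, not Druid's actual Java implementation; the function name and payload layout are assumptions:

```python
import hashlib

def pending_segment_key(sequence_name, prev_segment_id, interval,
                        skip_segment_lineage_check):
    """Hypothetical sketch: collapse the identifying parameters into one
    SHA-1 key, mirroring the role of sequence_name_prev_id_sha1."""
    # When lineage checks are skipped (streaming append), the previous
    # segment id is not part of the identity; otherwise it is.
    lineage_part = interval if skip_segment_lineage_check else prev_segment_id
    payload = "\x01".join([sequence_name, lineage_part])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

# Identical parameters yield the same key, so the pending segment is reused.
k1 = pending_segment_key("seq1", "prev_id", "2023-01-01/2023-01-02", False)
k2 = pending_segment_key("seq1", "prev_id", "2023-01-01/2023-01-02", False)
print(k1 == k2)  # True
```

Because the key is stored under a UNIQUE constraint, a second allocation request with the same parameters maps to the same row and hence to the same pending segment id.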
This reuse serves two purposes:
- it allows replica tasks (in streaming ingestion) to use the same set
of segment IDs, and
- it allows the re-run of a failed batch task to reuse the same segment ID,
preventing unnecessary allocations.
### Breaking scenario
- Append task T1 allocates a pending segment against version V1. Let's call
this pending segment `pendingV11`.
- T1 fails and is unable to commit `pendingV11`.
- Replace task T2 commits a new version V2 which overshadows V1.
- Append task T3 picks up from where T1 failed and thus tries to reuse
`pendingV11`.
- T3 completes and commits `pendingV11` as `segmentV11`, even though it
belongs to an already overshadowed version.
- `segmentV11` is immediately marked as unused and eventually deleted,
causing loss of the appended data.
### Changes
- Add the version of the allocated segment to the set of parameters that
uniquely define a pending segment
- Thus, for a given datasource, interval, sequence name, and
prev_segment_id, there can be multiple pending segments, but each must
have a different version.
- Include the version while computing the `sequence_name_prev_id_sha1`
hash, thus preserving the UNIQUE constraint
- Add a test to assert the above behaviour. This test fails without the
changes in this PR.
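The fix above can be sketched by extending the earlier hash idea to include the version in the hashed payload (again a hypothetical Python sketch, not the actual Java change in this PR):

```python
import hashlib

def pending_segment_key_v2(sequence_name, prev_segment_id, version):
    """Hypothetical sketch: include the allocated segment's version in the
    hash, so the same (sequence, prev_id) pair maps to distinct pending
    segments for distinct versions while the UNIQUE constraint still holds."""
    payload = "\x01".join([sequence_name, prev_segment_id, version])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

# With the version included, an allocation made after replace task T2
# commits version V2 no longer matches, and hence no longer reuses, the
# stale pending segment row created against the overshadowed version V1.
stale = pending_segment_key_v2("seq1", "prev_id", "V1")
fresh = pending_segment_key_v2("seq1", "prev_id", "V2")
print(stale != fresh)  # True
```

This is what prevents the breaking scenario: T3's allocation against V2 produces a different key than T1's allocation against V1, so it creates a fresh pending segment instead of committing data into an overshadowed version.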
### Alternate approach
Clean up pending segments as soon as they are no longer needed. This
approach was not taken because it is difficult to ensure that a pending
segment is not currently in use, as multiple tasks might be using the same
segment id.
### Release note
Fix a bug in segment allocation that could cause loss of appended data
when running interleaved append and replace tasks.
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]