mounikanakkala opened a new issue, #12458:
URL: https://github.com/apache/druid/issues/12458
Druid intermittently fails to drop segments past their retention period. This led to
org.apache.druid.segment.SegmentMissingException on our systems.
### Affected Version
0.22.1
### Description
**What happened**
We have a datasource with the retention rules loadByPeriod(P24M+future) and
dropForever. The datasource's segment granularity is Hour.
We encountered a case where a segment older than 24 months was not dropped
properly.
- The Segments page shows the segment as available; after some refreshes it
disappears, and after some more refreshes it reappears.
- We ran a query on the sys.server_segments table:
```
select *
from sys.server_segments
where segment_id = <segment_id>
```
It returned two Historicals holding that segment. Since our Druid cluster runs
on Kubernetes, we deleted the two Historical pods. After that, the segment was
no longer available in Druid and the issue was resolved.
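The same check can be scripted so it is easy to repeat when a stale segment is suspected. The following is a minimal sketch, assuming the Router is reachable at a hypothetical `http://localhost:8888` and that no authentication is required; the function names are illustrative and not part of Druid. It uses the Druid SQL HTTP endpoint (`/druid/v2/sql`) with a query parameter for the segment ID.

```python
import json
import urllib.request

# Assumption: the Router address below is a placeholder; adjust for your cluster.
ROUTER_URL = "http://localhost:8888"


def build_server_segments_request(segment_id):
    """Build a Druid SQL request body that lists the servers holding one segment."""
    return {
        "query": (
            "SELECT server, segment_id "
            "FROM sys.server_segments "
            "WHERE segment_id = ?"
        ),
        # The segment ID is passed as a SQL parameter instead of being inlined.
        "parameters": [{"type": "VARCHAR", "value": segment_id}],
    }


def servers_holding_segment(segment_id, router_url=ROUTER_URL):
    """POST the query to the SQL endpoint and return the reported server names."""
    body = json.dumps(build_server_segments_request(segment_id)).encode("utf-8")
    request = urllib.request.Request(
        router_url + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        rows = json.load(response)  # default result format: a list of objects
    return [row["server"] for row in rows]
```

Calling `servers_holding_segment("<segment_id>")` for a segment that should already have been dropped would be expected to return an empty list; a non-empty list points at the Historicals that still announce it.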
**How often is this issue occurring**
It does not happen with all segments, but it affects 1-2 segments every
few days.
**More details on Druid cluster setup**
- Druid processes (Coordinator, MiddleManagers, Historicals, Broker,
Router) run on Kubernetes.
- Historicals use AWS EBS as persistent volumes. The data is stored on EBS,
so when a Historical pod is removed, a new pod is created within minutes and
the EBS volume is attached to the new pod.
- When we deleted the pods as mentioned above, the issue was resolved. Since
the EBS volumes were not affected, I suppose the Historicals were holding some
in-memory state that should no longer have been there.
**How did we come across this issue**
The time was 2022-04-19T03. The segment that did not get dropped was
2020-04-19T00, even though it was older than 24 months.
We ran a timeBoundary query:
```
{
"dataSource": "our_datasource",
"queryType": "timeBoundary",
"bound": "minTime"
}
```
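For repeated checks, the query above can be submitted programmatically. This is a rough sketch, assuming a Broker at a hypothetical `http://localhost:8082` without authentication; the helper names are made up for illustration. It posts the native query to `/druid/v2` and reads `minTime` from the first result row.

```python
import json
import urllib.request


def build_min_time_query(datasource):
    """Build the same timeBoundary query as above for a given datasource."""
    return {
        "queryType": "timeBoundary",
        "dataSource": datasource,
        "bound": "minTime",
    }


def extract_min_time(results):
    """Return minTime from a timeBoundary response, or None if it is empty."""
    if not results:
        return None
    return results[0]["result"].get("minTime")


def fetch_min_time(datasource, broker_url="http://localhost:8082"):
    """POST the native query; broker_url is an assumed placeholder address."""
    body = json.dumps(build_min_time_query(datasource)).encode("utf-8")
    request = urllib.request.Request(
        broker_url + "/druid/v2",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_min_time(json.load(response))
```

If the query fails on the server side, `urlopen` raises `urllib.error.HTTPError`, whose body carries the error details returned by Druid.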
We got the following exception:
```
org.apache.druid.server.QueryResource - Exception handling request:
{class=org.apache.druid.server.QueryResource, exceptionType=class
org.apache.druid.segment.SegmentMissingException,
exceptionMessage=No results found for
segments[[SegmentDescriptor{interval=2020-04-19T00:00:00.000Z/2020-04-19T01:00:00.000Z,
version='2022-04-11T17:18:50.095Z', partitionNumber=0}]],
query={
"queryType": "timeBoundary",
"dataSource": {
"type": "table",
"name": "our_datasource"
},
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"bound": "minTime",
"filter": null,
"descending": false,
"granularity": {
"type": "all"
}
}, peer=xx.xx.xx.xx}
(org.apache.druid.segment.SegmentMissingException: No results found for
segments[[SegmentDescriptor{interval=2020-04-19T00:00:00.000Z/2020-04-19T01:00:00.000Z,
version='2022-04-11T17:18:50.095Z', partitionNumber=0}]])
```
Because this segment is past its retention period, we started checking the
Segments page and the sys.server_segments table as described above.
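The retention arithmetic can be double-checked with a short script. This is a simplified model, not Druid's actual rule evaluation: it assumes a segment stays loaded under loadByPeriod(P24M+future) only while its interval still overlaps the window starting 24 calendar months before the current time. The function names are illustrative, and the segment interval is the one reported in the exception.

```python
import calendar
from datetime import datetime, timezone


def subtract_months(moment, months):
    """Shift a datetime back by whole calendar months, clamping the day if needed."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return moment.replace(
        year=year, month=month_index + 1, day=min(moment.day, last_day)
    )


def is_within_load_period(segment_end, now, months=24):
    """True if a segment ending at segment_end still overlaps [now - months, ...)."""
    return segment_end > subtract_months(now, months)


# Values taken from the report: current time and the interval end from the exception.
now = datetime(2022, 4, 19, 3, tzinfo=timezone.utc)
segment_end = datetime(2020, 4, 19, 1, tzinfo=timezone.utc)

cutoff = subtract_months(now, 24)
should_be_loaded = is_within_load_period(segment_end, now)
```

Under this model the cutoff is 2020-04-19T03:00Z, so the segment ending at 2020-04-19T01:00Z no longer matches the load rule and would fall through to dropForever.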
Kindly help us resolve this issue. Please let us know if you need further
details.