leventov commented on issue #7571: Optimize coordinator API to retrieve segments with overshadowed status URL: https://github.com/apache/incubator-druid/issues/7571#issuecomment-488330261 @surekhasaharan > So are we saying make DataSegment mutable and add a setter for overshadowed field ? Yes > What is the purpose of this interface, who would implement this ? Are you thinking SQLMetadataSegmentManager implements this ? The purpose is type-level and concurrency safety, however, since I agree with @gianm that it's easier to just compute overshadowed status eagerly on each `poll()`, this part of the suggestion is irrelevant. `SQLMetadataSegmentManager`'s interface will be intact. > I think this will not be required, if coordinator API already returns DataSegment which has overshadowed status. The idea is to keep using a segment object which resides the Broker's memory for longer, the same idea as here: https://github.com/apache/incubator-druid/blob/15d19f3059a990f022b1cb98bdd4e6a82cdf9a06/server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java#L750-L756 @gianm > I think we might as well do it eagerly Yes. I now also think that instead of rebuilding `VersionedIntervalTimeline` every time from scratch, one can be kept in memory of `SQLMetadataSegmentManager`. (It would also allow to skip computations if no segments were added or removed in a particular data source during some `poll()`, compared to the previous.) The second part is optional. > I am skeptical of making DataSegment mutable, though. The alternative is a wrapper object: something like DataSegmentWithTimelineContext or DataSegmentWithOvershadowedState. The main motivation I could see for adding fields to DataSegment, vs. doing a wrapper object, is memory savings. (Is there some other reason I am missing?) > It is nice, I think, that DataSegment objects are self-contained -- they don't depend on other context. 'isOvershadowed' by its nature depends on other context (namely, what other DataSegments exist). It won't be possible to set the field correctly unless all DataSegment objects for a particular datasource are considered. This won't be possible in all cases, which will make it unclear when the field is valid and when it's not, opening up new possible bugs. A wrapper encodes into the type system whether or not the contextual overshadowedness information has been computed or not. Looking at the existing code at #7425, I think wrapper approach would be worse. Regarding `DataSegment`'s immutability, it might need to go away anyway, as #6358 should be fixed (note @surekhasaharan: you can fix that problem, incl. across Coordinator -> Broker segment streaming, in the same PR). `DataSegment` is immutable and used in HashSets and HashMaps as a key, while it's identity is actually based only on `segmentId`, is not right. We should probably move to `Map<SegmentId, DataSegment>` across the board and make `DataSegment` a mutable object without `equals()` and `hashCode()` implementations to avoid defects like #6358 consistently. > As an example, the proposed Map<DataSegment, DataSegment> in MetadataSegmentView would need to keep two copies of each DataSegment, since the key and value are different objects. The idea was that key and value is the same object in that case, but anyway, it should rather be `Map<SegmentId, DataSegment>`, see above.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
