leventov commented on issue #7571: Optimize coordinator API to retrieve 
segments with overshadowed status
URL: 
https://github.com/apache/incubator-druid/issues/7571#issuecomment-488330261
 
 
   @surekhasaharan 
   
   > So are we saying make DataSegment mutable and add a setter for 
overshadowed field ?
   
   Yes
   
   > What is the purpose of this interface, who would implement this ? Are you 
thinking SQLMetadataSegmentManager implements this ?
   
   The purpose is type-level and concurrency safety, however, since I agree 
with @gianm that it's easier to just compute overshadowed status eagerly on 
each `poll()`, this part of the suggestion is irrelevant. 
`SQLMetadataSegmentManager`'s interface will be intact.
   
   > I think this will not be required, if coordinator API already returns 
DataSegment which has overshadowed status.
   
   The idea is to keep using a segment object which resides the Broker's memory 
for longer, the same idea as here: 
https://github.com/apache/incubator-druid/blob/15d19f3059a990f022b1cb98bdd4e6a82cdf9a06/server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java#L750-L756
   
   @gianm 
   
   > I think we might as well do it eagerly
   
   Yes. I now also think that instead of rebuilding `VersionedIntervalTimeline` 
every time from scratch, one can be kept in memory of 
`SQLMetadataSegmentManager`. (It would also allow to skip computations if no 
segments were added or removed in a particular data source during some 
`poll()`, compared to the previous.) The second part is optional.
   
   > I am skeptical of making DataSegment mutable, though. The alternative is a 
wrapper object: something like DataSegmentWithTimelineContext or 
DataSegmentWithOvershadowedState. The main motivation I could see for adding 
fields to DataSegment, vs. doing a wrapper object, is memory savings. (Is there 
some other reason I am missing?)
   
   > It is nice, I think, that DataSegment objects are self-contained -- they 
don't depend on other context. 'isOvershadowed' by its nature depends on other 
context (namely, what other DataSegments exist). It won't be possible to set 
the field correctly unless all DataSegment objects for a particular datasource 
are considered. This won't be possible in all cases, which will make it unclear 
when the field is valid and when it's not, opening up new possible bugs. A 
wrapper encodes into the type system whether or not the contextual 
overshadowedness information has been computed or not.
   
   Looking at the existing code at #7425, I think wrapper approach would be 
worse.
   
   Regarding `DataSegment`'s immutability, it might need to go away anyway, as 
#6358 should be fixed (note @surekhasaharan: you can fix that problem, incl. 
across Coordinator -> Broker segment streaming, in the same PR). `DataSegment` 
is immutable and used in HashSets and HashMaps as a key, while it's identity is 
actually based only on `segmentId`, is not right. We should probably move to 
`Map<SegmentId, DataSegment>` across the board and make `DataSegment` a mutable 
object without `equals()` and `hashCode()` implementations to avoid defects 
like #6358 consistently.
   
   > As an example, the proposed Map<DataSegment, DataSegment> in 
MetadataSegmentView would need to keep two copies of each DataSegment, since 
the key and value are different objects. 
   
   The idea was that key and value is the same object in that case, but anyway, 
it should rather be `Map<SegmentId, DataSegment>`, see above.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to