gianm commented on issue #7571: Optimize coordinator API to retrieve segments 
with overshadowed status
URL: 
https://github.com/apache/incubator-druid/issues/7571#issuecomment-488175532
 
 
   The general idea of computing overshadowedness as few times as possible 
sounds good to me, and in particular computing it on `poll` sounds good. I 
think we might as well do it eagerly, though, since it's always going to be 
needed (DruidCoordinatorRuleRunner, for example, always computes the list of 
overshadowed segments so it can avoid loading them on historicals).
   
   I am skeptical of making DataSegment mutable, though. The alternative is a 
wrapper object: something like DataSegmentWithTimelineContext or 
DataSegmentWithOvershadowedState. The main motivation I could see for adding 
fields to DataSegment, vs. doing a wrapper object, is memory savings. (Is there 
some other reason I am missing?)
   
   It is nice, I think, that DataSegment objects are self-contained -- they 
don't depend on other context. 'isOvershadowed' by its nature depends on other 
context (namely, what _other_ DataSegments exist). It won't be possible to set 
the field correctly unless _all_ DataSegment objects for a particular 
datasource are considered. This won't be possible in all cases, which will make 
it unclear when the field is valid and when it's not, opening up new possible 
bugs. A wrapper encodes into the type system whether or not the contextual 
overshadowedness information has been computed or not.
   
   Furthermore, I am not convinced this tactic will be effective at minimizing 
memory use. As an example, the proposed `Map<DataSegment, DataSegment>` in 
`MetadataSegmentView` would need to keep two copies of each DataSegment, since 
the key and value are different objects. A wrapper would be cheaper, since the 
DataSegment object contained by different wrappers could be shared.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to