jihoonson commented on pull request #11507:
URL: https://github.com/apache/druid/pull/11507#issuecomment-891402587


   > > Seems like only subTaskId and loadSpec are in use in this class which 
confuses me why this class needs other fields. I suggest to add a new interface 
for PartitionLocation and let DeepStoragePartitionLocation implement it instead 
of extending GenericPartitionLocation because GenericPartitionLocation is 
designed for local storage for shuffle. Since we already have an abstract class 
of the same name of PartitionLocation, you will have to rename it or use 
another name for the new interface.
   > 
   > @jihoonson I moved PartitionLocation to be an interface but realized there 
are other abstract classes like `PartialSegmentMergeTask`, 
`PartialSegmentMergeIOConfig`, `PartialSegmentMergeIngestionSpec` similar to 
`PartitionLocation` which will ideally also need to be made interfaces and each 
having two implementations based on `PartitionLocation` class. However I don't 
see what will be gained from interfacing these as the logic for these classes 
will be same for any PartitionLocation as of now, seems like generalizing too 
much. I think we can keep `DeepStoragePartitionLocation` an extension of 
`GenericPartitionLocation` to avoid this problem but some of the fields would 
be redundant. What do you think ?
   
   @pjain1 I wanted to understand what the difficulty is, so took a stab and 
implemented what I suggested in my previous comment. The code is available in 
my 
[branch](https://github.com/jihoonson/druid/tree/partition-location-interface). 
I haven't tested my change, but at least it compiles successfully. 
   
   `PartialSegmentMergeTask`, `PartialSegmentMergeIOConfig`, and 
`PartialSegmentMergeIngestionSpec` still remain as abstract classes today 
because of a historical reason (I created them as abstract but haven't cleaned 
them up). So I propomted `PartialSegmentMergeIOConfig` to be non-abstract which 
now accepts a list of `PartitionLocation`s instead of a list of extensions of 
`PartitionLocation`. `ShuffleClient` now needs to know what type of 
`PartitionLocation` to use. `PartialSegmentMergeIngestionSpec` is also no 
longer an abstract class. This way, you don't have to add new 
`PartialSegmentMergeTask`, `PartialSegmentMergeIOConfig`, and 
`PartialSegmentMergeIngestionSpec` implementations corresponding to 
`DeepStoragePartitionLocation`. I haven't touched `PartitionStat` in my branch, 
but think a similar technique can apply. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to