[GitHub] [incubator-pinot] chenboat commented on issue #4914: [POC] By-passing deep-store requirement for Realtime segment completion

GitBox Fri, 20 Dec 2019 15:35:07 -0800

chenboat commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-568125409
 
 
   > Looking at the POC so far, I think we might need to start a doc to discuss 
the best approach to designing the workflow for using peer storage. So far 
there are a few things that I feel we should chat about a bit more:
   > 
   > 1. Backward compatibility and abstraction of logic
   >    Right now, looking at the code, it seems that the choice between 
uploading to pinotFS and just using a peer is controlled by a boolean value 
hardcoded into the pinot server logic. That may work for POC purposes, but we 
should think about how to structure the code so that backward compatibility 
of this logic is preserved. For example, we could implement the logic as a new 
pinotFS (local disk, peer disk, etc.) and add a new protocol (say, naming it 
peer://..). With this change, we can move all of the extra logic into a 
separate class and plug it into the pinot server when we want to use this 
model, while keeping the existing models working.
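
The plug-in idea in point 1 could be sketched roughly as follows. This is only a minimal illustration, assuming a scheme-keyed registry; `SegmentStore`, `PeerSegmentStore`, `DeepStoreSegmentStore`, and `SegmentStoreRegistry` are hypothetical names and are not Pinot's actual PinotFS API.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for illustration; not the real PinotFS interface.
interface SegmentStore {
  String describe(URI segmentUri);
}

// Stands in for the existing deep-store path (e.g. an hdfs:// location).
class DeepStoreSegmentStore implements SegmentStore {
  @Override
  public String describe(URI segmentUri) {
    return "deep-store copy at " + segmentUri;
  }
}

// Stands in for the proposed peer:// path, kept in its own class.
class PeerSegmentStore implements SegmentStore {
  @Override
  public String describe(URI segmentUri) {
    return "peer server copy of " + segmentUri.getPath();
  }
}

// Chooses an implementation by URI scheme, so server code needs no boolean flag.
class SegmentStoreRegistry {
  private final Map<String, SegmentStore> storesByScheme = new HashMap<>();

  void register(String scheme, SegmentStore store) {
    storesByScheme.put(scheme, store);
  }

  SegmentStore forUri(URI uri) {
    SegmentStore store = storesByScheme.get(uri.getScheme());
    if (store == null) {
      throw new IllegalArgumentException("No store registered for scheme: " + uri.getScheme());
    }
    return store;
  }
}
```

With such a registry, enabling the peer model would amount to registering one more scheme, and tables that keep using existing schemes would be unaffected.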
   
   The design doc 
https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
 has a section discussing backward-compatibility issues. At a high level, we 
will upgrade the controller first, adding a new optional field that allows 
servers to skip uploading segments. After that, we will perform the server 
upgrade.
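
A minimal sketch of how an optional field can stay backward compatible, assuming the controller response has already been parsed into a string map. The field name `skipSegmentUpload` and the class `CommitResponseReader` are hypothetical and only illustrate the idea; the design doc defines the actual field.

```java
import java.util.Map;

class CommitResponseReader {
  // Hypothetical field name, used for illustration only.
  static final String SKIP_UPLOAD_FIELD = "skipSegmentUpload";

  // Returns true when the server should upload the segment to deep store.
  // If the field is absent (for example, the response came from a controller
  // that does not know about it), parseBoolean(null) yields false, so the
  // server keeps the existing behavior and uploads.
  static boolean shouldUploadToDeepStore(Map<String, String> controllerResponse) {
    return !Boolean.parseBoolean(controllerResponse.get(SKIP_UPLOAD_FIELD));
  }
}
```

Because the default is the old behavior, a server that reads the field only skips the upload when the controller explicitly asks for it.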
   
   > 2. Extending the logic of the segment completion protocol
   >    We used to rely on deep storage to guarantee the safety of our data 
(multiple replicas of the data exist in deep storage once a commit is done). 
The new model seems to have only one copy of the data once the segment commit 
is done. Are we going to work on this part and enhance our existing segment 
completion protocol so that more than one replica of the data is available 
before we decide a segment commit is finished? We should evaluate a couple of 
possible approaches and think about how to extend the segment completion 
protocols/managers to make this work.
   
   The current solution allows you to have either 1 copy of the segment data 
(if no deep storage is configured) or more than 1 copy (when a deep storage is 
configured).
   
   > 3. Coupling between pinot server components and tableDataManager
   >    This change adds more coupling between tableDataManager and the pinot 
server through new members such as helixAdmin and the cluster name. We 
probably want to abstract those members out into some other component so that 
we avoid explicit coupling between the two. For example, we can create a 
wrapper class to pass that information around and let different 
tableDataManagers take different implementations of that wrapper class, so 
that the components are decoupled.
   
   Let me look into this abstraction issue.
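
One possible shape for that abstraction, as a rough sketch only: the table data manager depends on a small interface instead of on helixAdmin and the cluster name directly. All names here (`ClusterContext`, `StaticClusterContext`, `PeerAwareTableDataManager`) are hypothetical and are not existing Pinot classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical wrapper exposing only what a table data manager needs to know.
interface ClusterContext {
  String clusterName();

  // Servers believed to host the given segment; empty if none are known.
  List<String> serversHosting(String segmentName);
}

// In-memory implementation for tests; a production implementation could
// wrap helixAdmin behind the same interface.
class StaticClusterContext implements ClusterContext {
  private final String clusterName;
  private final Map<String, List<String>> segmentToServers;

  StaticClusterContext(String clusterName, Map<String, List<String>> segmentToServers) {
    this.clusterName = clusterName;
    this.segmentToServers = segmentToServers;
  }

  @Override
  public String clusterName() {
    return clusterName;
  }

  @Override
  public List<String> serversHosting(String segmentName) {
    return segmentToServers.getOrDefault(segmentName, List.of());
  }
}

// The manager sees only ClusterContext and has no direct Helix dependency.
class PeerAwareTableDataManager {
  private final ClusterContext context;

  PeerAwareTableDataManager(ClusterContext context) {
    this.context = context;
  }

  // Picks the first known peer as the download source, if any.
  Optional<String> pickDownloadSource(String segmentName) {
    List<String> servers = context.serversHosting(segmentName);
    return servers.isEmpty() ? Optional.empty() : Optional.of(servers.get(0));
  }
}
```

Different table data managers could then be handed different `ClusterContext` implementations without changing their own code.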


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

