mcvsubbu commented on issue #4914: [POC] By-passing deep-store requirement for 
Realtime segment completion
URL: https://github.com/apache/incubator-pinot/pull/4914#issuecomment-609453001
 
 
   Hi @jamesyfshao , we need to get this in without blocking you, so I suggest we proceed in small increments that can be easily reviewed. Please include me in all reviews. Here are the steps I would follow. They make sure that the feature is NOT enabled until the very end, and each can be tested independently -- either manually or via unit/integration tests added in the PR itself.
   
   1. PR for the server to download a segment (be it realtime or offline). Download should be possible via either http or https endpoints, depending on the config supplied (at the server level, so we can use indexingTableConfig for this). You can add a test in the LLRealtimeClusterInteg test to download one existing realtime segment, using the http or https downloader already available. Since this is a spec change, please include the spec (the exact URI you plan to use, etc.) in your design doc, send an email to the dev list, and post in the general Slack channel.
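   A minimal sketch of the scheme check that such a config-driven download implies (the class and method names here are my own illustration, not Pinot's actual API):

```java
import java.net.URI;

// Hypothetical helper: decide whether a segment download URL can be served
// by the http/https downloader. Names are illustrative only.
public class SegmentFetcherSelector {
  public static String schemeOf(String downloadUrl) {
    return URI.create(downloadUrl).getScheme();
  }

  public static boolean isSupported(String downloadUrl) {
    String scheme = schemeOf(downloadUrl);
    return "http".equals(scheme) || "https".equals(scheme);
  }
}
```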
   
   2. Introduce an Uploader abstraction that will upload a segment to either the controller or the deep store. Make sure this class is unit tested and verified to work with your deep store, but not used anywhere else yet (except in tests, of course).
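   The contract might look something like the following (a sketch under my own naming assumptions, not the actual interface; the in-memory implementation exists only so the contract is demonstrable):

```java
import java.io.File;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Uploader abstraction: one interface that both a
// controller-upload and a deep-store implementation would satisfy.
interface SegmentUploader {
  // Returns the download URI of the committed segment, or throws on failure.
  URI uploadSegment(File segmentTarFile, String segmentName) throws Exception;
}

// Toy in-memory implementation, used here only to illustrate the contract.
class InMemoryUploader implements SegmentUploader {
  final Map<String, File> store = new HashMap<>();

  @Override
  public URI uploadSegment(File segmentTarFile, String segmentName) {
    store.put(segmentName, segmentTarFile);
    return URI.create("memory://segments/" + segmentName);
  }
}
```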
   
   3. Change the split committer to use the Uploader abstraction introduced in (2), instead of the current hard-coded controller-only upload. This PR can perhaps be combined with (2), depending on how big (2) gets.
   
   4. Introduce a peer downloader (but do not hook it up yet), based on either the http or https downloader, depending on configuration. Maybe we don't even need this, in which case you can skip this step and directly use the http or https downloader.
   
   This completes phase-1
   
   In phase-2, we will do:
   
   5. Introduce a table config (inside the stream config? or maybe in the indexing config?): a boolean controlling whether to abort the split commit if the upload to the deep store fails. This config is not used anywhere as yet. Since this is an architectural change, be sure to email the dev mailing list, and also post in Slack. Get consensus on where the config should go, what exactly it means, naming, etc. These things can take time, and are not changeable once introduced.
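   Purely as a strawman for that discussion (both the placement and the key name below are exactly the open questions, so this is illustrative only, not a proposal of record), such a flag might sit in the table config like:

```json
{
  "tableIndexConfig": {
    "streamConfigs": {
      "realtime.segment.commit.abortOnDeepStoreFailure": "false"
    }
  }
}
```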
    
   6. Change the controller to accept a new URL in the protocol. I suggest the server insert the following when segment upload fails: "peer://peer/URI_INTRODUCED_IN_STEP_1", or simply the word in all caps, "PEER". I prefer the former, because it is in URI format and the controller can do a single scheme check with the URL decoder. When the controller detects the peer scheme, it should not set the download URL in the metadata (or insert a null, but I prefer NO url in metadata). Add unit tests to cover this case. Since the server never inserts PEER yet, we should be good so far.
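   The controller-side scheme check could be as simple as the sketch below (the class and method are hypothetical; only the "peer://" convention comes from the proposal above):

```java
import java.net.URI;

// Sketch of the proposed controller behavior: when the commit URL uses the
// (proposed) "peer" scheme, record no download URL in the segment metadata.
public class PeerUrlCheck {
  static final String PEER_SCHEME = "peer";

  // Returns null when the segment must be fetched from peers, so that
  // no download URL is written into the metadata.
  public static String downloadUrlForMetadata(String commitUrl) {
    if (PEER_SCHEME.equals(URI.create(commitUrl).getScheme())) {
      return null; // peer-hosted segment: keep metadata free of a URL
    }
    return commitUrl;
  }
}
```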
   
   7. Change the server to include logic to fetch from a peer (using the http or https fetcher, as configured) IF the fetch from the deep store fails, with retries as discussed in the phone conference (i.e., try the deep store; if it fails, try a peer; if that fails, try the whole sequence again, up to N times). Do not hook up this logic as yet. Add unit tests.
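   The retry order described above can be sketched as follows (a generic helper under my own naming; the real server would pass its configured deep-store and peer fetchers as the two callables):

```java
import java.util.concurrent.Callable;

// Sketch of the fallback sequence: deep store first, then a peer; if both
// fail, repeat the whole sequence up to maxAttempts times, then rethrow
// the last error. Fetcher types are hypothetical stand-ins.
public class SegmentDownloadWithFallback {
  public static <T> T fetchWithFallback(Callable<T> deepStoreFetch,
                                        Callable<T> peerFetch,
                                        int maxAttempts) throws Exception {
    if (maxAttempts <= 0) {
      throw new IllegalArgumentException("maxAttempts must be positive");
    }
    Exception lastError = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return deepStoreFetch.call(); // primary source: the deep store
      } catch (Exception e) {
        lastError = e;
      }
      try {
        return peerFetch.call(); // fallback: fetch from a peer server
      } catch (Exception e) {
        lastError = e;
      }
    }
    throw lastError;
  }
}
```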
   
   8. Change the realtime segment fixer in the controller to fix segments that are missing from the deep store. Add integration tests.
   
   9. Hook up the server's peer-fetch logic introduced above, and change the split committer to use the table config. Add an integration test.
   
   You are done at this point
