Thanks Subbu. Let me summarize the discussion so far on this thread and
some questions/todos raised earlier:

1) Metadata format change for download url in *LLCRealtimeSegmentZKMetadata
*on top of today's deepstore uris.
    *Options*:* Empty string *or* a new scheme peer:///segmentName*
    *LLCRealtimeSegmentZKMetadata.getDownloadUrl()* is only used by
*RealtimeTableDataManager.downloadAndReplaceSegment().
*So all the url checking can be centralized here. The main pros of using
empty string is that we only have two main classes of url formats: valid
deepstore uri or empty string. This simplifies download logic at the
expense of debuggability (because we do not know if it is an error in
setting a download url or someone intentionally sets it empty.). The
pro/cons of using *peer:///segmentName* is exactly the opposite because we
are introducing 3 possible classes of url values: empty, peer and deepstore.
     *Decision*: Use *empty string* so that we have 2 classes of uri
formats and a unified and clear real time segment download strategy: i.e.,
download based on download url and if it fails use peer as back up.

   - Note that OfflineSegment also has download url in its metadata and its
   own download method. We should make it a TODO to unify them somehow in a
   follow-up design instead of the current discussion.

2) Peer segment download method.
    A few issues raised in the thread but I feel the best place to discuss
this issue is in this filed PR
<https://github.com/apache/incubator-pinot/pull/5336> and the design doc.
Let me summarize the issue and put my opinions on them:

   - Use *SegmentFetcher* or *PinotFS:  *The two interface/abstract classes
   are very similar -- do we have a plan to merge them btw? To me the main
   functional requirement is the ability to load balance download across
   different servers. Implementation wise, we have a few options: (1) My PR
   right now uses a util class to pick a random server and let http(s) segment
   fetcher to download it; (2) Subbu proposed to let SegmentFetcher pick the
   random url by adding a new interface method. (3) SegmentFetcher has a
   subclass *PinotFSSegmentFetcher* through which we can abstract all
   download logic there via a new PeerPinotFS and more... My preference is
   toward a straightforward one like (1) or (2) for now. A full PinotFS on
   peer servers makes more sense when the offline segment support is also
   ready.



On Mon, Jun 15, 2020 at 8:48 AM Subbu Subramaniam <[email protected]>
wrote:

> No, I mean the follwing:
> (1) Segment download URI in metadata will have either a valid URI (from
> deepstore) OR be empty.
> (2) In case it is empty, we construct the URIs of all selected peers and
> pass it to the segment fetcher
> (3) We add a new method to the SegmentFetcher interface that takes a list
> of URIs instead of a single URI
> (4) We modify the retry logic in the base class to pick a random one from
> the list (even if the list size is 1).
> (5) default implementation for the list could be to take a random URI (or
> the first one, or whatever) from the list and call the existing method of
> one URI
>
> =Subbu
>
> On 2020/06/11 22:09:05, TING CHEN <[email protected]> wrote:
> > You mean multiple URIs in a segment's download url. No for this project.
> >
> > On Thu, Jun 11, 2020 at 2:59 PM kishore g <[email protected]> wrote:
> >
> > > +1 peer.
> > >
> > > unrelated to this - do we support multiple URI's?
> > >
> > > On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam <[email protected]
> >
> > > wrote:
> > >
> > > > Hey Ting,
> > > >
> > > > I like the URI in metadata as "peer:///segmentName". This way, the
> URI
> > > > remains parsable and we can use the scheme to check for a segment
> > > fetcher,
> > > >
> > > > thanks
> > > >
> > > > -Subbu
> > > >
> > > > On 2020/06/10 01:09:25, TING CHEN <[email protected]> wrote:
> > > > > As part of the deep store by-passing
> > > > > <
> > > >
> > >
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > > > >
> > > > > project, a server is allowed to download segments from peer Pinot
> > > servers
> > > > > during Low Level Consumer (LLC) instead of deep segment stores. To
> > > enable
> > > > > this feature, we plan to add a new URI format for the field
> > > > > *segment.realtime.download.url
> > > > > *in LLCRealtimeSegmentZKMetadata.
> > > > >
> > > > > The new URI format serves the purpose of instructing a Pinot
> server to
> > > > find
> > > > > and download the segment from a peer server. Controller writes it
> to
> > > > Helix
> > > > > in case of segment upload failure or no deep store configured at
> all.
> > > We
> > > > > proposed the following format options and want to hear your
> feedback:
> > > > >
> > > > >    1. peer:///segmentName; (my preference)
> > > > >    2. simply an empty string *''*
> > > > >
> > > > > Both are in essence specially markers to indicate that the segment
> is
> > > not
> > > > > found in deep store and servers have to download them from peer
> > > servers.
> > > > > (1) has the benefit of better readability than (2) for debugging
> > > > purposes.
> > > > >
> > > > > Please let me know what you think.
> > > > >
> > > > > Ting Chen
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> > > >
> > > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to