Thanks Xiaotian. My comments are inline. Some of the questions below seem
to be related to the overall deep store bypass design rather than to this
table config change. I will provide brief answers below while directing
most of them to the cwiki design discussion.

On Mon, May 18, 2020 at 5:39 PM Xiaotian Jiang <jackie....@gmail.com> wrote:

> Seems we always use HTTP for inner cluster communication (e.g.
> TableSizeReader).
> We should make peer download the default behavior. If the segment
> download from deep storage failed, we should always try with the peer.
> For the race condition issue, we can add CRC into the peer download
> request (CRC can be obtained from the ZK metadata).
>
The race condition pointed out by @mcvsubbu <mcvsu...@apache.org> is
mainly for the offline segment case -- we will not address it in this
design but leave it as a TODO. When you say make "peer download" the
default behavior, I suppose you mean after (1) we have tested this feature
on realtime tables and (2) we have come up with a design for offline
tables and tested it there too. If that is the case, this table config
change is the right step in that direction.
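For context on the fallback Jackie mentions, here is a rough sketch of the
server-side download path (hypothetical class and method names -- the real
fetcher and peer-discovery logic live in the PRs under review): try the
deep store download URL first, and only when the table config carries a
peerSegmentDownloadScheme retry against a peer server over that scheme.

import java.io.File;
import java.net.URI;
import java.util.Optional;

// Illustrative only: download a completed segment, falling back to a peer server
// when the deep store copy is missing or the download fails. Names are hypothetical.
public class SegmentDownloadWithPeerFallback {
  interface Fetcher {                        // stand-in for the HTTP/HTTPS segment fetchers
    void fetch(URI from, File to) throws Exception;
  }
  interface PeerFinder {                     // stand-in for the peer-discovery PR under review
    Optional<URI> findPeerCopy(String segmentName, String scheme);
  }

  private final Fetcher _fetcher;
  private final PeerFinder _peerFinder;

  public SegmentDownloadWithPeerFallback(Fetcher fetcher, PeerFinder peerFinder) {
    _fetcher = fetcher;
    _peerFinder = peerFinder;
  }

  public void download(String segmentName, URI deepStoreUri, String peerDownloadScheme, File dest)
      throws Exception {
    try {
      _fetcher.fetch(deepStoreUri, dest);    // normal path: deep store
    } catch (Exception e) {
      if (peerDownloadScheme == null) {
        throw e;                             // peer download not enabled for this table
      }
      // Fallback: locate a peer server that has the segment and fetch over http/https.
      URI peerUri = _peerFinder.findPeerCopy(segmentName, peerDownloadScheme).orElseThrow(() -> e);
      _fetcher.fetch(peerUri, dest);
    }
  }
}

The key point is that the fallback is opt-in per table via the new config
field.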

The following questions are more related to the overall design:

> There are some parts not clear to me:
> - Can we always obtain the deep storage address before successfully
> uploading it?
> - Should we try to at least upload once before committing the segment?
>
With the new time-bounded, best-effort segment uploader added recently (PR
5314 <https://github.com/apache/incubator-pinot/pull/5314>), the server
will wait for a configurable amount of time for the segment upload to
succeed before committing the segment. If the upload succeeds within the
timeout period, a deep storage address will be returned.
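To make the timing concrete, here is a minimal sketch of a time-bounded,
best-effort upload step (this is not the PR 5314 code; the class and
method names are made up for illustration):

import java.net.URI;
import java.util.Optional;
import java.util.concurrent.*;

// Illustrative only: a best-effort, time-bounded upload run before segment commit.
// All names are hypothetical placeholders.
public class BestEffortSegmentUploader {
  private final ExecutorService _executor = Executors.newSingleThreadExecutor();

  /** Returns the deep storage URI if the upload succeeds within the timeout, empty otherwise. */
  public Optional<URI> uploadWithTimeout(Callable<URI> uploadTask, long timeoutMs) {
    Future<URI> future = _executor.submit(uploadTask);
    try {
      return Optional.of(future.get(timeoutMs, TimeUnit.MILLISECONDS));
    } catch (ExecutionException | TimeoutException e) {
      // Best effort: give up on the deep store copy for now; the segment can still be
      // committed and served, and re-uploaded later by the validation manager follow-up.
      future.cancel(true);
      return Optional.empty();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      future.cancel(true);
      return Optional.empty();
    }
  }
}

If the timeout elapses, the commit still proceeds; the missing deep store
copy is handled later (see the validation manager discussion below).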

- If we put downloadUrl before successfully uploading the segment, how
> to detect whether the upload is successful or not?
>
Similar to my answer to the previous question, with PR 5314
<https://github.com/apache/incubator-pinot/pull/5314> we will upload the
segment before committing it. I will update the cwiki about this change.
The main factor driving this change is to utilize the SplitCommitter
framework.
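As a hedged illustration of how the commit request could encode that
outcome (the marker value and method names below are hypothetical, not
necessarily what the PR uses): if the upload succeeded the committer
reports the deep store URI as the download URL, otherwise it reports a
well-known marker.

import java.net.URI;
import java.util.Optional;

// Illustrative only: how the commit request's downloadUrl could encode whether the
// best-effort upload succeeded. The marker value is hypothetical.
public final class DownloadUrlMarker {
  // Special marker meaning "not in deep store yet; fetch from peer servers instead".
  public static final String PEER_DOWNLOAD_MARKER = "peer://";

  /** Value to put in the commit request / segment ZK metadata. */
  public static String toDownloadUrl(Optional<URI> deepStoreUri) {
    return deepStoreUri.map(URI::toString).orElse(PEER_DOWNLOAD_MARKER);
  }

  /** Detecting a failed (or not-yet-done) upload is just a marker comparison. */
  public static boolean isUploadedToDeepStore(String downloadUrl) {
    return downloadUrl != null && !PEER_DOWNLOAD_MARKER.equals(downloadUrl);
  }
}

With a convention like this, "detecting whether the upload is successful"
reduces to checking whether the stored download URL is a real deep store
address or the peer marker.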


> - When we find segment not uploaded to the deep storage, who is
> responsible of uploading it, who is responsible of updating the
> downloadUrl (controller or server)?
>
There will be a follow-up PR
<https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion?focusedCommentId=152115613#By-passingdeep-storerequirementforRealtimesegmentcompletion-RealtimeValidationManagerchanges>
on the RealtimeValidationManager to check for such segments and have the
servers upload them to the deep store. The controller will update the ZK
metadata.
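A rough sketch of that follow-up, building on the marker idea above (all
interfaces and names below are hypothetical placeholders; the actual logic
will be in the linked follow-up work):

import java.net.URI;
import java.util.List;
import java.util.Optional;

// Illustrative only: a periodic controller-side task that backfills missing deep store copies.
public class DeepStoreBackfillTask {
  interface SegmentMetadataStore {             // stand-in for ZK segment metadata access
    List<String> segmentsWithPeerMarker(String tableName);
    void updateDownloadUrl(String tableName, String segmentName, URI deepStoreUri);
  }
  interface ServerClient {                     // stand-in for an HTTP call to a serving server
    Optional<URI> requestUploadToDeepStore(String tableName, String segmentName);
  }

  private final SegmentMetadataStore _metadataStore;
  private final ServerClient _serverClient;

  public DeepStoreBackfillTask(SegmentMetadataStore metadataStore, ServerClient serverClient) {
    _metadataStore = metadataStore;
    _serverClient = serverClient;
  }

  /** Run periodically by the controller-side validation manager. */
  public void run(String tableName) {
    for (String segmentName : _metadataStore.segmentsWithPeerMarker(tableName)) {
      // Ask a server that hosts the segment to upload it to the deep store.
      _serverClient.requestUploadToDeepStore(tableName, segmentName)
          // On success, the controller replaces the peer marker with the real address in ZK.
          .ifPresent(uri -> _metadataStore.updateDownloadUrl(tableName, segmentName, uri));
    }
  }
}

This closes the loop: even when the commit-time upload fails, every
segment eventually ends up in the deep store.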



>
> Jackie
>
> On Mon, May 18, 2020 at 4:13 PM TING CHEN <tingc...@uber.com> wrote:
> >
> >
> >
> > ---------- Forwarded message ---------
> > From: TING CHEN <tingc...@uber.com>
> > Date: Tue, May 5, 2020 at 5:16 PM
> > Subject: Adding a new field in SegmentsValidationAndRetentionConfig for
> peer segment download
> > To: <dev@pinot.apache.org>
> >
> >
> >
> > As part of the proposal to bypass deep store for segment completion, I
> plan to add a new optional string field peerSegmentDownloadScheme to the
> SegmentsValidationAndRetentionConfig in the TableConfig. The value can be
> http or https.
> >
> > SplitSegmentCommitter will check this value. If it is set, the segment
> committer will be able to finish the segment commit successfully even if
> the upload to the segment store fails. The committer will report a special
> marker to the controller indicating that the segment is available on peer
> servers.
> > When Pinot servers fail to download segments from the segment store,
> they can also check this field's value. If it is set, they can download
> segments from peer servers using either the HTTP or HTTPS segment fetcher,
> as configured. (A related PR is in review for how to discover such
> servers.)
> >
> > Note this is a table-level config. We will test the new download
> behavior on realtime tables in an incremental fashion. Once fully proven,
> this config can be promoted to a server-level config.
> >
> > Please let me know if you have any questions on this. Thanks @mcvsubbu
> for coming up with the idea and for the offline discussions.
> >
> > Ting Chen
> >
>
