One high-level suggestion:

Put the time boundary in ZK (maybe under /propertystore/table/routingInfo).

This can be updated in one of the following ways:
- The upload job can set this via controller API after everything is pushed
- Controller can do this periodically
- Anything else
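
A rough sketch of the first option, in Python for brevity. It assumes a plain dict standing in for ZK and a JSON entry with hypothetical fields timeBoundaryMs and version; none of these names are existing Pinot APIs. The point is only the ordering: the entry is written once, after every segment upload has succeeded.

```python
import json


def routing_info_path(table):
    # Assumed layout, following the path suggested in this email.
    return f"/propertystore/{table}/routingInfo"


def push_segments_then_publish(store, table, segments, upload, time_boundary_ms):
    """Upload every segment first, then publish the new boundary in one write.

    `store` is a dict standing in for ZK; `upload` is any callable that
    pushes one segment and raises on failure (both are hypothetical).
    """
    for segment in segments:
        upload(segment)  # if this raises, the published entry is left untouched

    path = routing_info_path(table)
    previous = json.loads(store.get(path, '{"version": 0}'))
    entry = {
        "timeBoundaryMs": time_boundary_ms,
        "version": previous["version"] + 1,
    }
    store[path] = json.dumps(entry)
    return entry
```

If any upload fails, the old entry stays in place, so brokers would keep routing on the previous boundary.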

Brokers watch this node and use the entry in routingInfo:
- to compute the time boundary for the APPEND use case
- to switch to a newer version, based on the version number, for the
REFRESH use case.

Append and Refresh use cases use different routing table provider
implementations.
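
To make the broker side concrete, here is a minimal Python sketch of two provider classes reacting to a changed routingInfo entry. The class names, the on_routing_info_changed callback, and the entry fields (timeBoundaryMs, version) are all assumptions for illustration, not existing Pinot classes, and the offline/realtime filter split is just one plausible way to apply the boundary.

```python
class AppendRoutingProvider:
    """Hypothetical provider for APPEND tables: tracks the published boundary."""

    def __init__(self):
        self.time_boundary_ms = None

    def on_routing_info_changed(self, entry):
        # Called by the (assumed) ZK watch whenever routingInfo changes.
        self.time_boundary_ms = entry["timeBoundaryMs"]

    def split_filters(self, time_column):
        # Until a boundary is published, there is nothing to split on.
        if self.time_boundary_ms is None:
            return None
        boundary = self.time_boundary_ms
        return {
            "offline": f"{time_column} <= {boundary}",
            "realtime": f"{time_column} > {boundary}",
        }


class RefreshRoutingProvider:
    """Hypothetical provider for REFRESH tables: routes to one version only."""

    def __init__(self):
        self.version = None

    def on_routing_info_changed(self, entry):
        # Ignore stale notifications; only ever move forward.
        if self.version is None or entry["version"] > self.version:
            self.version = entry["version"]

    def select_segments(self, segments_by_version):
        # Serve only the segments that belong to the published version.
        return segments_by_version.get(self.version, [])
```

With this shape, a partially pushed REFRESH version is never queried, because brokers keep serving the old version until the entry moves.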




On Thu, Mar 5, 2020 at 6:20 AM Mayank Shrivastava <[email protected]>
wrote:

> Today, we have an issue in Pinot, where data is in an inconsistent state
> during segment push, and the query results may be incorrect. This issue
> becomes more critical for enterprise applications to maintain customer
> trust, more so in the case of REFRESH use cases with large data sizes, where
> the period of inconsistency can be quite large. There are various flavors
> of this problem:
>
> 1. In APPEND use cases, the time-boundary is updated as soon as the first
> segment from the periodic push arrives. This causes queries to hit the
> offline table for a period that does not have complete data in the offline
> table.
>
> 2. For REFRESH use cases, there is no requirement for segments to be
> partitioned, so data can be in an entirely inconsistent state during the
> push time.
>
> 3. We are seeing enterprise applications that create different
> denormalizations from source data, creating multiple tables in Pinot. In
> these cases, the same application queries multiple tables for its
> product, and there are increasing asks to ensure some sort of inter-table
> consistency (provided the client side takes care of synchronized data pushes
> to these tables).
>
> For 1 and 2, there are several potential bottlenecks that may increase the
> push time, including Pinot controller, deep-store and network b/w. For our
> cases, it seems that the biggest bottleneck is the network b/w between
> controller and compute farm that creates the segment.
>
> Next steps: Exchange ideas, and create proposals for the problems above.
>
> Cheers,
> Mayank
>
