One suggestion - high level Put the time boundary in ZK (may be in /propertystore/table/routingInfo).
This can be updated via one of the following - The upload job can set this via controller API after everything is pushed - Controller can do this periodically - Anything else Brokers watch for this node and use the entry in routingInfo - to come up with time boundary for APPEND use case - use the version number for refresh use case to change to a newer version. Append and Refresh use cases use different routing table provider implementations. On Thu, Mar 5, 2020 at 6:20 AM Mayank Shrivastava <[email protected]> wrote: > Today, we have an issue in Pinot, where data is in an inconsistent state > during segment push, and the query results may be incorrect. This issue > becomes more critical for enterprise applications to maintain customer > trust, more so in case of REFRESH use cases with large data size, causing > the period of inconsistency can be quite large. There are various flavors > of this problem: > > 1. In APPEND use cases, the time-boundary is updated as soon as the first > segment from the periodic push arrives. This causes queries to hit the > offline table for period which does not have complete data in the offline > table. > > 2. For REFRESH use cases, there is no requirement for segments to be > partitioned, so data can be in an entirely inconsistent state during the > push time. > > 3. We are seeing enterprise applications that create different > denormalizations from source data(s) creating multiple tables in Pinot. In > these cases, the same application queries multiple tables for their > product. And there's increasing asks to ensure some sort of inter-table > consistencies (provided client side takes care of synchronized data pushes > to these tables). > > For 1 and 2. there are several potential bottlenecks that may increase the > push time, including Pinot controller, deep-store and network b/w. For our > cases, it seems that the biggest bottleneck is the network b/w between > controller and compute farm that creates the segment. > > Next steps: Exchange ideas, and create proposals for the problems above. > > Cheers, > Mayank >
