kmozaid edited a comment on pull request #8224: URL: https://github.com/apache/pinot/pull/8224#issuecomment-1044436343
> Hi @kmozaid thanks for taking the time to make this contribution. Can you explain what problem this solves? Is it because you already have a partitioning and you want to maintain locality within partitions?

Hi @richardstartin,

We have a table where data is ingested from multiple sources (all of these sources push data to the same Kafka topic). Data is kept for 5 days in the realtime table and then moved to the offline table by a minion task. We want to keep data from these sources in separate segments in the offline table. There is a column that identifies the source. The `BoundedColumnValue` partition function makes it possible to keep data from different sources in their respective partitioned segments. Later, if we want to backfill the data of just one source, we can do so because we know which segments belong to that source and can replace them with the backfill. The main use case is being able to backfill the data of a particular source.

This is also discussed in this Slack thread: https://apache-pinot.slack.com/archives/CDRCA57FC/p1643286670255700

<img width="982" alt="image" src="https://user-images.githubusercontent.com/8354145/154681296-cf689529-26fc-4d1f-a983-70e18db4f4bc.png">

Just a note regarding the image: the partition IDs shown in the image are source1, source2 and source3, although they would actually be integer values based on their position in the configured `columnValues` attribute.
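To make the position-based mapping concrete, here is a minimal sketch of how a bounded column-value partition function could assign integer partition IDs. It is not the code from the pull request: the class name `BoundedColumnValueSketch` and its methods are hypothetical, and reserving partition 0 for values that are not listed in `columnValues` is an assumption of this sketch rather than something stated in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the class added in the pull request. */
final class BoundedColumnValueSketch {
    // Configured values, in the order they appear in the (assumed) columnValues setting.
    private final List<String> columnValues;

    BoundedColumnValueSketch(List<String> columnValues) {
        this.columnValues = new ArrayList<>(columnValues);
    }

    /** One partition per configured value, plus one catch-all partition (assumption). */
    int getNumPartitions() {
        return columnValues.size() + 1;
    }

    /**
     * Maps a column value to an integer partition ID based on its position in
     * columnValues. Listed values map to position + 1; anything else maps to 0
     * (the catch-all partition is an assumption of this sketch).
     */
    int getPartition(String value) {
        int index = columnValues.indexOf(value);
        return index < 0 ? 0 : index + 1;
    }

    public static void main(String[] args) {
        BoundedColumnValueSketch fn =
            new BoundedColumnValueSketch(Arrays.asList("source1", "source2", "source3"));
        System.out.println(fn.getPartition("source2")); // prints 2
        System.out.println(fn.getPartition("other"));   // prints 0
    }
}
```

With a mapping like this, every row from a given source lands in the same partition, so the segments to replace during a backfill of that source can be identified by their partition ID.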
