Apache Pinot Daily Email Digest (2020-10-23)

Pinot Slack Email Digest Fri, 23 Oct 2020 19:01:07 -0700

#general

@pranaybsankpal2050: @pranaybsankpal2050 has joined the channel
@rhafik.gonzalez: @rhafik.gonzalez has joined the channel
@joao.comini: @joao.comini has joined the channel
@dprothro: @dprothro has joined the channel
@dprothro: hi all, first time here :slightly_smiling_face: i have a question about segments in a realtime table. i want to repopulate our tables, so i used the rest API to delete all segments in our table, and then pushed a bunch of stuff into our kafka topic. after deleting the segments, no new data is showing up and no new segments are being created. is there something else i need to do for new segments to be created?
@dprothro: interesting, reloading the segments appears to have fixed things
@ssubrama: Did you use the recently added truncate API to remove segments?
@dprothro: no, i don't think we're on a version that has them
@ssubrama: So, you probably tried to delete the consuming segment. I am unsure how that works. In general, if you want to remove all data from a realtime table, then just remove the table and recreate it. You will have to wait a bit before recreating the table with the same name. The deletion of segments happens in the background, and Helix may take some time to reflect that in all servers.
@dprothro: ah interesting. yes, i deleted all segments, so it would have included the consuming segment
@chundong.wang: Any recommendation for doing rolling aggregation (e.g., a moving average over the past 7 days for each hour of the last 24 hours) efficiently inside Pinot?
@jackie.jxt: Pinot does not support this kind of query natively. The most efficient way is to query the per-hour sum and count for the past 8 days, and then calculate the rolling aggregation on the client side. Another way (less efficient) is to send a separate query for each hour with a filter to query the data for the last 24 hours
@chundong.wang: Thanks. That’s what I was thinking. The other option would be to do it upstream, but then you’d lose the flexibility that Pinot brought
@chundong.wang: I’m also wondering if the separate query option would create an unnecessary spike in query volume
@jackie.jxt: The separate query option is definitely much more expensive than the one query approach with some client side post-aggregation, so from a performance perspective, I would recommend the one query approach
@jackie.jxt: As for the upstream approach, if this query pattern is very common for your use case, you might also consider pre-calculating the rolling aggregation and storing it as a separate column
@jackie.jxt: Then you can still benefit from the arbitrary slice and dice provided by Pinot
@g.kishore: If you are brave enough to try, you can write a post aggregation transform function to do this on Pinot side
@g.kishore: Basically, whatever you are doing on the client side can be done on the broker side
@g.kishore: We will be happy to help
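
The client-side post-aggregation @jackie.jxt describes could look roughly like the sketch below. It assumes the single query has already returned one (sum, count) pair per hour for the past 8 days, ordered oldest first with no missing hours. The function name `rolling_average` and its parameters are hypothetical and not part of Pinot. Each output is the total sum divided by the total count over the 7-day window that ends at that hour, so hours with more rows weigh more.

```python
from typing import List, Optional, Tuple

HOURS_PER_DAY = 24
WINDOW_HOURS = 7 * HOURS_PER_DAY  # trailing 7-day window
OUTPUT_HOURS = HOURS_PER_DAY      # one result per hour of the last 24 hours


def rolling_average(hourly: List[Tuple[float, int]],
                    window: int = WINDOW_HOURS,
                    outputs: int = OUTPUT_HOURS) -> List[Optional[float]]:
    """Hypothetical client-side helper, not a Pinot API.

    hourly: (sum, count) per hour, oldest first, with no gaps.
    Returns one average for each of the last `outputs` hours, computed
    over the `window` hours ending at (and including) that hour.
    """
    needed = window + outputs - 1
    if len(hourly) < needed:
        raise ValueError(f"need at least {needed} hourly rows, got {len(hourly)}")

    # Prefix sums make every window total an O(1) subtraction.
    prefix_sum = [0.0]
    prefix_cnt = [0]
    for s, c in hourly:
        prefix_sum.append(prefix_sum[-1] + s)
        prefix_cnt.append(prefix_cnt[-1] + c)

    n = len(hourly)
    result: List[Optional[float]] = []
    for end in range(n - outputs + 1, n + 1):  # `end` is an exclusive index
        start = end - window
        total = prefix_sum[end] - prefix_sum[start]
        count = prefix_cnt[end] - prefix_cnt[start]
        # An empty window has no defined average, so report None.
        result.append(total / count if count else None)
    return result
```

With 8 days of hourly rows (192 pairs) and the defaults, the call returns 24 values, the last of which covers the most recent 168 hours. If some hours can be missing from the query result, the rows would need to be padded with (0, 0) before calling this.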
@ipolyzos.se: @ipolyzos.se has joined the channel
@alex.odle: @alex.odle has joined the channel

#random

#troubleshooting

@dprothro: @dprothro has joined the channel
