#general


@ashish.mnr: @ashish.mnr has joined the channel
@tanmay.movva: @tanmay.movva has joined the channel
@jrj: @jrj has joined the channel
@venkatesan.v: @venkatesan.v has joined the channel
@venkatesan.v: Hello. I'm trying to explore Pinot on Kubernetes. I have a few questions: 1. The official helm chart marks the broker and controller as StatefulSets. Any reason I need to keep them as StatefulSets as opposed to just Deployments? As in, is there some ordering needed? 2. The controller requires a PV to be attached as a volume (ref: ). What is the nature of the data being stored here? Or rather, what is the general recommendation for the disk size here? What purpose does this serve? 3. The broker doesn't look like it needs any specific disk. So, extending 1 and 2 above, why does this need to be a StatefulSet? What ordering is needed here?
  @dlavoie: Hello Venkatesan!
  @mayanks: 1. We typically follow the ordering `Controller` -> `Broker` -> `Server` for deployments
  @dlavoie: Well, Pinot is a database, so some components are stateful by nature.
  @mayanks: 2. Controller stores a copy of all the data pushed to the Pinot cluster. So you need to size it accordingly
  @venkatesan.v: @mayanks @dlavoie I understand the server being stateful, since it stores the data; I was just wondering why the controller is. I think Mayank's response answers that. My question on the broker still remains: what changes if I make it a Deployment instead of a StatefulSet? Is there a reason? This is largely out of curiosity, since the broker doesn't seem to store any physical data. Thanks for the prompt response :slightly_smiling_face:
  @mayanks: Broker can be considered stateless to some degree. However, note that it is communicating with the server. So if there's a protocol change between server-broker, then deployment ordering matters.
  @venkatesan.v: Sure... deployment ordering is understandable. My question was more around the Kubernetes kind: `Deployment` vs `StatefulSet`. I can still play with pod disruption budgets to ensure minimum replicas, right?
  @dlavoie: A StatefulSet will preserve host identity. With a Deployment, every host is unique and presents itself as a new member.
  @mailtobuchi: @venkatesan.v It's due to the way the identifiers of the brokers are used in ZK. K8s Deployments don't guarantee a well-known name for the pod. By running as StatefulSets, we preserve the hostname, i.e. the identity.
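  To illustrate the identity point, here is a minimal sketch of a StatefulSet backed by a headless service. All names, ports, and the image tag are illustrative and do not come from the official Pinot helm chart:

```yaml
# Illustrative sketch only; names/ports are placeholders, not the official helm chart values.
apiVersion: v1
kind: Service
metadata:
  name: pinot-broker-headless
spec:
  clusterIP: None                       # headless service: gives each pod a stable DNS record
  selector:
    app: pinot-broker
  ports:
    - port: 8099
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pinot-broker
spec:
  serviceName: pinot-broker-headless    # pods get stable names: pinot-broker-0, pinot-broker-1, ...
  replicas: 2
  selector:
    matchLabels:
      app: pinot-broker
  template:
    metadata:
      labels:
        app: pinot-broker
    spec:
      containers:
        - name: broker
          image: apachepinot/pinot:latest
          ports:
            - containerPort: 8099
```

  With a StatefulSet, `pinot-broker-0` keeps the same DNS name across restarts, so its registration in ZK stays stable; a Deployment pod would instead get a random suffix (e.g. `pinot-broker-7f9c4d-xkq2z`) and show up as a new member every time.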
  @venkatesan.v: I see, that makes sense.
  @mailtobuchi: Ahh.. @dlavoie you beat me to it :slightly_smiling_face:
  @dlavoie: The Pinot controller tracks who should own what. So identity is important :stuck_out_tongue:
  @venkatesan.v: Thank you gentlemen. That was quite useful. Thanks
  @dlavoie: @mailtobuchi At least we are not contradictory
  @dlavoie: :rolling_on_the_floor_laughing:
  @fx19880617: @venkatesan.v The broker is also stateful, with the assignment of which servers to connect to in multi-tenancy mode, hence we make them stateful; it also means less random garbage created on ZooKeeper
  @venkatesan.v: Brilliant. Quite informative. Thank you
  @mailtobuchi: @fx19880617 thanks for additional clarification. Got to know it for the first time.
  @fx19880617: For 2: usually, in a non-cloud deployment, we need to maintain a reference copy for Pinot servers to download the segments from, and the only place to put it is the controller (or an extra NAS setup, which you can define as a volume there). In public cloud mode, we can use S3, Google Cloud Storage, or Azure Data Lake as the backup, and then that volume is not required.
  @venkatesan.v: Got it. Thanks @fx19880617. yes we are on public cloud(AWS) and hence asked the above.
  @fx19880617: cool, then you can set up S3/GCS as the backup for your segments and don't need to mount a PVC for the controller
  @venkatesan.v: Yes. That is the plan as i was reading the deep storage aspects of pinot. Thanks once again.
  @fx19880617: cool, please refer to this doc :
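  As a rough sketch of what the S3-as-deep-storage setup looks like, the controller config would resemble the following. The bucket name, path, and region are placeholders; verify the exact property names against the deep storage doc for your Pinot version:

```
# Sketch of controller.conf for S3 deep storage (placeholders: bucket, path, region).
controller.data.dir=s3://my-pinot-bucket/controller-data
controller.local.temp.dir=/tmp/pinot-controller-tmp
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
```

  With `controller.data.dir` pointing at S3, segments are backed up to the bucket instead of a local PV, which is why the controller PVC becomes unnecessary.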
@chundong.wang: About scalar functions: could such functions be used only as transform function, or could be applied to the result of aggregation? Eg to return the greater value of two `PERCENTILETDIGEST50`
  @g.kishore: yes, post aggregation transform functions are supported
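  As an illustration of a post-aggregation transform over two percentile aggregations (assuming a `GREATEST` scalar function is available in your Pinot version; the table and column names are hypothetical):

```sql
-- Return the greater of two p50 estimates; requestMetrics and its columns are hypothetical.
SELECT GREATEST(
  PERCENTILETDIGEST50(latencyMs),
  PERCENTILETDIGEST50(queueTimeMs)
) AS maxP50
FROM requestMetrics
```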
@alex.hafner: @alex.hafner has joined the channel
@anjaiahspr: @anjaiahspr has joined the channel
@npawar: Taking a moment to pause and celebrate that the Apache Pinot community crossed 600 members this month! Big shoutout to everyone in the community!
@npawar: And, excited to announce our next virtual Apache Pinot meetup in exactly 10 days, hosted by the awesome folks at Uber! We have a superb speaker lineup consisting of folks from Uber, City Storage Systems and Confluera. RSVP :point_down:
@snlee: Do we have documentation on how to use a `map` column? I see the `MAP_VALUE` function at . However, I don’t see any instruction on how I can configure the schema & store map values to segments.
  @chinmay.cerebro: I think native map type support is not done yet. Current implementation includes treating two multi-valued columns as a map IIRC
  @snlee: i see i just found the instructions here
  @chinmay.cerebro: @g.kishore has the latest on native map type support
  @snlee: what’s the plan on the native map type support? I guess one limitation of the existing approach is that the map’s value type is fixed, given that we use an MV column for storing the map
  @g.kishore: What do you mean by map value type is fixed?
  @jackie.jxt: I think what @snlee means is that all the values of the map must be of the same type
  @g.kishore: Oh, yes
  @jackie.jxt: @snlee For the `MAP_VALUE` transform function, it works on 2 MV columns with one-to-one mapping on each element
  @jackie.jxt: It takes 3 arguments, e.g. `MAP_VALUE(keyCol, 'lookupKey', valCol)`
  @ssubrama: Also, I believe it has the limitation that you need to have all keys in all rows.
  @jackie.jxt: You might need to add a filter on the key column to filter on the look up key first
  @jackie.jxt: That's why we are working on a new type of map support now, which might eventually replace this one
  @g.kishore: > @ssubrama I believe it has the limitation that you need to have all keys in all rows. No, there is no such limitation
  @npawar: btw, you don't have to use the __KEYS and __VALUES convention anymore. You can use transform functions:
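  Putting the thread together, a query against the two-MV-column encoding might look like this (the table and column names are hypothetical), with the filter on the key column as suggested above:

```sql
-- mapCol__KEYS and mapCol__VALUES are parallel MV columns encoding the map.
-- MAP_VALUE(keyCol, 'lookupKey', valCol) returns the value paired with 'lookupKey'.
SELECT MAP_VALUE(mapCol__KEYS, 'myKey', mapCol__VALUES)
FROM myTable
WHERE mapCol__KEYS = 'myKey'
```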
@kapil.surlaker: @kapil.surlaker has joined the channel

#random


@ashish.mnr: @ashish.mnr has joined the channel
@tanmay.movva: @tanmay.movva has joined the channel
@jrj: @jrj has joined the channel
@venkatesan.v: @venkatesan.v has joined the channel
@alex.hafner: @alex.hafner has joined the channel
@anjaiahspr: @anjaiahspr has joined the channel
@kapil.surlaker: @kapil.surlaker has joined the channel

#troubleshooting


@tanmay.movva: @tanmay.movva has joined the channel

#onboarding


@venkatesan.v: @venkatesan.v has joined the channel

#community


@tanmay.movva: @tanmay.movva has joined the channel

#announcements


@tanmay.movva: @tanmay.movva has joined the channel

#jdbc-connector


@kharekartik: This query is failing:
```SELECT baseballStats.playerName AS playerName FROM baseballStats GROUP BY baseballStats.playerName ORDER BY 1 ASC```
with the error:
```ORDER By should be only on some/all of the columns passed as arguments to DISTINCT```
It is initiated by Tableau to get all the filter values for a particular filter
@kharekartik: Without it slice and dice won't be possible
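The error suggests the GROUP BY without aggregations is being rewritten as a DISTINCT whose ORDER BY ordinal is rejected; until the fix lands, an equivalent hand-written query that orders by the column name rather than the ordinal might look like this (a sketch only, not verified against that Pinot version):

```sql
SELECT DISTINCT playerName
FROM baseballStats
ORDER BY playerName ASC
```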
@g.kishore: Xiang submitted a fix
@fx19880617: @kharekartik