#general


@yupeng: Please check out our recent blog on how we operate Pinot at Uber scale. We are glad to share our learnings with the community.
@brijoobopanna: @brijoobopanna has joined the channel
@babak: @babak has joined the channel
@bharadwaj.r07: @bharadwaj.r07 has joined the channel

#random


@brijoobopanna: @brijoobopanna has joined the channel
@babak: @babak has joined the channel
@bharadwaj.r07: @bharadwaj.r07 has joined the channel

#troubleshooting


@tanmay.movva: Hello, I am trying to set up S3 as the segment store for Pinot, which is deployed on Kubernetes. Unfortunately it is a cross-account bucket and we also have to pass a bucket ACL. I couldn’t find any way to pass an ACL policy in the docs. Can anyone please help me with this?
  @fx19880617: You can try to set it in controller/server config like: ```
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.storage.factory.s3.accessKey=AKIARC**********
pinot.server.storage.factory.s3.secretKey=aaaaaaaaaaaa
```
  @fx19880617: similar in controller: ```
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.storage.factory.s3.accessKey=AKIARC**********
pinot.controller.storage.factory.s3.secretKey=aaaaaaaaaaaa
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
  @tanmay.movva: I did get that part. But we have to provide an ACL policy for the S3 bucket so that Pinot is able to write to that bucket. I am looking for something similar to `druid.storage.disableAcl` in Druid. Ref - Its implementation can be found here -
  @tanmay.movva: I have already set the required configs for s3. Thanks for your quick reply @fx19880617!
  @tanmay.movva: But what I need is to tell pinot to set `bucket-owner-full-control` as the acl.
  @fx19880617: let me take a look at that
  @g.kishore:
  @g.kishore: we might have to change this code to setup the s3clientbuilder
  @g.kishore: @pradeepgv42 what do you think?
  @pradeepgv42: @tanmay.movva I think S3PinotFS needs to be updated; currently that option is missing. Something similar to what you pointed out here should work, I believe, for all the PutObjectRequests
  @fx19880617: can we try to expose those options transparently?
  @pradeepgv42: code is missing too
  @g.kishore: I think Xiang is suggesting that if there is a way to pass all the properties from pinot.controller.segment.fetcher.s3.** transparently to the S3ClientBuilder, it would solve the problem of having to change the code every time a new property needs to be set in the S3Client
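  The "pass properties transparently" idea above could be sketched as a generic prefix-subset helper: collect every config key under a prefix, strip the prefix, and hand the remainder to the client builder. This is an illustrative sketch with hypothetical names (`PrefixConfig`, `subset`), not Pinot's actual configuration API:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixConfig {
  // Collect every key under the given prefix, with the prefix stripped,
  // so a new client option needs no code change to be plumbed through.
  static Map<String, String> subset(Map<String, String> cfg, String prefix) {
    Map<String, String> out = new HashMap<>();
    for (Map.Entry<String, String> e : cfg.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        out.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new HashMap<>();
    cfg.put("pinot.controller.storage.factory.s3.region", "us-west-2");
    cfg.put("pinot.controller.storage.factory.s3.accessKey", "AKIA");
    cfg.put("pinot.controller.segment.fetcher.protocols", "file,http,s3");
    // Only the two keys under the s3 storage-factory prefix survive, renamed
    // to "region" and "accessKey".
    System.out.println(subset(cfg, "pinot.controller.storage.factory.s3."));
  }
}
```

  As the next reply points out, this handles builder-level settings but not options that must be applied per request.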
  @pradeepgv42: Seems like this ACL property needs to be set for each upload/copy (PutObjectRequest &amp; CopyObjectRequest) of any file on S3, so I'm not sure we can achieve that with just properties, without a code change.
  @g.kishore: I see
  @pradeepgv42: The code change should be simple: wherever there is a CopyObjectRequest or PutObjectRequest, set the ACLs when the config is turned on
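  A sketch of that change, assuming the AWS SDK v2 request builders that S3PinotFS uses (`software.amazon.awssdk.services.s3.model`); `disableAcl` here is a hypothetical config flag modeled on Druid's, not an existing Pinot property:

```java
// Sketch only -- `disableAcl` is a hypothetical flag, not an existing property.
PutObjectRequest.Builder requestBuilder = PutObjectRequest.builder()
    .bucket(bucket)
    .key(key);
if (!disableAcl) {
  // Grant the bucket owner full control, for cross-account buckets.
  requestBuilder.acl(ObjectCannedACL.BUCKET_OWNER_FULL_CONTROL);
}
PutObjectRequest request = requestBuilder.build();
```

  The same conditional would be repeated for `CopyObjectRequest.builder()`.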
  @fx19880617: got it. Created an issue: , could you fill in more info there?
  @pradeepgv42: done
@venkatesan.v: @venkatesan.v has joined the channel

#metadata-push-api


@fx19880617: Hi, I want to bring this up: from the code perspective, it seems that for new segment adds we don’t have any synchronization on the ideal-state updater
@mayanks: What do you mean?
@fx19880617: so if a user tries to upload with high parallelism, then very likely the ideal-state update will fail
@fx19880617: say a user pushes 20k segments with 100 threads in parallel
@fx19880617:
@mayanks: Isn't there a retry policy?
@fx19880617: currently they can only achieve about 4 as push parallelism
@fx19880617: this is considering retry
@fx19880617: otherwise parallelism =2 may cause the issue
@mayanks: I mean there was a retry policy in updating zk
@fx19880617: yes
@mayanks: could you describe the race condition?
@fx19880617: this is already considered
@fx19880617: it’s not a race condition, it’s just many threads trying to update ZK
@fx19880617: so the version gets bumped and the request does not succeed
@mayanks: So they cannot finish in the specified number of retries?
@fx19880617: then it needs to retry again
@fx19880617: yes
@mayanks: what is the max number of retries
@fx19880617: ```private static final RetryPolicy DEFAULT_RETRY_POLICY = RetryPolicies.exponentialBackoffRetryPolicy(5, 1000L, 2.0f); ```
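  For reference, `exponentialBackoffRetryPolicy(5, 1000L, 2.0f)` means 5 attempts with a delay window that doubles from 1 second. Pinot randomizes the actual sleep within each window, so the sketch below (hypothetical helper name) computes only the deterministic ceiling:

```java
public class BackoffSketch {
  // Upper-bound delays for exponentialBackoffRetryPolicy(5, 1000L, 2.0f).
  // Hypothetical helper; Pinot randomizes the actual sleep within each window.
  static long[] maxDelaysMillis(int attempts, long initialDelayMs, double scaleFactor) {
    long[] delays = new long[attempts];
    double delay = initialDelayMs;
    for (int i = 0; i < attempts; i++) {
      delays[i] = (long) delay;
      delay *= scaleFactor;
    }
    return delays;
  }

  public static void main(String[] args) {
    long total = 0;
    for (long d : maxDelaysMillis(5, 1000L, 2.0)) {
      total += d;  // windows: 1000, 2000, 4000, 8000, 16000 ms
    }
    System.out.println("worst-case wait ~" + total + " ms"); // ~31000 ms before giving up
  }
}
```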
@mayanks: If we synchronized zk metadata update and IS update, will it help?
@fx19880617: 5 times
@mayanks: it should at least reduce the retry
@mayanks: actually, one question
@mayanks: why do we need parallelism for metadata push?
@fx19880617: It will help at the single-server level, but in the case of pushing to all the controllers we will still see the race
@mayanks: it should be fast anyway, right
@fx19880617: for pushing 20k segments to bootstrap data, each segment upload took 4 seconds
@mayanks: 20k segments? What is the segment size?
@fx19880617: 200mb
@mayanks: ok
@fx19880617: it’s metadata push, so segment size doesn’t matter
@mayanks: yeah, so either we make the IS + ZK update sequential (in which case there is no reason for parallel push), or we increase the number of retries
@fx19880617: just looking for a way to speed this up
@fx19880617: right
@mayanks: increasing the number of retries will put a lot of load on ZK
@fx19880617: I think at the single-controller level, we should put a table-level sync
@mayanks: if 20k segments
@fx19880617: yes
@fx19880617: 3 controllers means 3 parallel updates on the table’s ideal state
@fx19880617: I think it’s anyway much better than the current implementation
@mayanks: I think that's how we use it at lnkd
@mayanks: we have a VIP with 3 controllers
@mayanks: so we put parallelism as 3 for big use cases
@fx19880617: yeah, I think for segment data push there is enough leeway for ideal-state updates
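The contention discussed in this thread — many writers each doing a version-conditioned write to the same ideal state, as ZooKeeper's `setData`-with-expected-version does — can be modeled with a compare-and-set loop. This is an illustrative simulation, not Pinot code; an `AtomicInteger` stands in for the znode version:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IdealStateCasDemo {
  // Simulate N clients each performing one version-conditioned update.
  // Returns {finalVersion, conflictCount}.
  static int[] runSimulation(int writers) throws InterruptedException {
    AtomicInteger version = new AtomicInteger(0);   // stands in for the znode version
    AtomicInteger conflicts = new AtomicInteger(0);
    Thread[] threads = new Thread[writers];
    for (int i = 0; i < writers; i++) {
      threads[i] = new Thread(() -> {
        boolean committed = false;
        while (!committed) {
          int seen = version.get();                 // read ideal state + its version
          // ... apply the segment add to a local copy of the ideal state ...
          committed = version.compareAndSet(seen, seen + 1);  // conditional write
          if (!committed) {
            conflicts.incrementAndGet();            // version was bumped by someone else: retry
          }
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    return new int[] {version.get(), conflicts.get()};
  }

  public static void main(String[] args) throws InterruptedException {
    int[] result = runSimulation(100);
    System.out.println("updates=" + result[0] + " conflicts=" + result[1]);
  }
}
```

Every writer eventually commits, but the number of failed conditional writes grows with parallelism — which is why a per-table lock on each controller (or lower push parallelism) reduces the retry load on ZK.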