Apache Pinot Daily Email Digest (2022-03-03)

Pinot Slack Email Digest Thu, 03 Mar 2022 18:00:44 -0800

#general

@ayush.jha: hey everyone, about this tiered storage thing: is it available for Azure Blob, or is that in the pipeline?
@mayanks: Hi Ayush, it is currently on S3, but we plan to extend to other blob stores as well.
@ayush.jha: Thanks, kind of excited for this
@prashant.pandey: @mayanks Just to clarify, this feature is not in the OSS version yet right?
@mayanks: Not at the moment.
@luisfernandez: will it make it to the OSS?
@mayanks: We are considering different options, but nothing concrete. Would love to understand what your needs are there.
@luisfernandez: definitely to have longer retention for our systems and just have Pinot manage it all. We were thinking of querying hot data in Pinot and, as it aged, going to other systems, but if Pinot can manage everything, then the burden on clients and the repetition of logic won’t be as bad
@shadab.anwar: Hi, I just need a confirmation. When I created my tables, they did not have any data, but segments were created. I checked my S3 bucket and no segment had been uploaded. However, as soon as data arrived in my tables, I saw that segments were uploaded to S3. So I wanted to confirm: are segments uploaded only when they have some data?
@mark.needham: Do you mean that you're using S3 as the deep store? If so, then for real-time tables you'd only see segments in S3 once they have been completed/flushed. The consuming segment(s) live only on Pinot servers until they reach the segment threshold. If it's an offline table, then the job that you wrote to create and upload the segments would be putting those segments into S3. In either case you are correct that creating a table doesn't upload segments to S3.
@mayanks: Are you asking about the case where segments don’t have any data (no rows)? If so, I can double check, but I think we did prevent pushing of empty segments a while back. You can also confirm it in the controller log, which should indicate why the segment was not accepted
@lakshmanan.velusamy: Hi Community, can the timezone argument for DATETRUNC come from another column in the table?
@lakshmanan.velusamy: Checked the datetrunc transform function code; it doesn't seem to support a column as the argument for the timezone field (expects optional )
@mayanks: Yeah. But seems like a good feature to have. Mind filing an issue?
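For reference, a minimal sketch of what the thread describes: the timezone passed to DATETRUNC has to be a literal, so it is written into the query text rather than read from a column. The helper name `datetrunc_expr` and the column name `eventTimeMs` are hypothetical, and the four-argument form (unit, time value, input time unit, timezone) is assumed from the Pinot function documentation; check it against your Pinot version.

```python
def datetrunc_expr(unit, time_column, input_unit="MILLISECONDS", timezone="UTC"):
    """Build a DATETRUNC expression whose timezone is a string literal.

    The timezone is embedded as a quoted literal because, per the thread,
    the transform function does not accept a column for that argument.
    """
    # Double any single quotes so the literal stays well-formed SQL.
    tz_literal = timezone.replace("'", "''")
    return f"DATETRUNC('{unit}', {time_column}, '{input_unit}', '{tz_literal}')"


# Hypothetical column name; the timezone is fixed per query.
expr = datetrunc_expr("day", "eventTimeMs", timezone="America/Los_Angeles")
print(expr)
```

If different rows need different timezones, one option under this constraint is to issue one query per distinct timezone and combine the results on the client side.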

#random

@srini: hey Pinot-ers! Pinot-ians? We launched a podcast at Preset :slightly_smiling_face: We hope you like the first episode!

#getting-started

@bobby.richard: How does segment size impact offline tables? Does the offline segment ingestion job always create one segment regardless of the number of records, or is it smart enough to create multiple segments of optimal segment size?
@mark.needham: generally it creates one segment per source file from my understanding
@mayanks: As of now in the OSS yes 1 input file = 1 segment
@bobby.richard: So input files should be sized to create the desired pinot segment size for optimal query performance?
@mayanks: Yes
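Since one input file becomes one segment, segment size can be controlled by how the source data is split before ingestion. A minimal sketch of that planning step, assuming a row-count target per segment; the function name `plan_input_files` and the target value are illustrative and not part of Pinot:

```python
import math


def plan_input_files(total_rows, target_rows_per_segment):
    """Return (start, end) row ranges, one per input file.

    Each range is meant to be written to its own input file, so that the
    ingestion job produces one segment of roughly the target size per file.
    """
    if target_rows_per_segment <= 0:
        raise ValueError("target_rows_per_segment must be positive")
    file_count = math.ceil(total_rows / target_rows_per_segment)
    return [
        (i * target_rows_per_segment,
         min((i + 1) * target_rows_per_segment, total_rows))
        for i in range(file_count)
    ]


# 250 rows with a target of 100 rows per segment gives three input files.
ranges = plan_input_files(250, 100)
print(ranges)
```

The right target depends on row width and query patterns, so treat the number as something to tune rather than a recommendation.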
@luisfernandez: hey friends, one question, how can i modify the default limit to be all records on a query, or for a given query return all records, since the default is 10 based on this
@mayanks: You can specify a large value for `LIMIT`. However, note that if you issue an expensive query, it will put load on your cluster.
@luisfernandez: hmm i see i see
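A minimal sketch of the suggestion above: state the `LIMIT` explicitly instead of relying on the default of 10. The helper name `build_query_payload` and the table name `myTable` are hypothetical. The JSON body with a `sql` key is the shape the broker's SQL query endpoint is assumed to accept; sending the request is left out.

```python
import json


def build_query_payload(table, limit):
    """Build a JSON request body for a SELECT with an explicit LIMIT."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    sql = f"SELECT * FROM {table} LIMIT {limit}"
    return json.dumps({"sql": sql})


# A large limit returns more rows but also puts more load on the cluster.
payload = build_query_payload("myTable", 100000)
print(payload)
```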
@aaron.weiss: Hey there, I have several questions about StarTree indexes after playing around with them a bit. I appreciate the help!
*StarTree (ST) index general question*: There's an FAQ comment that says: "The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. _Currently, Pinot does not support adding star-tree indexes to the existing segments._" But then there's the table config param, enableDynamicStarTreeCreation, which says: "Boolean to indicate whether to allow creating star-tree when server loads the segment. Star-tree creation could potentially consume a lot of system resources, so this config should be enabled when the servers have the free system resources to create the star-tree." So my question is, if you have an existing table without an ST index, and you want to add one, can you add it with enableDynamicStarTreeCreation=true and run Reload All Segments to enable the index on the entire table?
*Default StarTree (ST) index questions*: Since the default configuration only includes single-value dimensions with cardinality <= 10k,
- Does this mean that dimensions with cardinality > 10k aren't great candidates to be in an ST index?
- Based on the above, do you recommend either creating separate ST index(es) for dimensions that have higher cardinality OR putting them in another type of index, like inverted?
- What happens if you query filtering on two fields, one from the ST index and one from an inverted index? Does it utilize both indexes or just pick the "best" index?
- What happens with dimensions that have <= 10k cardinality today, when I enable the default ST index, but grow to 11k tomorrow?
- Is there a way to see the configuration metadata for a default ST index? i.e. can I view the dimension list in _dimensionsSplitOrder_?
@mayanks: ```So my question is, if you have an existing table without an ST index, and you want to add one, can you add it with enableDynamicStarTreeCreation=true and run Reload All Segments to enable the index on the entire table?```
@mayanks: Yes ^^. Seems the doc might be out of date
@mayanks: Can you point me to the part of doc that says: ```"The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. Currently, Pinot does not support adding star-tree indexes to the existing segments."```
@mayanks: ```1. More than cardinality, I'd say you want to configure your ST to optimize queries where you end up selecting a large number of rows (e.g. 10k or more). For those low selectivity queries, you can keep the dimensions that the query has in filter + group-by. 2. If a query can be answered using the ST index, it will be picked over other indexes. 3. Segment metadata.properties has the dimension split order.```
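To make the thread concrete, here is a minimal sketch of the table config fragment being discussed: a star-tree index config together with `enableDynamicStarTreeCreation` set to true, so the index can be built when segments are reloaded. The column names (`country`, `browser`, `impressions`) and the helper name are hypothetical. The key names are assumed from the Pinot table config documentation; verify them against your version before use.

```python
import json


def star_tree_table_index_config(dimensions, function_column_pairs,
                                 max_leaf_records=10000):
    """Build the tableIndexConfig fragment for one star-tree index."""
    return {
        "tableIndexConfig": {
            # Lets servers build the star-tree when they load or reload a segment.
            "enableDynamicStarTreeCreation": True,
            "starTreeIndexConfigs": [
                {
                    # Dimensions used in filter and group-by, in split order.
                    "dimensionsSplitOrder": list(dimensions),
                    "skipStarNodeCreationForDimensions": [],
                    # Pre-aggregations, written as FUNCTION__column.
                    "functionColumnPairs": list(function_column_pairs),
                    "maxLeafRecords": max_leaf_records,
                }
            ],
        }
    }


fragment = star_tree_table_index_config(
    ["country", "browser"], ["SUM__impressions"]
)
print(json.dumps(fragment, indent=2))
```

After merging a fragment like this into the table config, a Reload All Segments is what triggers index creation on existing segments, as confirmed in the thread.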

#pinot-docsrus

@amrish.k.lal: Hi, I have a PR for updating JSON querying doc to show usage of json_extract_scalar and json_match. Please see . Thanks.