Apache Pinot Daily Email Digest (2020-10-28)

Pinot Slack Email Digest Wed, 28 Oct 2020 19:00:49 -0700

#general

@djspatoulas: @djspatoulas has joined the channel
@noahprince8: Does a segment always consist of `columns.psf, creation.meta, index_map, metadata.properties` ? I’m thinking for the s3 lazy loading, it might make sense to have separate caching settings for metadata vs `columns.psf`. Like you may want to eagerly load all or most of the metadata since it’s small and means segments can be eliminated quickly.
@mayanks: Yes, all segments have these files. But these are not exposed as individual files. One issue I can think of with the approach is when a segment is refreshed, the cached metadata can get out of sync, and would need some sort of invalidation/reload.
@noahprince8: How does a segment get refreshed? I thought the idea was that data is immutable?
@noahprince8: And what do you mean they aren’t exposed as individual files? Do they get compressed at some point?
@mayanks: Having said that, I do see some merit in eager loading of metadata. Perhaps it would make sense to write down the idea and check it against the cases to handle.
@mayanks: As in, the interface doesn’t allow you to query a file from segment
@noahprince8: Oh. The interface expects the full segment to be there?
@mayanks: I mean there is no API like getColumnPsfFile()
@noahprince8: Added it as a comment on the lazy loading issue. I think first we do lazy loading of the whole segment. Then add this as an optimization later.
@mayanks: There is getSegmentMetadata() though
@mayanks: Yeah, I think your idea is good. Just saying we need to think it through to design the right APIs, and ensure all cases are handled
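
The idea discussed above (eager metadata, lazy `columns.psf`, invalidation on refresh) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Pinot code: `SegmentCache`, `fetch_metadata`, `fetch_columns`, and `on_refresh` are hypothetical names, and the two fetch callables stand in for whatever would read from S3.

```python
from collections import OrderedDict


class SegmentCache:
    """Hypothetical two-tier cache: small metadata is kept for every segment,
    large column data is loaded on demand and evicted in LRU order."""

    def __init__(self, fetch_metadata, fetch_columns, max_column_entries=2):
        # fetch_metadata / fetch_columns are assumed callables: name -> data
        self._fetch_metadata = fetch_metadata
        self._fetch_columns = fetch_columns
        self._max_column_entries = max_column_entries
        self._metadata = {}            # eager tier, never evicted
        self._columns = OrderedDict()  # lazy tier, LRU eviction

    def preload_metadata(self, segment_names):
        """Eagerly load metadata so segments can be pruned without column data."""
        for name in segment_names:
            self._metadata[name] = self._fetch_metadata(name)

    def metadata(self, name):
        if name not in self._metadata:
            self._metadata[name] = self._fetch_metadata(name)
        return self._metadata[name]

    def columns(self, name):
        """Return column data, fetching it only on first use."""
        if name in self._columns:
            self._columns.move_to_end(name)  # mark as most recently used
            return self._columns[name]
        data = self._fetch_columns(name)
        self._columns[name] = data
        if len(self._columns) > self._max_column_entries:
            self._columns.popitem(last=False)  # evict least recently used
        return data

    def on_refresh(self, name):
        """A refreshed segment invalidates both tiers; metadata is re-read."""
        self._columns.pop(name, None)
        self._metadata[name] = self._fetch_metadata(name)
```

Keeping the metadata tier unbounded reflects the assumption in the thread that metadata is small; the refresh hook addresses the out-of-sync concern by dropping the stale entries for that segment.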
@ravibabu.chikkam: @ravibabu.chikkam has joined the channel
@noahprince8: Is it possible to pause kafka collection on a table, but not querying? Seems like ChangeTableState disable makes queries return empty as well as pausing kafka
@ssubrama: That is not possible currently.
@ssubrama: Although, we have had requests for the feature
@ssubrama: what is the use case for you?
@noahprince8: Not really a production use case. I have a 100GB SSD and it’s about to pop, testing a large dataset locally.
@noahprince8: Though I think certainly it could be useful in production. Maybe a producer goes haywire and we want to stop consuming that data and do a repair, while still leaving the table accessible
@noahprince8: Is there an issue for this? I can create one
@g.kishore: Quick reminder about tomorrow's meetup at 5 PM PST. We have amazing talks lined up. @tingchen - Uber, @pradeepgv42 - Confluera @afilipchik @elon.azoulay from City Storage Systems.

#random

@djspatoulas: @djspatoulas has joined the channel
@ravibabu.chikkam: @ravibabu.chikkam has joined the channel

#troubleshooting

@nguyenhoanglam1990: @nguyenhoanglam1990 has joined the channel