#general


@djspatoulas: @djspatoulas has joined the channel
@noahprince8: Does a segment always consist of `columns.psf, creation.meta, index_map, metadata.properties` ? I’m thinking for the s3 lazy loading, it might make sense to have separate caching settings for metadata vs `columns.psf`. Like you may want to eagerly load all or most of the metadata since it’s small and means segments can be eliminated quickly.
  @mayanks: Yes, all segments have these files. But these are not exposed as individual files. One issue I can think of with the approach is that when a segment is refreshed, the cached metadata can get out of sync and would need some sort of invalidation/reload.
  @noahprince8: How does a segment get refreshed? I thought the idea was that data is immutable?
  @noahprince8: And what do you mean they aren’t exposed as individual files? Do they get compressed at some point?
  @mayanks: Having said that, I do see some merit in eager loading of metadata. Perhaps it would make sense to write down the idea and check it against the cases to handle.
  @mayanks: As in, the interface doesn’t allow you to query a file from segment
  @noahprince8: Oh. The interface expects the full segment to be there?
  @mayanks: I mean there is no API like getColumnPsfFile()
  @noahprince8: Added it as a comment on the lazy loading issue. I think first we do lazy loading of the whole segment. Then add this as an optimization later.
  @mayanks: There is getSegmentMetadata() though
  @mayanks: Yeah, I think your idea is good. Just saying we need to think it through to design the right APIs, and ensure all cases are handled
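
The idea in this thread (metadata loaded eagerly, the whole segment fetched lazily, and cached data invalidated when a segment is refreshed) can be sketched roughly as below. This is only an illustration, not Pinot code: `SegmentMetadata`, `TieredSegmentCache`, the `crc` field used to detect a refresh, and the two fetch callbacks are all hypothetical names assumed for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SegmentMetadata:
    """Hypothetical per-segment summary, small enough to hold for every segment."""
    name: str
    crc: int       # assumed to change whenever the segment is refreshed
    min_time: int
    max_time: int


class TieredSegmentCache:
    """Eager metadata tier plus a lazy whole-segment tier (illustrative only)."""

    def __init__(
        self,
        fetch_metadata: Callable[[str], SegmentMetadata],
        fetch_segment: Callable[[str], bytes],
    ) -> None:
        self._fetch_metadata = fetch_metadata   # cheap call, e.g. a small object read
        self._fetch_segment = fetch_segment     # expensive call, e.g. a full download
        self._metadata: Dict[str, SegmentMetadata] = {}
        self._data: Dict[str, Tuple[int, bytes]] = {}   # name -> (crc, payload)

    def load_metadata(self, names: Iterable[str]) -> None:
        """Eagerly load metadata for all segments; no segment data is fetched."""
        for name in names:
            self._metadata[name] = self._fetch_metadata(name)

    def prune(self, start: int, end: int) -> List[str]:
        """Return segments whose time range overlaps [start, end], using metadata only."""
        return [
            m.name
            for m in self._metadata.values()
            if m.max_time >= start and m.min_time <= end
        ]

    def get_segment(self, name: str) -> bytes:
        """Fetch the whole segment on first use, or again if its CRC no longer matches."""
        meta = self._metadata[name]
        cached = self._data.get(name)
        if cached is None or cached[0] != meta.crc:
            cached = (meta.crc, self._fetch_segment(name))
            self._data[name] = cached
        return cached[1]

    def refresh(self, name: str) -> None:
        """Reload metadata after a segment refresh and drop data that is now stale."""
        meta = self._fetch_metadata(name)
        self._metadata[name] = meta
        cached = self._data.get(name)
        if cached is not None and cached[0] != meta.crc:
            del self._data[name]
```

Keeping the two tiers separate lets time-range pruning run without touching segment data, and the CRC comparison is one possible answer to the out-of-sync concern raised above.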
@ravibabu.chikkam: @ravibabu.chikkam has joined the channel
@noahprince8: Is it possible to pause Kafka collection on a table, but not querying? Seems like ChangeTableState disable makes queries return empty as well as pausing Kafka.
  @ssubrama: That is not possible currently.
  @ssubrama: Although, we have had requests for the feature
  @ssubrama: what is the use case for you?
  @noahprince8: Not really a production use case. I have a 100GB SSD and it’s about to pop, testing a large dataset locally.
  @noahprince8: Though I think certainly it could be useful in production. Maybe a producer goes haywire and we want to stop consuming that data and do a repair, while still leaving the table accessible
  @noahprince8: Is there an issue for this? I can create one
@g.kishore: Quick reminder about tomorrow's meetup at 5 PM PST. We have amazing talks lined up. @tingchen - Uber, @pradeepgv42 - Confluera @afilipchik @elon.azoulay from City Storage Systems.

#random


@djspatoulas: @djspatoulas has joined the channel
@ravibabu.chikkam: @ravibabu.chikkam has joined the channel

#troubleshooting


@nguyenhoanglam1990: @nguyenhoanglam1990 has joined the channel