#general


@matety: @matety has joined the channel

#random


@matety: @matety has joined the channel
@karinwolok1: Hahaha

#troubleshooting


@ravibabu.chikkam: @ravibabu.chikkam has joined the channel

#onboarding


@ravibabu.chikkam: @ravibabu.chikkam has joined the channel

#community


@ravibabu.chikkam: @ravibabu.chikkam has joined the channel

#discuss-validation


@chinmay.cerebro: @snlee @mayanks wanted to check with you folks quickly before I open a PR. Someone pointed me to this really nice validation technique: (). It lets us declaratively express our "ideal state" for things like table config. For example:
```
{
  "$schema": "",
  "$id": "",
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "tableName": { "description": "Name of the table", "type": "string" },
    "tableType": { "description": "Type of the table", "type": "string" },
    "quota": {
      "description": "Specifies quota for storage and queries per second",
      "type": "object",
      "properties": {
        "maxQueriesPerSecond": { "type": "integer" },
        "storage": { "type": "string" }
      }
    },
    "routing": {
      "type": "object",
      "properties": {
        "segmentPrunerTypes": {
          "type": "array",
          "items": { "type": "string", "enum": ["partition"] }
        },
        "instanceSelectorType": { "type": "string", "enum": ["replicaGroup"] }
      }
    },
    "segmentsConfig": {
      "type": "object",
      "properties": {
        "schemaName": { "type": "string" },
        "timeColumnName": { "type": "string" },
        "timeType": { "type": "string" },
        "replication": { "type": "string" },
        "retentionTimeUnit": { "type": "string", "enum": ["DAYS", "HOURS", "MINUTES", "SECONDS"] },
        "retentionTimeValue": { "type": "string" },
        "segmentPushFrequency": { "type": "string", "enum": ["HOURLY", "DAILY", "WEEKLY", "MONTHLY"] },
        "segmentPushType": { "type": "string", "enum": ["APPEND", "REFRESH"] }
      }
    },
    "tableIndexConfig": { "type": "object" },
    "tenants": { "type": "object" },
    "ingestionConfig": { "type": "object" },
    "metadata": { "type": "object" }
  },
  "required": ["tableName", "tableType", "segmentsConfig", "tableIndexConfig"]
}
```
  @ssubrama: Seems like a neat idea. Does it also make it easier to ensure that backward-incompatible changes do not go through? (Or is that caught by manual review when the schema file is updated?)
  @chinmay.cerebro: that's manual review
  @ssubrama: Are there plugins to validate one value vs another? (e.g. if offline table then `replication` needs to be validated, otherwise `replicasPerPartition`)
  @chinmay.cerebro: I'm afraid not, this mostly covers syntax, range checks, enum checks, etc.
  @chinmay.cerebro: we need other validations on top
  @chinmay.cerebro: eg: dependent config as you mentioned
  @chinmay.cerebro: or things like nodictionary columns and sorted index columns don't go together, things like that
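The cross-field checks mentioned in this thread could sit on top of the schema as a small second pass. Here is a minimal sketch in Python, not the project's actual validator. The function name `check_dependent_config` is hypothetical, and the key names `noDictionaryColumns` and `sortedColumn` under `tableIndexConfig`, as well as the `OFFLINE` table type value, are assumptions for illustration.

```python
def check_dependent_config(table_config):
    """Return a list of error strings for rules that span several fields."""
    errors = []
    table_type = str(table_config.get("tableType", "")).upper()
    segments = table_config.get("segmentsConfig") or {}
    index = table_config.get("tableIndexConfig") or {}

    # Rule 1: the replication key that must be present depends on the table type.
    # Assumption: anything that is not OFFLINE is treated as a realtime table.
    required_key = "replication" if table_type == "OFFLINE" else "replicasPerPartition"
    if required_key not in segments:
        errors.append(
            f"{table_type or 'UNKNOWN'} table requires segmentsConfig.{required_key}"
        )

    # Rule 2: a column must not be both no-dictionary and sorted.
    # Assumed key names; adjust to the real table config layout.
    no_dict = set(index.get("noDictionaryColumns") or [])
    sorted_cols = set(index.get("sortedColumn") or [])
    for column in sorted(no_dict & sorted_cols):
        errors.append(f"column {column!r} is both noDictionary and sorted")

    return errors
```

Returning a list instead of raising on the first failure lets a caller report every problem in a table config at once, which tends to be friendlier for config authors.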
@chinmay.cerebro: I think we should adopt this
@chinmay.cerebro: we obviously need more validations on top of this, but it will save us a lot of manual effort
@chinmay.cerebro: thoughts?
@chinmay.cerebro: I've also started this Google doc to capture all the validations we want with table config:
@chinmay.cerebro: please take a look at that as well
@mayanks: Thanks @chinmay.cerebro, will take a look

#segment-cold-storage


@noahprince8: Might be a little more difficult than I had originally imagined. There are really two entry points to downloading a segment, `SegmentFetcherAndLoader` and `RealtimeTableDataManager`. Unifying those two seems like it may be difficult, as the realtime use case has some backup peer downloading.
@jackie.jxt: The peer downloading should be applicable to both offline and realtime (might not be the case right now)
@jackie.jxt: And all segment downloads should be handled within the same class
@noahprince8: Yeah, it appears it is not handled that way now. The only way it knows the URI for the deep store download is from a realtime-specific metadata class
@noahprince8: This bit of the codebase could use a refactor, but I’m not sure I have the time
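To illustrate the "one class handles every download" idea from this thread, here is a rough sketch in Python. It does not mirror the existing Java code. `SegmentDownloader`, its constructor arguments, and the two fetch callables are all hypothetical names. The ordering (deep store first, peers as the backup) is an assumption based on the "backup peer downloading" remark above.

```python
from typing import Callable, Optional


class SegmentDownloader:
    """Hypothetical single entry point for offline and realtime segment downloads."""

    def __init__(
        self,
        fetch_from_deep_store: Callable[[str], bytes],
        fetch_from_peer: Callable[[str], bytes],
    ) -> None:
        self._fetch_from_deep_store = fetch_from_deep_store
        self._fetch_from_peer = fetch_from_peer

    def download(self, segment_name: str, deep_store_uri: Optional[str]) -> bytes:
        # Try the deep store when a URI is known, regardless of table type.
        if deep_store_uri:
            try:
                return self._fetch_from_deep_store(deep_store_uri)
            except OSError:
                pass  # Deep store unavailable: fall through to the peer path.
        # Backup path: ask a peer server that already hosts the segment.
        return self._fetch_from_peer(segment_name)
```

Passing the deep store URI in as a plain argument keeps the downloader independent of any realtime-specific metadata class; each caller would resolve the URI from whatever metadata it has.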