reallocf commented on issue #5268: URL: https://github.com/apache/incubator-pinot/issues/5268#issuecomment-619147900
Good idea @kishoreg - having filtering be part of the table config does seem like it would make more sense, especially if we're defining filters to be across multiple columns. Based on the table config definitions here: https://docs.pinot.apache.org/pinot-components/table do you think it's worth adding a new top-level `filter` field? Or would it be better to nest it somewhere? Will there be future ingestion-related configs that we could group together under an `ingestConfig` or something?

Oh yeah, I hadn't thought about transient fields. Is there a way that we can infer transient fields so they don't have to be handled explicitly? Probably not, because we'd lose type information... I guess the other option would be to create a `transientFieldSpecs` list in the schema definition. These could also be used for other transform functions. But now we're blowing up the scope quite a bit (though good for future work! 😄).

In order to keep this small and achievable, what do y'all think I should start with @npawar @kishoreg ?

1) Pre-filter/post-filter/both? I have a feeling pre-filter might be more in line with how transform functions work today, so maybe start there and we can add post-filtering in the future?
2) Define in table config or schema definition? If defined in the table config, what top-level field name should we have it under? If defined in the schema definition, should it be column-based or top-level? I think my preference here is for table config, but would love to hear @npawar 's thoughts on that.
3) Only handle added fieldSpecs for now? Or also handle transient fieldSpecs? My preference would be for the former, though handling transient fieldSpecs more broadly does sound interesting.
4) Should we have a list of filter expressions, or a single expression?

Also, does it sound like something along these lines would be useful for LinkedIn/your company @kishoreg ?
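
To make options 1, 2 and 4 a bit more concrete, here is a rough Python sketch of a pre-filter driven by a list of filters nested under `ingestConfig` in the table config. None of this is Pinot code: the `filters` key, the `column`/`op`/`value` shape, and the helper names are all assumptions made up for illustration, and a record is dropped if any filter matches it.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]

# Hypothetical table-config fragment. The "ingestConfig" grouping is only the
# idea floated above; the "filters" list and its entries are invented here.
table_config = {
    "tableName": "events",
    "ingestConfig": {
        "filters": [
            {"column": "country", "op": "eq", "value": "XX"},
            {"column": "latencyMs", "op": "lt", "value": 0},
        ]
    },
}

# Supported comparison operators (an assumed, deliberately tiny set).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda actual, expected: actual == expected,
    "lt": lambda actual, bound: actual is not None and actual < bound,
}


def make_predicate(column: str, op: str, value: Any) -> Predicate:
    """Build one predicate that returns True when a record should be dropped."""
    compare = OPS[op]
    return lambda record: compare(record.get(column), value)


def build_predicates(config: Dict[str, Any]) -> List[Predicate]:
    """Turn the (hypothetical) filter list in the table config into predicates."""
    filters = config.get("ingestConfig", {}).get("filters", [])
    return [make_predicate(f["column"], f["op"], f["value"]) for f in filters]


def pre_filter(records: List[Record], predicates: List[Predicate]) -> List[Record]:
    """Drop matching records before any transform function runs.

    Because this runs first, predicates only see source fields, not
    derived or transient ones.
    """
    return [r for r in records if not any(p(r) for p in predicates)]


if __name__ == "__main__":
    raw = [
        {"country": "US", "latencyMs": 12},
        {"country": "XX", "latencyMs": 5},
        {"country": "DE", "latencyMs": -1},
    ]
    print(pre_filter(raw, build_predicates(table_config)))
```

A post-filter would be the same loop applied after the transforms, at which point the predicates could also reference derived columns, which is where the transient-field question comes in.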
