walterddr commented on issue #10381: URL: https://github.com/apache/pinot/issues/10381#issuecomment-1753261690
> I don't understand these properties. I would expect that in V2 there is no nullability query property. The engine should treat nullability as any other type property. The same we don't have properties to treat strings as binaries, we should not have properties to enable/disable nullability. Based on my proposal above (config) section, v2 doesn't need any nullability query property. > I think we have a clear way to define null column nullability: to set that in the schema. For compatibility reasons, we also have a way to define table level nullability in table config. We can treat the latter as the default column level nullability. Currently the "table-level nullability" is not "this table contains null", but an orthogonal property "for all the columns, generate null-vectors in the segment". this chart should show what I meant: | Control | During Query | During Ingestion | | -------- | ------- | ------- | | Table-level Control | via `SET enableNullHandling=true;` | via `tableConfig.tableIndexConfig.nullHandlingEnabled` | | Column-level Control | via schema `isNullableField` (discussed here) | via schema `isNullableField` (not discussed here) | my argument: 1. the `tableConfig.tableIndexConfig.nullHandlingEnabled` should not be used during query time (do not couple them) 2. `isNullableField` in schema can be coupled in the future but i dont want to couple them here and now due to backward compatibility discussions > ## enableNullhandling > Why do we need that? Ideally the table config nullability will tell us the default nullability. Do we expect users changing nullability at runtime? Why? That is not a normal flag in other databases. > > Are you proposing this property in order to keep backward compatibility so if it is not enabled, V1 ignores column level nullability? My proposal is to do not have this. There is no backward compatibility issue given that column level nullability is a new feature. > > 1. If users do not define column level nullability in the schema, v1 will honor `tableConfig.getIndexingConfig().isNullHandlingEnabled `, which is backward compatible. IMO this is not a good idea, see my table above. we dont want to couple ingestion time config with query time config > 2. If users do define column level nullability, then they have changed the schema, so it is backward compatible to return different values. > > ## enableNullHandlingIfColumnIsNullable > In which situation we want this disabled? I mean, in which cases we expect users to define a column as nullable but want to skip that? It sounds super confusing to understand. this is what I meant in the config section above (sorry if it is not clear) - `enableNullHandlingIfColumnIsNullable` is a hidden query option auto-attached to the leaf stage of v2 only **v1** behavior: 1. if `enableNullhandling` is set, current v1 behavior 2. if `enableNullhandling` is not set, current v1 behavior 3. if `enableNullHandlingIfColumnIsNullable` is set, utilize column-level nullability (user wont see it when directly using v1) **v2** behavior: 1. always utilize column-level nullability, and auto attach `enableNullHandlingIfColumnIsNullable` option to leaf stage of v2. i am not 100% sure `enableNullHandlingIfColumnIsNullable` is needed. if not (e.g. we can return different results from v1 when `enableNullhandling` is not set but column nullability is set), then the behavior is even simpler: **v1** behavior: 1. if `enableNullhandling` is set, current v1 behavior 2. if `enableNullhandling` is not set, utilize column-level nullability <-- different result **v2** behavior: 1. always utilize column-level nullability -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
