Jackie-Jiang commented on code in PR #18368:
URL: https://github.com/apache/pinot/pull/18368#discussion_r3283583495
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/ComplexFieldSpec.java:
##########
@@ -58,17 +61,45 @@ public final class ComplexFieldSpec extends FieldSpec {
private final Map<String, FieldSpec> _childFieldSpecs;
- // Default constructor required by JSON de-serializer
- public ComplexFieldSpec() {
+ @Nullable
+ private final Map<String, FieldSpec> _valueFieldSpecs;
Review Comment:
What's the difference between this and `_childFieldSpecs`?
Seems we are overloading the MAP concept. For MAP, by definition all keys
and values must be of the same type. What we are adding is a new
semi-structured type where keys are always string, and values can be of any
supported type
##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/datasource/MapDataSource.java:
##########
@@ -33,6 +33,26 @@ public interface MapDataSource extends DataSource {
/// Returns the DataSource for the given map key's values.
DataSource getDataSource(String key);
+ /// Returns whether this segment has per-key index data for the given key.
+ ///
+ /// - **All-dense columnar segments:** exact (O(1) lookup into the
materialized key set).
+ /// - **Mixed-tier columnar segments (dense + sparse):** exact for
materialized keys; for any
+ /// non-materialized key, returns `true` whenever a sparse blob exists
(conservative — the
+ /// sparse key set is not persisted today). Cost of the conservative
answer is bounded:
+ /// queries on truly-absent keys pay one wasted `getDataSource(key)`
lookup, then fall
+ /// through to expression-filter (which returns no matches via a full
scan).
+ /// - **Blob-only segments:** `true` conservatively because determining key
presence requires
+ /// per-doc deserialization.
+ ///
+ /// Query operators use this to choose between fast-path (per-key
inverted/dictionary index)
+ /// and fallback (expression scan). When this returns `false`, callers can
short-circuit:
+ /// e.g. `MapFilterOperator` returns `EmptyFilterOperator` for value
predicates and
+ /// `MatchAllFilterOperator` for IS_NULL. When this returns `true`, callers
should
+ /// still handle absent-key DataSources gracefully (null-value bitmap marks
all rows as null).
+ default boolean containsKey(String key) {
Review Comment:
Per the javadoc, does this return false only when all keys are materialized
and doesn't contain the given key?
I still feel the definition of this API can be confusing. Does it work if we
split it into:
- boolean isMaterialized(String key): always O(1) lookup for the
materialized keys
- boolean isFullyMaterialized(): returns `true` when all keys materialized
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]