gortiz commented on code in PR #10184:
URL: https://github.com/apache/pinot/pull/10184#discussion_r1151623764
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -102,6 +118,8 @@ public class IndexLoadingConfig {
private String _instanceId;
private Map<String, Map<String, String>> _instanceTierConfigs;
+ private boolean _dirty = true;
Review Comment:
This flag is:
* Set to true each time any attribute that is a collection of columns is
modified.
* Set to false each time `refreshIndexConfigsByColName` is called
* Checked each time `getFieldIndexConfig` or `getFieldIndexConfigByColName`
is called.
As described in the
[PEP](https://docs.google.com/document/d/1L_GJ2tAgZVxI_Ssa59--Mso3mK_HeAS7s7G9261-sOE/edit#heading=h.j8v48t5xzh4),
this class is heavily (mis)used in tests. These tests modify this class by
calling its set/add/remove methods instead of actually changing the schema. The
new code uses the schema as the source of truths, so we need the
`ConfigurableFromIndexLoadingConfig` infrastructure in order to create the new
config objects from this artificial (I would even said _mocked_) values.
Ideally we would change the tests to do not modify `IndexLoadingConfig` or
even better, we would remove this class. But that isn't in the scope yet.
Therefore, we have three options:
1. Either calculate `Map<String, FieldIndexConfigs> _indexConfigsByColName`
each time the whenever the already modified code
(`PhysicalColumnIndexContainer`, `BaseIndexHandlercalls`, etc) ask for the
`FieldIndexConfig` by calling `getFieldIndexConfig` or
`getFieldIndexConfigByColName`
2. Eagerly modify `_indexConfigsByColName` each time a column is
added/removed to each collection attribute.
3. Or have this `_dirty` flag and lazily update `_indexConfigsByColName`
when needed.
Indexing loading config opts for the later option, while
`SegmentGeneratorConfig` opts for the second. I like the second more. It is
more expensive in terms of memory and CPU, but it is only used in test, so it
shouldn't be a actual problem. But `IndexLoadingConfig` interface is not the
same as `SegmentGeneratorConfig`. While in `SergmentGeneratorConfig` the
changes are usually "enable this index", in `IndexLoadingConfig` api the
changes are usually "act like TableConfig had this text in this specific
field". It may be the case that a test temporally sets `IndexLoadingConfig` in
a invalid state, so if we build the index config on each modification we may
throw exceptions unnecessarily. As said in another comment, @Jackie-Jiang and I
want to actually remove both `IndexLoadingConfig` and `SegmentGeneratorConfig`
classes, so I didn't dedicate that much time trying to change this class into
something closer to `SegmentGeneratorConfig`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]