gortiz commented on code in PR #10184:
URL: https://github.com/apache/pinot/pull/10184#discussion_r1151623764


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -102,6 +118,8 @@ public class IndexLoadingConfig {
 
   private String _instanceId;
   private Map<String, Map<String, String>> _instanceTierConfigs;
+  private boolean _dirty = true;

Review Comment:
   This flag is:
   * Set to true each time any attribute that is a collection of columns is 
modified. 
   * Set to false each time `refreshIndexConfigsByColName` is called
   * Checked each time `getFieldIndexConfig` or `getFieldIndexConfigByColName` 
is called.
   
   As described in the 
[PEP](https://docs.google.com/document/d/1L_GJ2tAgZVxI_Ssa59--Mso3mK_HeAS7s7G9261-sOE/edit#heading=h.j8v48t5xzh4),
 this class is heavily (mis)used in tests. These tests modify this class by 
calling its set/add/remove methods instead of actually changing the schema. The 
new code uses the schema as the source of truths, so we need the 
`ConfigurableFromIndexLoadingConfig` infrastructure in order to create the new 
config objects from this artificial (I would even said _mocked_) values.
   
   Ideally we would change the tests to do not modify `IndexLoadingConfig` or 
even better, we would remove this class. But that isn't in the scope yet. 
Therefore, we have three options:
   1. Either calculate `Map<String, FieldIndexConfigs> _indexConfigsByColName` 
each time the whenever the already modified code 
(`PhysicalColumnIndexContainer`, `BaseIndexHandlercalls`, etc) ask for the 
`FieldIndexConfig` by calling `getFieldIndexConfig` or 
`getFieldIndexConfigByColName` 
   2. Eagerly modify `_indexConfigsByColName` each time a column is 
added/removed to each collection attribute.
   3. Or have this `_dirty` flag and lazily update `_indexConfigsByColName` 
when needed.
   
   Indexing loading config opts for the later option, while 
`SegmentGeneratorConfig` opts for the second. I like the second more. It is 
more expensive in terms of memory and CPU, but it is only used in test, so it 
shouldn't be a actual problem. But `IndexLoadingConfig` interface is not the 
same as `SegmentGeneratorConfig`. While in `SergmentGeneratorConfig` the 
changes are usually "enable this index", in `IndexLoadingConfig` api the 
changes are usually "act like TableConfig had this text in this specific 
field". It may be the case that a test temporally sets `IndexLoadingConfig` in 
a invalid state, so if we build the index config on each modification we may 
throw exceptions unnecessarily. As said in another comment, @Jackie-Jiang and I 
want to actually remove both `IndexLoadingConfig` and `SegmentGeneratorConfig` 
classes, so I didn't dedicate that much time trying to change this class into 
something closer to `SegmentGeneratorConfig`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to