keith-turner commented on code in PR #6040:
URL: https://github.com/apache/accumulo/pull/6040#discussion_r2688543199


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java:
##########
@@ -243,6 +245,26 @@ private SortedKeyValueIterator<Key,Value> createIterator()
       } else {
         // Scan time iterator options were set, so need to merge those with 
pre-parsed table
         // iterator options.
+
+        // First ensure the set iterators do not conflict with the existing 
table iterators.
+        for (var scanParamIterInfo : scanParams.getSsiList()) {

Review Comment:
   This code is doing a lot of work and I suspect it could be problematic for 
small scans w/ a few iterators.
   
    * Parsed iterator configuration is unparsed and then parsed.
    * For each scan iterator it loops over all tablet iterators.  So this seems 
like O(M*N) type behavior.  The unparsing and parsing is done M*N times.
   
   The `ParsedIteratorConfig` class was created to cache parsed table config 
because it was observed (using profiling) for small scans that significant time 
was spent parsing the table config.  ParsedIteratorConfig is automatically 
cached per table and only recreated when table config changes, this avoid each 
scan having to do redundant work of parsing the properties.
   
   Avoiding the O(M*N) work and avoiding unparsing the data would probably make 
this much faster.  Not exactly sure how to solve this puzzle exactly, but I 
suspect the following refactor might help.
   
    1. Modify the validation code to work on parsed iterator configuration.  
Currently `checkIteratorConflicts` parses and validates all together.  Maybe it 
could be refactored to take `List<IteratorSetting>` instead of 
`Map<String,String> props`.  This may make the code easier to understand.
    2. With the above change checking for iterator conflicts would parse in one 
method and then check/validate in another method.
    3. If we had a `checkIteratorConflicts` method that took parsed config, 
then the scan code could call this directly with its existing parsed iterator 
config.  This would avoid unparsing and then reparsing the data.
   
   The above might be a good general improvement to the code, but not 
completely sure it solves the problem.  Also not sure if will completely solve 
the O(M*N) problem.
   
   Also curious if the validation could efficiently be done in the existing 
`IteratorConfigUtil.mergeIteratorConfig()` code, but not sure about that.  
Suspect having `checkIteratorConflicts` work on parsed config would make all of 
this code easier to understand and more efficient, so that may help answer 
questions like this.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to