EdColeman commented on issue #2812:
URL: https://github.com/apache/accumulo/issues/2812#issuecomment-1188111321

   Maybe I am reading your code wrong, but I don't see how your code is 
functionally equivalent to the current code that you posted here.
   
   The current code seems to be performing partial key comparisons - so for 
example, if the case is `ROW` only the row data is compared.  If it was 
`ROW_COLFAM_COLQUAL`, the fields ROW, COL_FAM and COL_QUAL are all checked. It 
looks like your code only checks the last value as specified by the depth.
   
   With your code, if the timestamps are equal, would it consider the rows 
equal?
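
To make the concern concrete, here is a minimal sketch. `MiniKey`, its `Depth` enum, and both method names are hypothetical and only stand in for the real key class; the point is just the difference between checking every field up to the depth and checking only the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified stand-in for a key; NOT the Accumulo Key class.
final class MiniKey {
    enum Depth { ROW, ROW_COLFAM, ROW_COLFAM_TIME }

    final byte[] row;
    final byte[] colFamily;
    final long timestamp;

    MiniKey(String row, String colFamily, long timestamp) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.colFamily = colFamily.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    // Cumulative: every field up to and including the requested depth must match.
    boolean equalsCumulative(MiniKey other, Depth depth) {
        switch (depth) {
            case ROW:
                return Arrays.equals(row, other.row);
            case ROW_COLFAM:
                return Arrays.equals(row, other.row)
                    && Arrays.equals(colFamily, other.colFamily);
            case ROW_COLFAM_TIME:
                return Arrays.equals(row, other.row)
                    && Arrays.equals(colFamily, other.colFamily)
                    && timestamp == other.timestamp;
            default:
                throw new IllegalArgumentException("Unknown depth " + depth);
        }
    }

    // Last-field-only: checks just the field that the depth names last.
    boolean equalsLastFieldOnly(MiniKey other, Depth depth) {
        switch (depth) {
            case ROW:
                return Arrays.equals(row, other.row);
            case ROW_COLFAM:
                return Arrays.equals(colFamily, other.colFamily);
            case ROW_COLFAM_TIME:
                return timestamp == other.timestamp;
            default:
                throw new IllegalArgumentException("Unknown depth " + depth);
        }
    }
}
```

With keys `("row1", "cf", 5)` and `("row2", "cf", 5)` at depth `ROW_COLFAM_TIME`, the cumulative version reports them as different, while the last-field-only version reports them as equal because only the timestamps are compared.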
   
   What I thought you were going for was to use boolean short-circuit 
evaluation so that comparing "long" rows would be deferred until later.
   
   ```
   original: 
   ...
   case ROW_COLFAM_COLQUAL_COLVIS_TIME_DEL: 
    return isEqual(row, other.row) && isEqual(colFamily, other.colFamily) 
              && isEqual(colQualifier, other.colQualifier) 
              && isEqual(colVisibility, other.colVisibility) 
              && timestamp == other.timestamp 
              && deleted == other.deleted; 
   
   possible change:
   ...
   case ROW_COLFAM_COLQUAL_COLVIS_TIME_DEL: 
    return deleted == other.deleted 
              && timestamp == other.timestamp 
              && isEqual(colVisibility, other.colVisibility)
              && isEqual(colQualifier, other.colQualifier) 
              && isEqual(colFamily, other.colFamily) 
              && isEqual(row, other.row); 
   ```
   
   However, even this may end up with additional comparisons and would depend 
on the distribution of values within the table.
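
As a rough illustration of that dependence, the sketch below counts how many byte-array comparisons each ordering performs. `OrderingCost`, `rowFirst`, `cheapFirst`, and `fields` are hypothetical names, and `isEqual` here is a simple stand-in built on `Arrays.equals`, not the project's own helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical cost model: counts array comparisons made by each ordering.
final class OrderingCost {
    // Array comparisons performed by the most recent rowFirst/cheapFirst call.
    static int arrayComparisons;

    // Stand-in for an array equality helper; increments the counter on each call.
    static boolean isEqual(byte[] a, byte[] b) {
        arrayComparisons++;
        return Arrays.equals(a, b);
    }

    // Builds {row, colFamily, colQualifier, colVisibility} as byte arrays.
    static byte[][] fields(String row, String cf, String cq, String cv) {
        return new byte[][] {
            row.getBytes(StandardCharsets.UTF_8),
            cf.getBytes(StandardCharsets.UTF_8),
            cq.getBytes(StandardCharsets.UTF_8),
            cv.getBytes(StandardCharsets.UTF_8)
        };
    }

    // Original ordering: row first, primitives last.
    static boolean rowFirst(byte[][] k1, long ts1, boolean del1,
                            byte[][] k2, long ts2, boolean del2) {
        arrayComparisons = 0;
        return isEqual(k1[0], k2[0]) && isEqual(k1[1], k2[1])
            && isEqual(k1[2], k2[2]) && isEqual(k1[3], k2[3])
            && ts1 == ts2 && del1 == del2;
    }

    // Reordered: cheap primitive checks first, row last.
    static boolean cheapFirst(byte[][] k1, long ts1, boolean del1,
                              byte[][] k2, long ts2, boolean del2) {
        arrayComparisons = 0;
        return del1 == del2 && ts1 == ts2
            && isEqual(k1[3], k2[3]) && isEqual(k1[2], k2[2])
            && isEqual(k1[1], k2[1]) && isEqual(k1[0], k2[0]);
    }
}
```

When two keys differ only in timestamp, `cheapFirst` performs no array comparisons while `rowFirst` performs four. When they differ only in row, `rowFirst` stops after one array comparison while `cheapFirst` performs four, so which ordering wins depends on the data.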
   
   When comparing at full depth, checking isDeleted and timestamp up front seems to 
be a good optimization.
   
   When comparing a partial key, things are not so clear cut.  
   
   - Visibilities may or may not change frequently.
   - If locality groups are used, then column families / column qualifiers will 
be grouped together and are then less likely to change from row to row.
   - For changes between rows, the last bytes are likely to change the 
fastest. If the rows are `AAAA` and `AAAB`, starting from the end (when lengths 
are equal) requires one comparison. Starting from the head takes four 
comparisons (assuming one character is checked at a time, but the same holds 
for longer values that exceed a long or an integer when grabbing multiple bytes 
at a time).
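
The `AAAA` / `AAAB` case from the last bullet can be sketched as follows. `TailFirstCompare` and its methods are hypothetical and byte-at-a-time for simplicity; the `comparisons` counter exists only to make the cost visible.

```java
// Hypothetical helper contrasting head-first and tail-first equality checks.
final class TailFirstCompare {
    // Byte comparisons performed by the most recent call (illustration only).
    static int comparisons;

    // Walks from index 0 upward and stops at the first mismatch.
    static boolean equalsFromHead(byte[] a, byte[] b) {
        comparisons = 0;
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            comparisons++;
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    // Walks from the last index downward and stops at the first mismatch.
    static boolean equalsFromTail(byte[] a, byte[] b) {
        comparisons = 0;
        if (a.length != b.length) {
            return false;
        }
        for (int i = a.length - 1; i >= 0; i--) {
            comparisons++;
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For `AAAA` versus `AAAB`, `equalsFromHead` makes four byte comparisons before finding the mismatch and `equalsFromTail` makes one. Both do the same work when the arrays are actually equal, so the gain only applies to sorted data with shared prefixes.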
   
   Sorry if I misunderstand what you are showing.

