EdColeman commented on issue #2812:
URL: https://github.com/apache/accumulo/issues/2812#issuecomment-1188111321
Maybe I am reading your code wrong, but I don't see how your code is
functionally equivalent to the current code that you posted here.
The current code seems to be performing partial key comparisons. For
example, if the case is `ROW`, only the row data is compared. If it is
`ROW_COLFAM_COLQUAL`, the row, column family and column qualifier are all
checked. It looks like your code only checks the last field specified by the
depth.
With your code, if the timestamps are equal, would it consider the rows
equal?
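To make the concern concrete, here is a minimal sketch of the two behaviors. `SketchKey` and `SketchDepth` are hypothetical stand-ins that I made up for illustration (three fields, three depths); they are not Accumulo's `Key` or `PartialKey`, and the real code may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified depth enum (not Accumulo's PartialKey).
enum SketchDepth { ROW, ROW_COLFAM, ROW_COLFAM_TIME }

// Hypothetical, simplified key with only three fields.
final class SketchKey {
  final byte[] row;
  final byte[] colFamily;
  final long timestamp;

  SketchKey(String row, String colFamily, long timestamp) {
    this.row = row.getBytes(StandardCharsets.UTF_8);
    this.colFamily = colFamily.getBytes(StandardCharsets.UTF_8);
    this.timestamp = timestamp;
  }

  // Cumulative check: every field up to and including the depth must match.
  boolean equalsCumulative(SketchKey other, SketchDepth depth) {
    switch (depth) {
      case ROW:
        return Arrays.equals(row, other.row);
      case ROW_COLFAM:
        return Arrays.equals(row, other.row)
            && Arrays.equals(colFamily, other.colFamily);
      case ROW_COLFAM_TIME:
        return Arrays.equals(row, other.row)
            && Arrays.equals(colFamily, other.colFamily)
            && timestamp == other.timestamp;
      default:
        throw new IllegalArgumentException("Unknown depth: " + depth);
    }
  }

  // Last-field-only check: compares just the final field named by the depth.
  boolean equalsLastFieldOnly(SketchKey other, SketchDepth depth) {
    switch (depth) {
      case ROW:
        return Arrays.equals(row, other.row);
      case ROW_COLFAM:
        return Arrays.equals(colFamily, other.colFamily);
      case ROW_COLFAM_TIME:
        return timestamp == other.timestamp;
      default:
        throw new IllegalArgumentException("Unknown depth: " + depth);
    }
  }
}
```

With keys `("row1", "fam", 5)` and `("row2", "fam", 5)` at depth `ROW_COLFAM_TIME`, the cumulative check reports them as different because the rows differ, while the last-field-only check reports them as equal because only the timestamps are compared.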
What I thought you were going for would use boolean operation
short-circuiting so that comparing "long" rows would be
deferred until later.
```
original:
...
case ROW_COLFAM_COLQUAL_COLVIS_TIME_DEL:
  return isEqual(row, other.row) && isEqual(colFamily, other.colFamily)
      && isEqual(colQualifier, other.colQualifier)
      && isEqual(colVisibility, other.colVisibility)
      && timestamp == other.timestamp
      && deleted == other.deleted;

possible change:
...
case ROW_COLFAM_COLQUAL_COLVIS_TIME_DEL:
  // cheapest checks first; the row comparison is deferred to last
  return deleted == other.deleted
      && timestamp == other.timestamp
      && isEqual(colVisibility, other.colVisibility)
      && isEqual(colQualifier, other.colQualifier)
      && isEqual(colFamily, other.colFamily)
      && isEqual(row, other.row);
```
However, even this may end up with additional comparisons and would depend
on the distribution of values within the table.
When comparing at full depth, checking the deleted flag and timestamp up front
seems to be a good optimization.
When comparing a partial key, things are not so clear cut.
- Visibilities may or may not change frequently.
- If locality groups are used, then column family / column qualifiers will
be grouped together and then less likely to change from row to row.
- For changes between rows, the last bytes are likely to change the
fastest. If the rows are `AAAA` and `AAAB`, starting from the end (when the
lengths are equal) requires one comparison. Starting from the head takes four
comparisons. This assumes checking one byte at a time, but the same holds when
grabbing multiple bytes at a time, once the values are longer than a long or
an integer.
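As a rough illustration of the last point, this sketch counts byte comparisons for head-first and tail-first scans of equal-length arrays. The class and method names are hypothetical and exist only for this example. It is not how `isEqual` is implemented, just a way to see the `AAAA` / `AAAB` numbers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Accumulo.
final class ByteCompareSketch {

  // Counts byte comparisons made when scanning from index 0 until the first
  // mismatch (or the end). Returns 0 if the lengths differ, since the length
  // check alone decides the result.
  static int comparisonsFromHead(byte[] a, byte[] b) {
    if (a.length != b.length) {
      return 0;
    }
    int count = 0;
    for (int i = 0; i < a.length; i++) {
      count++;
      if (a[i] != b[i]) {
        break;
      }
    }
    return count;
  }

  // Same count, but scanning from the last index toward index 0.
  static int comparisonsFromTail(byte[] a, byte[] b) {
    if (a.length != b.length) {
      return 0;
    }
    int count = 0;
    for (int i = a.length - 1; i >= 0; i--) {
      count++;
      if (a[i] != b[i]) {
        break;
      }
    }
    return count;
  }

  // Equality check that scans from the tail, so keys sharing a long common
  // prefix are rejected early.
  static boolean equalsFromTail(byte[] a, byte[] b) {
    if (a.length != b.length) {
      return false;
    }
    for (int i = a.length - 1; i >= 0; i--) {
      if (a[i] != b[i]) {
        return false;
      }
    }
    return true;
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```

For `AAAA` and `AAAB`, `comparisonsFromHead` returns 4 and `comparisonsFromTail` returns 1. Whether tail-first actually helps depends on the data: rows that differ near the start would favor the head-first scan instead.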
Sorry if I misunderstand what you are showing.