ctubbsii commented on PR #4468: URL: https://github.com/apache/accumulo/pull/4468#issuecomment-2075536752
> I would argue you could also call this a bug as the grep command (and iterator) search every part of the key which is a byte array EXCEPT the visibility. Can you point me to documentation which specifies that the visibility is a special part of the key that should be treated differently in some cases? The data model page makes no such note. Each piece of the key are inherently unique and special in the data model, but it's also flexible and you can certainly treat it like other fields, such as in your use case. My point wasn't that visibility must be treated special, but rather that it's also a valid use case if it is. The basis for my objection wasn't to preserve the existing use case, but to show that it's not necessarily a bug because both use cases are valid. Treating both use cases as valid, then this change will break some valid existing workflows. The severity of that breakage could range anywhere from "harmless surprise", to "security-compromise", depending on the additional data returned. I think we can satisfy both use cases, while avoiding making it a breaking change. > On top of that, this PR came out of observing a user of accumulo frustrated as the grep they were running was not matching, when they expected that it would. Seems to me that the logical behavior is that it greps for everything, not that it excludes some parts. Is the main frustration the use of grep in the shell, or is it the GrepIterator itself? I'm okay changing the behavior in the shell, which primarily is intended to support user-interactive ad-hoc workflows anyway. My concern is changing the default behavior of the GrepIterator when used directly in table configuration or as a scan-time iterator. A few things we could do to avoid breaking changes to the GrepIterator: 1. Create a new iterator with new behavior (this gives the possibility for deprecating the old one, if we want to only have one, and wish to signal the change in behavior), or 2. Add a matchOnVisibility option to existing iterator that is off by default (my preference). For the shell, where we don't care about changing behavior as much: 1. Update the grep command in the shell to use the new iterator or new option and update the documentation (also, highlight the change in behavior in the release notes), and 2. Maybe also add an option to toggle between the behaviors in the shell (I don't think this is necessary... changing the behavior in the shell is less a concern for me than breaking direct uses of the GrepIterator). Changing the behavior of the shell's grep command to make it more convenient can be done without changing the behavior of GrepIterator for existing workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
