ctubbsii commented on PR #4468:
URL: https://github.com/apache/accumulo/pull/4468#issuecomment-2075536752

   > I would argue you could also call this a bug as the grep command (and 
iterator) search every part of the key which is a byte array EXCEPT the 
visibility. Can you point me to documentation which specifies that the 
visibility is a special part of the key that should be treated differently in 
some cases? The data model page makes no such note.
   
   Each piece of the key are inherently unique and special in the data model, 
but it's also flexible and you can certainly treat it like other fields, such 
as in your use case. My point wasn't that visibility must be treated special, 
but rather that it's also a valid use case if it is. The basis for my objection 
wasn't to preserve the existing use case, but to show that it's not necessarily 
a bug because both use cases are valid.
   
   Treating both use cases as valid, then this change will break some valid 
existing workflows. The severity of that breakage could range anywhere from 
"harmless surprise", to "security-compromise", depending on the additional data 
returned. I think we can satisfy both use cases, while avoiding making it a 
breaking change.
   
   > On top of that, this PR came out of observing a user of accumulo 
frustrated as the grep they were running was not matching, when they expected 
that it would. Seems to me that the logical behavior is that it greps for 
everything, not that it excludes some parts.
   
   Is the main frustration the use of grep in the shell, or is it the 
GrepIterator itself? I'm okay changing the behavior in the shell, which 
primarily is intended to support user-interactive ad-hoc workflows anyway. My 
concern is changing the default behavior of the GrepIterator when used directly 
in table configuration or as a scan-time iterator.
   
   A few things we could do to avoid breaking changes to the GrepIterator:
   
   1. Create a new iterator with new behavior (this gives the possibility for 
deprecating the old one, if we want to only have one, and wish to signal the 
change in behavior), or
   2. Add a matchOnVisibility option to existing iterator that is off by 
default  (my preference).
   
   For the shell, where we don't care about changing behavior as much:
   
   1. Update the grep command in the shell to use the new iterator or new 
option and update the documentation (also, highlight the change in behavior in 
the release notes), and
   2. Maybe also add an option to toggle between the behaviors in the shell (I 
don't think this is necessary... changing the behavior in the shell is less a 
concern for me than breaking direct uses of the GrepIterator).
   
   Changing the behavior of the shell's grep command to make it more convenient 
can be done without changing the behavior of GrepIterator for existing 
workflows.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to