felipecrv commented on PR #43256: URL: https://github.com/apache/arrow/pull/43256#issuecomment-2244138609
> We change SetLookupState::Init so that it takes the SetLookupOptions& argument without const. When initializing, we sort and remove duplicates from the value set and update the options in place. This effectively makes it so that binding an expression can mutate the input in a way that doesn't change semantics. It's probably a bad idea to mutate `SetLookupOptions`. I would simply stored the simplified/equivalent `value_set` your simplification builds in the `KernelState` itself. You don't have to do it in `SetLookupState::Init`. You can mutate it after the `Init` as an optional pre-processing step that you do to help the kernel "execute better". > Sorting the value set inside SetLookupState would change the behavior of index_in. Hence, we'd also need to modify InitSetLookup to plumb through whether we are initializing is_in or index_in. Yeah, don't mutate `SetLookupState`. > Inside Simplify, if we see an is_in call that is unbound, then we immediately fail simplification, since an unbound is_in is not guaranteed to have a sorted and unique value set. We cannot bind the expression on the spot since we don't have access to a schema. I don't understand this. > Once we compute the simplified value set, we must avoid re-binding so that we don't re-sorted the value set (and re-construct the memo table). We only re-bind once after all inequalities in the guarantee have been processed. Yeah. You should be able to look at the `KernelState` and decide how to proceed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
