rtpsw commented on PR #34392: URL: https://github.com/apache/arrow/pull/34392#issuecomment-1532094481
> Can you give a bit more details on what about MemoStore maintenance is not handled correctly? You can put code in the PR too if that is easier to align with the code. It has been a while since I looked at future asof code so a refresher is appreciated.

It would take a good amount of detail to explain the interaction between the various moving parts of the code. I'd prefer to add comments to the code to explain. Would that work?

> Can you remind me why do we cache the hash in the first place. What is the case that we want to hash the same batch more than once?

Hashes are used when the key is not a single fixed-width value that fits within 64 bits, e.g., when there are multiple key columns. In that case, the hash of each row of the key columns is used as the value for grouping. For performance, the code computes hashes for all rows of the key columns of a given input batch once and caches the result. Later, whenever `GetKey` is invoked, which happens frequently, the key hash is quickly found in the cache.
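
To make the caching idea concrete, here is a minimal, self-contained sketch. It is not the actual asof join code: `Batch`, `HashRow`, and `CachedKeyHasher` are hypothetical names invented for illustration, the hash-combine step is a generic one, and identifying a batch by its address is a simplification. Only the general shape (hash every row of a batch once, then serve `GetKey` lookups from the cache) follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an input batch with two key columns of equal length.
struct Batch {
  std::vector<int64_t> key_a;
  std::vector<std::string> key_b;
  std::size_t num_rows() const { return key_a.size(); }
};

// Combines the per-column hashes of one row into a single 64-bit value.
// The mixing step is a generic hash-combine, assumed here for illustration.
inline uint64_t HashRow(const Batch& batch, std::size_t row) {
  uint64_t h = std::hash<int64_t>{}(batch.key_a[row]);
  uint64_t h2 = std::hash<std::string>{}(batch.key_b[row]);
  h ^= h2 + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  return h;
}

// Hashes all rows of a batch on first access and answers later lookups
// for the same batch from the cached vector.
class CachedKeyHasher {
 public:
  uint64_t GetKey(const Batch& batch, std::size_t row) {
    // Simplification: a batch is identified by its address, so a cached
    // batch must outlive any lookups made against it.
    if (cached_batch_ != &batch) {
      hashes_.resize(batch.num_rows());
      for (std::size_t i = 0; i < batch.num_rows(); ++i) {
        hashes_[i] = HashRow(batch, i);
      }
      cached_batch_ = &batch;
      ++num_batches_hashed_;
    }
    assert(row < hashes_.size());
    return hashes_[row];
  }

  // Number of times a whole batch had to be hashed (cache misses).
  int num_batches_hashed() const { return num_batches_hashed_; }

 private:
  const Batch* cached_batch_ = nullptr;
  std::vector<uint64_t> hashes_;
  int num_batches_hashed_ = 0;
};
```

With this shape, repeated `GetKey` calls on the same batch cost one vector lookup each, and the hashing work is paid only when a new batch arrives.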
