FelixYBW commented on issue #9674: URL: https://github.com/apache/incubator-gluten/issues/9674#issuecomment-2923449307
> Is it possibly because of data skew? It's because of data skew. All spill in velox are bucket (velox called it partition) based. Each bucket has 1 or more key. One key can't cross multiple bucket. So if data is seriously skew on one key, we will see OOM issue always. The scenario is that no matter how you increase the partition#, the spill always exists in one or more partition and the datasize doesn't decrease. Eventually you will see the OOM due to spill level reached. But no matter how you increase the spill level, the OOM is still there. But to this case, it's more specific that the skew key is null. The join needs to handle null key seperately since null key can't match any other key. But looks velox didn't considered about it. SMJ can pass in this case because we needn't cache null key's all data in memory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
