crepererum commented on issue #1570: URL: https://github.com/apache/arrow-datafusion/issues/1570#issuecomment-1521895443
> Can we make the `GroupState` and the Accumulator states serializable? With this approach, we do not need to do any sorting when spilling data to disk. When we read the data back, we can reconstruct the raw hash table quickly from the hash values and indexes: because our hashmap is very lightweight, the hash value can be recalculated from the grouping rows, or we can cache the hash value inside the `GroupState` to avoid recalculating it.

You still need disk spilling, no? Or where do you store the serialized state? Also, I guess that serialization may become a major bottleneck for some of the accumulators.
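To make the quoted proposal concrete, here is a minimal sketch of what "serialize the group state with a cached hash, then rebuild the table on read" could look like. Everything in it is hypothetical: this `GroupState` is a simplified stand-in with a single `i64` sum accumulator, and `serialize`, `deserialize`, and `rebuild_index` are illustrative helpers, not DataFusion APIs. The byte buffer would still have to be written to a spill file, and the per-group encoding loop is where serialization cost would show up.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified per-group state (not the DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct GroupState {
    hash: u64,    // cached hash of the grouping row
    key: Vec<u8>, // grouping row bytes
    sum: i64,     // accumulator state (a SUM accumulator in this sketch)
}

/// Encode states as [hash: u64][key_len: u32][key bytes][sum: i64], little-endian.
fn serialize(states: &[GroupState]) -> Vec<u8> {
    let mut buf = Vec::new();
    for s in states {
        buf.extend_from_slice(&s.hash.to_le_bytes());
        buf.extend_from_slice(&(s.key.len() as u32).to_le_bytes());
        buf.extend_from_slice(&s.key);
        buf.extend_from_slice(&s.sum.to_le_bytes());
    }
    buf
}

fn read_u64(buf: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&buf[..8]);
    u64::from_le_bytes(a)
}

fn read_u32(buf: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&buf[..4]);
    u32::from_le_bytes(a)
}

/// Decode a buffer produced by `serialize`. Panics on truncated input.
fn deserialize(buf: &[u8]) -> Vec<GroupState> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let hash = read_u64(&buf[pos..]);
        let len = read_u32(&buf[pos + 8..]) as usize;
        let key = buf[pos + 12..pos + 12 + len].to_vec();
        let sum = read_u64(&buf[pos + 12 + len..]) as i64;
        out.push(GroupState { hash, key, sum });
        pos += 20 + len;
    }
    out
}

/// Rebuild a hash -> state-index map from the cached hashes,
/// without rehashing the grouping rows.
fn rebuild_index(states: &[GroupState]) -> HashMap<u64, Vec<usize>> {
    let mut map: HashMap<u64, Vec<usize>> = HashMap::new();
    for (i, s) in states.iter().enumerate() {
        map.entry(s.hash).or_default().push(i);
    }
    map
}
```

Round-tripping a slice of states through `serialize` and `deserialize` and then calling `rebuild_index` yields the same states and a lookup table keyed by the cached hashes. Colliding hashes share one index list, so a real lookup would still compare keys.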
