kbendick commented on pull request #2680: URL: https://github.com/apache/iceberg/pull/2680#issuecomment-987542250
> I agree with you that having RocksDB is a win, our stateful Flink jobs all use Rocksdb backend :)
>
> But I think we need to more clearly communicated the trade-off to the user. For example, RocksDB can be a magnitude slower than in-memory hash map even with very fast SSD, depending on the read/write pattern.

For sure. No disagreement. But we're just maybe not there yet, is all I mean.

Having the ability to spill to disk is a lot of work - which @openinx and others have been doing a great job with. But even look at the age of this PR - it's moving along, but it's definitely a process. It's good to be aware of how the need to pass configurations will affect the rest of the codebase, which Ryan had mentioned was one of the bigger areas of concern. Ideally we can pass configs to RocksDB without too much disruption to the rest of the codebase. Once that has reached its goal, we can worry more about user experience when using RocksDB.

And I'll happily drop most anything I'm doing to review most documentation PRs! And when the time comes, if you want to write a blog post about using RocksDB, I'd happily review any drafts or make sure it's prominently displayed on the Apache Flink website 🙂

For now, let's focus on getting RocksDB or similar available to developers within Iceberg, and then we can definitely focus on the user experience. It is definitely true that there are many situations where it makes less sense than remaining in-memory, but having the option sure is nice.

But it's good to always be thinking about the end-user experience. It's definitely one of my biggest concerns with all things too. So many thanks for that.
😀
