I worked on some work about rocksdb multi-arch support and version upgrade on Kafka/Storm/Flink[1][2][3].To avoid these issues happened in spark again, I want to give some inputs in here about rocksdb version selection from multi-arch support view. Hope it helps.
The Rocksdb adds Arm64 support [4] since version 6.4.6, and also backports all Arm64 related commits to 5.18.4 and release a all platforms support version. So, from multi-arch support view, the better rocksdb version is the version since v6.4.6, or 5.X version is v5.18.4. [1] https://issues.apache.org/jira/browse/STORM-3599 [2] https://github.com/apache/kafka/pull/8284 [3] https://issues.apache.org/jira/browse/FLINK-13598 [4] https://github.com/facebook/rocksdb/pull/6250 Regards, Yikun Liang-Chi Hsieh <vii...@gmail.com> 于2021年2月2日周二 下午4:32写道: > Hi devs, > > In Spark structured streaming, we need state store for state management for > stateful operators such streaming aggregates, joins, etc. We have one and > only one state store implementation now. It is in-memory hashmap which was > backed up in HDFS complaint file system at the end of every micro-batch. > > As it basically uses in-memory map to store states, memory consumption is a > serious issue and state store size is limited by the size of the executor > memory. Moreover, state store using more memory means it may impact the > performance of task execution that requires memory too. > > Internally we see more streaming applications that requires large state in > stateful operations. For such requirements, we need a StateStore not rely > on > memory to store states. > > This seems to be also true externally as several other major streaming > frameworks already use RocksDB for state management. RocksDB is an embedded > DB and streaming engines can use it to store state instead of memory > storage. > > So seems to me, it is proven to be good choice for large state usage. But > Spark SS still lacks of a built-in state store for the requirement. > > Previously there was one attempt SPARK-28120 to add RocksDB StateStore into > Spark SS. IIUC, it was pushed back due to two concerns: extra code > maintenance cost and it introduces RocksDB dependency. > > For the first concern, as more users require to use the feature, it should > be highly used code in SS and more developers will look at it. For second > one, we propose (SPARK-34198) to add it as an external module to relieve > the > dependency concern. > > Because it was pushed back previously, I'm going to raise this discussion > to > know what people think about it now, in advance of submitting any code. > > I think there might be some possible opinions: > > 1. okay to add RocksDB StateStore into sql core module > 2. not okay for 1, but okay to add RocksDB StateStore as external module > 3. either 1 or 2 is okay > 4. not okay to add RocksDB StateStore, no matter into sql core or as > external module > > Please let us know if you have some thoughts. > > Thank you. > > Liang-Chi Hsieh > > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >