To clarify, I am okay with adding it as an external module :-)

On Tue, Feb 9, 2021 at 11:10 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> I'm good with this too.
>
> On Tue, Feb 9, 2021 at 4:16 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
>> +1 to add it as an external module so people can test it out and give feedback more easily.
>>
>> On Mon, Feb 8, 2021 at 10:22 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> >
>> > +1 adding it any way.
>> >
>> > On Mon, 8 Feb 2021, 21:54 Holden Karau, <hol...@pigscanfly.ca> wrote:
>> >>
>> >> +1 for an external module.
>> >>
>> >> On Mon, Feb 8, 2021 at 11:51 AM Cheng Su <chen...@fb.com.invalid> wrote:
>> >>>
>> >>> +1 for (2) adding to external module.
>> >>>
>> >>> I think this feature is useful and popular in practice, and option 2 does not conflict with the previous concern about the dependency.
>> >>>
>> >>> Thanks,
>> >>> Cheng Su
>> >>>
>> >>> From: Dongjoon Hyun <dongjoon.h...@gmail.com>
>> >>> Date: Monday, February 8, 2021 at 10:39 AM
>> >>> To: Jacek Laskowski <ja...@japila.pl>
>> >>> Cc: Liang-Chi Hsieh <vii...@gmail.com>, dev <dev@spark.apache.org>
>> >>> Subject: Re: [DISCUSS] Add RocksDB StateStore
>> >>>
>> >>> Thank you, Liang-chi and all.
>> >>>
>> >>> +1 for (2) external module design because it can deliver the new feature in a safe way.
>> >>>
>> >>> Bests,
>> >>> Dongjoon
>> >>>
>> >>> On Mon, Feb 8, 2021 at 9:00 AM Jacek Laskowski <ja...@japila.pl> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I'm "okay to add RocksDB StateStore as external module". See no reason not to.
>> >>>
>> >>> Regards,
>> >>> Jacek Laskowski
>> >>> ----
>> >>> https://about.me/JacekLaskowski
>> >>> "The Internals Of" Online Books
>> >>> Follow me on https://twitter.com/jaceklaskowski
>> >>>
>> >>> On Tue, Feb 2, 2021 at 9:32 AM Liang-Chi Hsieh <vii...@gmail.com> wrote:
>> >>>
>> >>> Hi devs,
>> >>>
>> >>> In Spark Structured Streaming, we need a state store for state management for stateful operators such as streaming aggregates, joins, etc. We have one and only one state store implementation now. It is an in-memory hashmap that is backed up to an HDFS-compliant file system at the end of every micro-batch.
>> >>>
>> >>> As it basically uses an in-memory map to store states, memory consumption is a serious issue and state store size is limited by the size of the executor memory. Moreover, a state store using more memory may impact the performance of task execution, which requires memory too.
>> >>>
>> >>> Internally we see more streaming applications that require large state in stateful operations. For such requirements, we need a StateStore that does not rely on memory to store states.
>> >>>
>> >>> This seems to be true externally as well, as several other major streaming frameworks already use RocksDB for state management. RocksDB is an embedded DB and streaming engines can use it to store state instead of memory storage.
>> >>>
>> >>> So it seems to me that it is proven to be a good choice for large state usage. But Spark SS still lacks a built-in state store for this requirement.
>> >>>
>> >>> Previously there was one attempt, SPARK-28120, to add RocksDB StateStore into Spark SS. IIUC, it was pushed back due to two concerns: extra code maintenance cost and the introduction of a RocksDB dependency.
>> >>>
>> >>> For the first concern, as more users require the feature, it should become heavily used code in SS and more developers will look at it. For the second one, we propose (SPARK-34198) to add it as an external module to relieve the dependency concern.
>> >>>
>> >>> Because it was pushed back previously, I'm going to raise this discussion to know what people think about it now, in advance of submitting any code.
>> >>>
>> >>> I think there might be some possible opinions:
>> >>>
>> >>> 1. okay to add RocksDB StateStore into sql core module
>> >>> 2. not okay for 1, but okay to add RocksDB StateStore as external module
>> >>> 3. either 1 or 2 is okay
>> >>> 4. not okay to add RocksDB StateStore, no matter into sql core or as external module
>> >>>
>> >>> Please let us know if you have some thoughts.
>> >>>
>> >>> Thank you.
>> >>>
>> >>> Liang-Chi Hsieh
>> >>>
>> >>> --
>> >>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>> >> --
>> >> Twitter: https://twitter.com/holdenkarau
>> >> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> --
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Web: https://www.dbtsai.com
>> PGP Key ID: 42E5B25A8F7A82C1