I think adding the RocksDB state store to sql/core directly would be
OK. Personally, I also voted "either way is fine with me" on the RocksDB
state store implementation in the Spark ecosystem. My overall stance hasn't
changed, but I'd like to point out that the risk is now much lower than
before, given that we can leverage the Databricks RocksDB state store
implementation.

I feel there were two major reasons to add the RocksDB state store to an
external module:

1. stability

The Databricks RocksDB state store implementation has been supported in
production for years, so it won't require more time to incubate. We may want
to review it thoughtfully to ensure the open-sourced proposal fits Apache
Spark and still retains its stability, but this is a much better position
than the previous candidates, which may not have been tested in production
for years.

That makes me think we don't have to put it into the external module and
treat it as experimental.

2. dependency

From Yuanjian's mail, the JNI library is the only dependency, which seems
fine to add by default. We already have LevelDB as one of our core
dependencies and don't worry too much about JNI library dependencies.
Someone might even find that there are clear benefits to replacing LevelDB
with RocksDB, in which case RocksDB could become one of the core
dependencies as well.
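For reference, switching state store implementations is a single
configuration change. A sketch of what enabling the RocksDB provider might
look like once it lives in sql/core (the fully qualified class name below is
an assumption, since the final package location would be decided during
review):

```properties
# spark-defaults.conf (sketch; class name assumed, pending final packaging)
spark.sql.streaming.stateStore.providerClass  org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
```

This is also why shipping it in sql/core is low-risk for existing users: the
default provider stays HDFS-backed unless this config is set explicitly.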

On Tue, Apr 27, 2021 at 6:41 PM Yuanjian Li <xyliyuanj...@gmail.com> wrote:

> Hi all,
>
> Following the latest comments in SPARK-34198
> <https://issues.apache.org/jira/browse/SPARK-34198>, Databricks decided
> to donate the commercial implementation of the RocksDBStateStore. Compared
> with the original decision, there is only one topic we want to raise again
> for discussion: can we add the RocksDBStateStoreProvider directly to the
> sql/core module? This suggestion is based on the following reasons:
>
>    1. The RocksDBStateStore aims to solve the problems of the original
>    HDFSBackedStateStore, which is built in.
>    2. End users can conveniently set a config to use the new
>    implementation.
>    3. We can make the RocksDB one the default in the future.
>
> Regarding the dependency concern, I also checked the rocksdbjni package we
> would introduce. As a JNI package
> <https://repo1.maven.org/maven2/org/rocksdb/rocksdbjni/6.2.2/rocksdbjni-6.2.2.pom>,
> it should not have any dependency conflicts with Apache Spark.
>
> Any suggestions are welcome!
>
> Best,
>
> Yuanjian
>
> Reynold Xin <r...@databricks.com> 于2021年2月14日周日 上午6:54写道:
>
>> Late +1
>>
>>
>> On Sat, Feb 13 2021 at 2:49 PM, Liang-Chi Hsieh <vii...@gmail.com>
>> wrote:
>>
>>> Hi devs,
>>>
>>> Thanks for all the inputs. Overall, there is positive feedback in the
>>> Spark community about having the RocksDB state store as an external
>>> module, so let's go forward in this direction to improve Structured
>>> Streaming. I will keep the JIRA SPARK-34198 updated.
>>>
>>> Thanks all again for the inputs and discussion.
>>>
>>> Liang-Chi Hsieh
>>>
>>>
>>>
>>
