+1, maintain it in the rel/0.12 branch.
CloudWise luke.miao
------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: Thursday, November 25, 2021, 9:12 AM
To: "[email protected]" <[email protected]>;
Subject: Re: About making an iterable cluster version in enterprise
+1, and I suggest maintaining it in the rel/0.12 branch since it is stable now.
Thanks!
Chao Wang
BONC ltd
[email protected]
On 11/25/2021 09:03, Houliang Qi <[email protected]> wrote:
Hi,
We have also run into similar situations in our test environment, and we very
much support creating a stable, usable version based on the current one.
Thanks,
---------------------------------------
Houliang Qi
BONC, Ltd
On 11/24/2021 21:29, Xiangdong Huang <[email protected]> wrote:
Hi,
I think it is fine to keep maintaining the current cluster module,
either on the rel/0.12 or the master branch.
Look forward to progress.
Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
?????? <[email protected]> ??2021??11??24?????? ????5:38??????
Hi, all
After running the cluster version with an online stream for about two weeks, we
experienced two failures in which the cluster became unresponsive and could not
be recovered by restarting. We also did not find an effective way to recover
data from the cluster. So we'd like to make a testable cluster version in our
enterprise environment, which should have the following properties:
1. Write operations won't be blocked frequently.
2. Query bugs are tolerable, since they can be fixed and iterated on quickly.
3. Most issues can be resolved by restarting nodes or the cluster.
4. There is a solution for unrecoverable issues after losing a small part of the data.
5. A cluster restart can complete in a reasonable amount of time.
6. The system has a monitoring mechanism.
We're planning to improve the following aspects:
1. Metadata uses too much memory
In our scenario, the measurement scale is large (around 1 billion measurements),
but the data-point ingestion rate is small (100K points per second). We found
that a cluster node cannot afford the metadata storage because of the memory
limitation (each node has 256 GB of memory). Given the small request rate, the
CPU load is only about 1%~2%. For this scenario, we intend to introduce a
third-party storage component such as RocksDB to help manage the schema
metadata; a rough sketch follows. Of course, this would be optional and
configurable.
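Below is a minimal sketch of the idea, not the actual implementation: it
assumes the RocksDB Java binding (org.rocksdb) is on the classpath, and the
class and method names (RocksDBSchemaStore, storeTimeseries, getTimeseries)
are made up for illustration. The point is only that the path-to-schema
mapping can live on disk and be looked up on demand instead of being held
entirely in heap memory.

// Sketch only: keep the mapping from a timeseries path to its serialized
// schema in RocksDB so that the full schema no longer has to live in heap.
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;

public class RocksDBSchemaStore implements AutoCloseable {  // hypothetical name

  static {
    RocksDB.loadLibrary();
  }

  private final RocksDB db;

  public RocksDBSchemaStore(String dir) throws RocksDBException {
    Options options = new Options().setCreateIfMissing(true);
    this.db = RocksDB.open(options, dir);
  }

  // Persist the serialized schema (data type, encoding, compression, ...) of one timeseries.
  public void storeTimeseries(String fullPath, byte[] serializedSchema) throws RocksDBException {
    db.put(fullPath.getBytes(StandardCharsets.UTF_8), serializedSchema);
  }

  // Look up a schema on demand instead of holding all of them in memory.
  public byte[] getTimeseries(String fullPath) throws RocksDBException {
    return db.get(fullPath.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public void close() {
    db.close();
  }
}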
2. Raft implementation
For this one, we plan to take two steps. First, we'd like to abstract the
interfaces of Raft and try to make Raft an independent component; this should
also be a work item when implementing the new architecture. Second, we'd like
to introduce a third-party Raft library such as Ratis and, ideally, make it
configurable. A rough interface sketch follows.
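The following is only a rough sketch of what the abstracted interface could
look like; the names (RaftConsensus, LogApplier) are hypothetical and not the
current cluster module's API. The idea is that the rest of the cluster code
depends only on this interface, so the built-in implementation or a
third-party library such as Apache Ratis can be plugged in behind it.

import java.util.concurrent.CompletableFuture;

// Hypothetical consensus abstraction the cluster code would program against.
public interface RaftConsensus {

  // Replicate one log entry through the Raft group; completes when committed.
  CompletableFuture<Void> write(byte[] logEntry);

  // Serve a linearizable read (e.g., via read index or leader lease).
  CompletableFuture<byte[]> read(byte[] request);

  // Callback used to apply committed entries to the local state machine
  // (i.e., the storage engine).
  void registerApplier(LogApplier applier);

  boolean isLeader();

  interface LogApplier {
    void apply(byte[] committedEntry);
  }
}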
3. Engineering components
The cluster is missing some components, such as a monitoring system (this
should be in progress in the community; we'd like to help if needed), a tool
that migrates single-node data into the cluster, and tools to help with
failure recovery. We need these tools to make the system observable and
recoverable. A small monitoring sketch follows.
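As an illustration only (the class and counter names below are made up, and a
real component would push metrics to a monitoring system rather than just log
them), even something as small as the following would help operators notice
blocked writes or piled-up Raft logs:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical node-level health reporter: other modules bump the counters,
// and the scheduler prints them periodically.
public class NodeHealthReporter {

  public final AtomicLong pendingRaftLogs = new AtomicLong();
  public final AtomicLong rejectedWrites = new AtomicLong();

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start() {
    // Report once a minute; a real component would export these counters
    // to an external monitoring system instead of logging them.
    scheduler.scheduleAtFixedRate(
        () -> System.out.printf("pendingRaftLogs=%d rejectedWrites=%d%n",
            pendingRaftLogs.get(), rejectedWrites.get()),
        1, 1, TimeUnit.MINUTES);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}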
4. Test
As the new test architecture is being introduced into the community, we will
try to complement the test cases under the new architecture.
Most of the solutions above have not been investigated deeply, so any ideas are welcome.
What's the benefit of this work?
We intend to run this version in production so that we can collect feedback
and bugs from real users and iterate on them, and it can eventually become the
baseline of a stable cluster version.
Why not do this in the new architecture?
We don't do this under the new architecture because the new architecture has
just started being planned and we can't wait any longer. Besides, nearly all
of this work doesn't conflict with the new architecture and could be reused
there.
Please feel free to reply to this email if you have any concerns or ideas.
----------------------------------------------------------
Thanks!
Jianyun Cheng