Thanks, Jingsong, for initiating this discussion; I'm a big +1 for it. Building a query service based on Paimon is a very exciting feature: it not only simplifies users' online data service architecture, but can also serve as a dimension-table service for stream processing. In addition, we can continue to improve Paimon's OLAP capability on top of it.
The POC PR [1] looks good to me overall: it builds the LSM data storage locally and sets up a data service inside a Flink streaming job. QueryService is a big piece of work, so I think we can start with this PIP and the basic abilities in PR [1]. After that, we can continue to improve data partitioning, service discovery, the client and SDK, and more performance- and stability-related features.

Looking forward to QueryService based on Paimon, thanks!

[1] https://github.com/apache/incubator-paimon/pull/2110

On Wed, Oct 11, 2023 at 4:07 PM Ming Li <[email protected]> wrote:

> > Yes, I think there must be a primary key, we can compute the bucket
> > from the primary key, and find which executor to visit.
> > This is the primary key Query Service.
>
> Hi Jingsong, thank you for providing this explanation. It looks good to
> me that the first version only supports lookup based on primary keys;
> most of our scenarios also use lookups based on primary keys.
>
> > But there are other problems, when the number of executors changes, the
> > service needs to be restarted and the data needs to be loaded.
>
> Hi Jufang, I think this is not a big problem. For the Query Service, the
> first version may be embedded in a Flink job, and high availability depends
> on the implementation of Flink.
>
> Best,
> Ming Li
>
> jufang he <[email protected]> wrote on Tue, Oct 10, 2023 at 17:20:
>
> > Hi, Ming.
> >
> > As Xiangyu mentioned, we encountered the same problem when implementing a
> > similar solution in ByteDance, maybe I can share some experience.
> >
> > The small table has more than 100 GB of data, so it needs to be placed on
> > separate nodes. To solve the problem of getting Executor addresses during
> > RPC queries, the same hash rules are used in data generation, loading and
> > querying. When the data is generated, the data is written to different
> > directories according to the hash algorithm.
> > When data is loaded into executors, the same hash algorithm is used, and
> > the number of executors is set in advance, and the data is loaded into
> > different executors. Since a single Executor can still exceed the memory
> > limit, we put the data into a local KV store. When dealing with large
> > tables, the Executor number of the current key to be queried can be
> > calculated according to the same hash algorithm and the number of
> > executors set in advance. Based on the Executor number we can get the
> > Executor RPC address from ZK.
> >
> > But there are other problems, when the number of executors changes, the
> > service needs to be restarted and the data needs to be loaded.
> >
> > Best,
> > Jufang
> >
> > Jingsong Li <[email protected]> wrote on Tue, Oct 10, 2023 at 16:40:
> >
> > > Hi Ming.
> > >
> > > Yes, I think there must be a primary key, we can compute the bucket
> > > from the primary key, and find which executor to visit.
> > > This is the primary key Query Service.
> > >
> > > And then, maybe we can introduce more Query Service types, maybe
> > > another service can be Secondary indexed Query Service, it can be
> > > queried by another field to get primary key, (maybe use RocksDB to
> > > maintain the index) and query primary key Query Service to get the
> > > whole value.
> > >
> > > The Secondary indexed Query Service and Primary Key Query Service are
> > > independent and unrelated, but then, we can use Snapshot Id to do some
> > > consistent alignment work. But this should be more complicated.
> > >
> > > These things can be imagined, but need lots of work.
> > >
> > > I just created a POC for first version, it is very rough:
> > > https://github.com/apache/incubator-paimon/pull/2110
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, Oct 10, 2023 at 3:36 PM Ming Li <[email protected]> wrote:
> > >
> > > > Thanks for the proposal!
> > > > It is a common scenario for multiple applications to share the same
> > > > dimension table.
> > > > As described in the design document, the TableQuery client
> > > > will obtain the addresses of all Executors from the AddressServer and
> > > > then request them through RPC. I have a question about this: How does
> > > > the TableQuery client decide which Executor to request? Request all
> > > > Executors in turn? Or is it restricted that the key of lookup must
> > > > contain bucket-key?
> > > >
> > > > Best,
> > > > Ming Li
> > > >
> > > > Jingsong Li <[email protected]> wrote on Sun, Oct 8, 2023 at 18:35:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I want to bring up a discussion about Paimon QueryService [1].
> > > > >
> > > > > Paimon primary key table already provides LSM file structure, it is a
> > > > > pity that Paimon can not provide a queryable service for lookup.
> > > > >
> > > > > A distributed service can download Paimon files locally and provide a
> > > > > Lookup service. It does not affect the write process and read process,
> > > > > it is a separate server. It can be used as:
> > > > >
> > > > > 1. Flink Lookup Join, reuse by multiple Flink Jobs.
> > > > > 2. Online Service Lookup, this requires high stability. (it may not be
> > > > >    so stable in the first version)
> > > > >
> > > > > See more in PIP [1].
> > > > >
> > > > > This PIP is a high-level design for Paimon QueryService, not including
> > > > > all details.
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-10%3A+Introduce+Paimon+QueryService
> > > > >
> > > > > Best,
> > > > > Jingsong
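
For readers following the routing discussion in this thread (compute the bucket from the primary key, then find which executor to visit), here is a minimal sketch in Python. It is an illustration under stated assumptions, not Paimon's implementation: the CRC32 hash, the bucket-to-executor modulo rule, the function names bucket_of, executor_of, route and moved_buckets, and the executor addresses are all hypothetical.

```python
import zlib
from typing import List, Sequence


def bucket_of(primary_key: str, num_buckets: int) -> int:
    """Map a primary key to a bucket using a stable hash.

    Assumption: CRC32 stands in for whatever hash the writer really uses;
    the only requirement is that writing, loading and querying agree on it.
    """
    return zlib.crc32(primary_key.encode("utf-8")) % num_buckets


def executor_of(bucket: int, num_executors: int) -> int:
    """Assign a bucket to an executor (hypothetical modulo rule)."""
    return bucket % num_executors


def route(primary_key: str, num_buckets: int, addresses: Sequence[str]) -> str:
    """Return the address of the executor that should serve this key."""
    bucket = bucket_of(primary_key, num_buckets)
    return addresses[executor_of(bucket, len(addresses))]


def moved_buckets(num_buckets: int, old_executors: int, new_executors: int) -> List[int]:
    """List the buckets whose executor changes when the executor count changes."""
    return [
        b for b in range(num_buckets)
        if executor_of(b, old_executors) != executor_of(b, new_executors)
    ]


if __name__ == "__main__":
    # Placeholder addresses; in the thread these would come from an address
    # service such as the AddressServer or ZK.
    addresses = ["executor-0:6000", "executor-1:6000",
                 "executor-2:6000", "executor-3:6000"]
    print(route("user-42", 16, addresses))
    print(len(moved_buckets(16, 4, 5)))
```

Under this modulo rule, with 16 buckets, going from 4 to 5 executors reassigns 12 of the 16 buckets. That illustrates the concern Jufang raises: when the number of executors changes, most data has to be reloaded onto different executors.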
