[jira] [Updated] (IGNITE-11928) [IEP-19] keep data of same primary key on same node

Ewan (JIRA) Sat, 15 Jun 2019 20:37:19 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-11928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ewan updated IGNITE-11928:
--------------------------
    Description: 
I searched a lot and found very little documentation on how Ignite indexes 
data and how it uses indices. What I hope Ignite can offer is something like 
the partition key in Cassandra, which the database engine uses to find out 
which node in the cluster contains the row(s) of a table. Since the partition 
key determines the node on which all rows with the same partition key are 
stored, it dramatically reduces the interaction between Ignite nodes for a 
query, especially when the query is confined to the data of a single 
partition key.

e.g. Table: profit of the companies by day. It has three columns:
 day: primary key / partition key
 company id: secondary key / column key
 profit: value of the row

Query:
 select * from table where day = '2019-01-01';

In Cassandra, the query touches only one node to fetch all the data. The 
partition key comes with a prerequisite: the users or programmers must ensure 
that all the data with the same partition key is small enough to be stored on 
a single node of the cluster.
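
A minimal sketch of the routing described above, assuming a plain 
hash-modulo placement. This is only an illustration and not Cassandra's 
actual token ring; NUM_NODES, node_for, query_by_day and the sample rows are 
hypothetical names and made-up values.

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Map a partition key to a node index with a deterministic hash."""
    return zlib.crc32(partition_key.encode("utf-8")) % NUM_NODES


# Sample rows: (day, company_id, profit). The values are made up.
rows = [("2019-01-01", cid, 100.0 + cid) for cid in range(100)]
rows += [("2019-01-02", cid, 200.0 + cid) for cid in range(100)]

# Place every row on the node chosen by its partition key (day) alone.
cluster = {n: [] for n in range(NUM_NODES)}
for row in rows:
    cluster[node_for(row[0])].append(row)


def query_by_day(day: str):
    """select * from table where day = ? (contacts exactly one node)."""
    target = node_for(day)
    return target, [r for r in cluster[target] if r[0] == day]


node, result = query_by_day("2019-01-01")
print(f"node {node} returned {len(result)} rows")
```

Because the node is derived from the day alone, the lookup never has to ask 
the other nodes whether they hold rows for that day.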

But in Ignite, the same query may touch the whole cluster, since each 
node in the cluster stores only a portion of the data with that primary key. 
This prevents Ignite's performance from scaling linearly as more nodes are 
added to the cluster. I personally think linear performance scaling is 
one of the key features a distributed database should have.
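
The scattering described above can be sketched as follows, assuming rows are 
placed by hashing the full (day, company id) key. That placement rule is an 
assumption made for this example and is not taken from Ignite's 
implementation; NUM_NODES and node_for_full_key are hypothetical names.

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for_full_key(day: str, company_id: int) -> int:
    """Place a row by hashing the whole composite key, not just the day."""
    key = f"{day}|{company_id}".encode("utf-8")
    return zlib.crc32(key) % NUM_NODES


day = "2019-01-01"
# Every node that holds at least one row for this day must be queried.
nodes_touched = {node_for_full_key(day, cid) for cid in range(100)}
print(f"rows for {day} live on {len(nodes_touched)} of {NUM_NODES} nodes")
```

Under this assumption a query for a single day has to fan out to several 
nodes, so adding nodes does not reduce the number of nodes each query 
contacts.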

I would like to recommend that Ignite add an option/setting that lets users 
determine whether all rows with the same primary key are stored on the same 
node. Thanks.

 

Here is another query example:
Table: profit of companies by date:
company_id: primary key / partition key
timestamp: secondary key / column key
profit: value of the row

Query:
select * from table where company_id=1001 and timestamp > 1000000 and timestamp 
< 5000000;

If all data of the company with id 1001 is stored on one node, then the 
query will return without touching other nodes in the cluster.
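
A small sketch of how such a range query could be served by the single node 
that owns the partition, assuming the node keeps each partition's rows sorted 
by the secondary (column) key. node_storage, range_query and the sample rows 
are hypothetical.

```python
import bisect

# Hypothetical per-node storage: partition key -> rows sorted by timestamp.
# Each row is (timestamp, profit); the values are made up.
node_storage = {
    1001: [(500_000, 10.0), (1_500_000, 12.5), (3_000_000, 9.0), (6_000_000, 11.0)],
}


def range_query(company_id: int, lo: int, hi: int):
    """Return rows with lo < timestamp < hi for one partition key."""
    rows = node_storage.get(company_id, [])
    timestamps = [ts for ts, _ in rows]
    start = bisect.bisect_right(timestamps, lo)  # first timestamp > lo
    end = bisect.bisect_left(timestamps, hi)     # first timestamp >= hi
    return rows[start:end]


result = range_query(1001, 1_000_000, 5_000_000)
print(result)
```

The whole query is answered from one node's local data; no other node is 
consulted.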

 

> [IEP-19] keep data of same primary key on same node
> ---------------------------------------------------
>
>                 Key: IGNITE-11928
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11928
>             Project: Ignite
>          Issue Type: Improvement
>          Components: data structures, persistence
>    Affects Versions: None
>            Reporter: Ewan
>            Priority: Minor
>              Labels: [IEP-19]
>             Fix For: None
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
