Igniters,

Since we're developing a storage system, it's worth knowing how efficiently
it stores data.

I propose developing an Estimator that calculates how much space is needed
to store a given data set.

For example:
1) You have classes A, B and C with known fields and a known data
distribution over these fields.
2) You know that you have to store 1M instances of A, 2M of B and 45K of C.
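For this example the total estimate is a weighted sum. Here $\bar{s}_X$ is notation introduced only for illustration: the average stored size of one instance of class X, which both approaches below try to determine in different ways.

```latex
S_{\text{total}} = 10^{6}\,\bar{s}_A + 2 \cdot 10^{6}\,\bar{s}_B + 45 \cdot 10^{3}\,\bar{s}_C
```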

The estimation can be done using two different approaches:

1) Estimate how much space is needed to keep data in binary format.
So, we should
- Create some instances
- Marshal them to binary format
- Compute sum(sizes)
- Multiply by the expected number of instances
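The steps above can be sketched roughly as follows. This is a minimal sketch, not the proposed BinarySizeEstimator: the class `SizeEstimator` and its methods are hypothetical, and the marshaller is a pluggable `Function<T, byte[]>` so the sketch compiles without Ignite. In practice BinaryMarshaller would be plugged in; JDK serialization is used here only as a stand-in and will produce different sizes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sampling-based size estimator (not an Ignite API). */
public class SizeEstimator {
    /** Estimates total bytes for expectedCount objects from a representative sample. */
    public static <T> long estimate(List<T> sample, Function<T, byte[]> marshaller, long expectedCount) {
        if (sample.isEmpty())
            throw new IllegalArgumentException("Sample must not be empty");

        long sum = 0;
        for (T obj : sample)
            sum += marshaller.apply(obj).length; // size of one marshalled instance

        // Average size per instance, multiplied by the expected number of instances.
        return Math.round((double) sum / sample.size() * expectedCount);
    }

    /** Stand-in marshaller based on JDK serialization, for illustration only. */
    public static byte[] jdkSerialize(Serializable obj) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
            oos.flush(); // push buffered data into bos before reading it
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Averaging over a sample, rather than marshalling a single instance, matters when field values (e.g. string lengths) vary according to the data distribution.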

Pros:
- Fast.
- No need to start Ignite nodes.
- Can be used as a benchmarking tool for BinaryMarshaller.
Once you improve something in BinaryMarshaller, you'll see the gain in
BinarySizeEstimator results.

Cons:
- The estimation result will differ from real cluster memory consumption
and can only be used as a preliminary assessment.

2) Estimate how much space is needed to keep data in real cluster.
So, we should
- Configure and start a small cluster: set page size, cache types and count,
backups, node count, etc.
- Create a lot of instances (1/1000, 1/10 or even 1/1 of the expected amount)
- Measure the total size of used pages
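When only a fraction of the data is loaded, the measurement has to be scaled up. A minimal sketch, assuming memory usage grows linearly with the number of entries on top of a fixed overhead measured on the empty, already configured cluster; `ClusterSizeExtrapolator` is a hypothetical helper, and how the byte counts are obtained from the cluster (e.g. from its memory metrics) is left out.

```java
/** Hypothetical helper: extrapolates full-scale usage from a partial load. */
public class ClusterSizeExtrapolator {
    /**
     * @param emptyBytes  memory used by the configured cluster before loading data
     * @param loadedBytes memory used after loading the sample
     * @param fraction    loaded share of the expected data set, e.g. 0.001, 0.1 or 1.0
     */
    public static long extrapolate(long emptyBytes, long loadedBytes, double fraction) {
        if (fraction <= 0 || fraction > 1)
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        if (loadedBytes < emptyBytes)
            throw new IllegalArgumentException("loadedBytes must not be less than emptyBytes");

        long dataBytes = loadedBytes - emptyBytes;             // data-dependent part only
        return emptyBytes + Math.round(dataBytes / fraction);  // fixed overhead is not scaled
    }
}
```

The linearity assumption is exactly what a larger fraction (1/10 or 1/1) helps to verify: page fill factor and index overhead may not scale proportionally at very small sample sizes.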

Pros:
- Can be used as pre-production tuning tool.

Cons:
- Slow.
- Requires starting Ignite nodes and a lot of free memory.


I think we need both, but I propose starting with the first approach:
BinarySizeEstimator (https://issues.apache.org/jira/browse/IGNITE-6300)

Thoughts?
