[
https://issues.apache.org/jira/browse/IGNITE-12543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054896#comment-17054896
]
Ivan Pavlukhin commented on IGNITE-12543:
-----------------------------------------
[~sunghan.suh], thank you for your efforts. I must say that the proposed fix
cannot be applied as is. I ran tests against the PR and there are critical
failures. Here are the results:
[report|https://mtcga.gridgain.com/pr.html?serverId=apache&suiteId=IgniteTests24Java8_RunAll&branchForTc=pull/7403/head&action=Latest].
The most illustrative one is a failure of the [Binary
Objects|https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_BinaryObjects?branch=pull%2F7403%2Fhead&mode=builds]
test suite.
Actually, the reason was already mentioned in the comments.
bq. Initially the protocol was designed to be universal and able to serialize
objects of any complexity (nested objects, circular references).
Consider an example to make things clear.
{code:java}
// A node of a doubly linked list.
class Node {
    Node next, prev;

    String val;
}
{code}
In a list of two elements we have two nodes referencing each other, so some
trick is needed to serialize such a structure. In the _Binary Object_ format
this trick is called a _handle_. Let's call the two list elements A and B. In
serialized form we can start writing A:
{noformat}
[prev:null, val:A, next:?]
{noformat}
Let's write A.next = B directly:
{noformat}
[prev:null, val:A, next:[prev:?, val:B, next:null]]
{noformat}
What should be written as B.prev = A? Writing it directly as before would lead
to infinite recursion. So a _handle_ (i.e. a reference to the already-written
bytes of A) is written instead. You can find more details about the _Binary
Object_ format in the following
[article|https://cwiki.apache.org/confluence/display/IGNITE/Binary+object+format].
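As an illustration of the idea only (this is not the actual Ignite writer; the class below and its textual output format are hypothetical), a handle-based writer can be sketched with an identity map from already-written objects to their offsets in the output:
{code:java}
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of handle-based serialization: instead of recursing
// into an object that was already written (which would loop forever on a
// cycle), we emit a back-reference ("handle") to its earlier position.
public class HandleSketch {
    static class Node {
        Node next, prev;
        String val;
    }

    // Maps each object already written to its offset in the output.
    private final Map<Object, Integer> handles = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    String write(Node n) {
        writeNode(n);
        return out.toString();
    }

    private void writeNode(Node n) {
        if (n == null) {
            out.append("null");
            return;
        }
        Integer handle = handles.get(n);
        if (handle != null) {
            // Already serialized: write a handle instead of recursing.
            out.append("handle@").append(handle);
            return;
        }
        handles.put(n, out.length());
        out.append("[val:").append(n.val).append(", prev:");
        writeNode(n.prev);
        out.append(", next:");
        writeNode(n.next);
        out.append("]");
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        a.val = "A"; b.val = "B";
        a.next = b; b.prev = a;
        // Terminates despite the A <-> B cycle.
        System.out.println(new HandleSketch().write(a));
    }
}
{code}
Note that an {{IdentityHashMap}} is used on purpose: handles must be based on object identity, not {{equals()}}, so two distinct but equal objects are not collapsed into one handle.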
Returning to the issue, we need to answer the question of why more bytes are
written during serialization than the specific object itself takes. If we need
to serialize node B from the example above, we need to serialize node A as
well. In the current implementation we simply take all bytes of the top-most
binary object; in this fashion we can serialize and transfer objects of any
structure safely.
Currently I do not see a simple way to fix this issue while keeping the
ability to serialize objects of any structure.
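To make the byte counts concrete, here is a back-of-the-envelope sketch (the sizes are illustrative, not taken from the reporter's heap dump) that mirrors the framing used by {{doWriteBinaryObject}}: 1 type byte, a 4-byte length, the full top-most array, and a 4-byte start offset per element, so the shared array is repeated once per serialized element:
{code:java}
// Illustrative arithmetic mirroring doWriteBinaryObject's framing:
// 1 type byte + 4-byte length + full top-most array + 4-byte start offset.
public class DuplicationSketch {
    /** Bytes written for one element that carries the whole top-most array. */
    static long bytesPerElement(long topMostArrayLen) {
        return 1 + 4 + topMostArrayLen + 4;
    }

    /** Total bytes when every element duplicates the same top-most array. */
    static long totalBytes(int elements, long topMostArrayLen) {
        return elements * bytesPerElement(topMostArrayLen);
    }

    public static void main(String[] args) {
        // 5 outer lists x 10 inner elements, top-most array of ~200 KB.
        int elements = 5 * 10;
        long topMost = 200 * 1024;
        long total = totalBytes(elements, topMost);
        System.out.printf("%d elements x ~%d KB each = ~%.1f MB%n",
            elements, bytesPerElement(topMost) / 1024, total / (1024.0 * 1024.0));
    }
}
{code}
The real blow-up compounds with nesting depth, since at each level the "top-most" array already contains the duplicates produced at the level below it.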
> When put List<List<SomeObject>>, the data was increased much larger.
> --------------------------------------------------------------------
>
> Key: IGNITE-12543
> URL: https://issues.apache.org/jira/browse/IGNITE-12543
> Project: Ignite
> Issue Type: Bug
> Components: thin client
> Affects Versions: 2.6
> Reporter: LEE PYUNG BEOM
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I use the Java thin client of Ignite 2.6.
>
> When I put data in the form List<List<SomeObject>>,
> the original 200KB of data grew to 50MB when read back from the
> Ignite servers.
> In the heap dump, the list elements were accumulated repeatedly, increasing
> the data size.
>
> When I checked the org.apache.ignite.internal.binary.BinaryWriterExImpl
> doWriteBinaryObject() method,
> {code:java}
> // org.apache.ignite.internal.binary.BinaryWriterExImpl.java
> public void doWriteBinaryObject(@Nullable BinaryObjectImpl po) {
>     if (po == null)
>         out.writeByte(GridBinaryMarshaller.NULL);
>     else {
>         byte[] poArr = po.array();
>
>         out.unsafeEnsure(1 + 4 + poArr.length + 4);
>         out.unsafeWriteByte(GridBinaryMarshaller.BINARY_OBJ);
>         out.unsafeWriteInt(poArr.length);
>         out.writeByteArray(poArr);
>         out.unsafeWriteInt(po.start());
>     }
> }
> {code}
>
> The current Ignite implementation for storing data in the form
> List<List<SomeObject>> works as follows:
> In the marshalling stage, for example, for data shaped like List(5
> members)<List(10 members)<SomeObject (size: 200KB)>>,
> as many as 10*5 list elements are duplicated.
> If each of the 50 stored elements duplicates the serialized bytes of its
> enclosing list, roughly 200KB*10*50 = 100MB of data is used for the cache
> and for network transfer.
> As a result of this increase in data size, it is confirmed that OOM errors,
> long GC pauses, etc. are caused by the occupied heap memory.
> Unnecessarily redundant data is used for cache storage and network transport.
> When looking up cache data, only a part of the whole array is read based on
> the stored start-offset information, so correct data is still retrieved.
> The current implementation is safe in terms of basic behavior, but it wastes
> memory and network bandwidth due to this inefficient algorithm.
> This can have very serious consequences. Please check.
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)