[
https://issues.apache.org/jira/browse/IGNITE-12543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054896#comment-17054896
]
Ivan Pavlukhin commented on IGNITE-12543:
-----------------------------------------
[~sunghan.suh], thank you for your efforts. I must say that the proposed fix
cannot be applied as is. I ran tests against the PR and there are critical
failures. Here are the results:
[report|https://mtcga.gridgain.com/pr.html?serverId=apache&suiteId=IgniteTests24Java8_RunAll&branchForTc=pull/7403/head&action=Latest].
The most illustrative one is a failure of the [Binary
Objects|https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_BinaryObjects?branch=pull%2F7403%2Fhead&mode=builds]
test suite.
Actually, the reason was already mentioned in the comments.
bq. Initially the protocol was designed to be universal and able to serialize
objects of any complexity (nested objects, circular references).
Consider an example to make things clear.
{code:java}
// A node of a doubly linked list.
class Node {
    Node next, prev;

    String val;
}
{code}
In a list of two elements we have two nodes referencing each other, so some
trick is needed to serialize such a structure. In the _Binary Object_ format
this trick is called a _handle_. Let's call the two list elements A and B. In
serialized form we can start writing A:
{noformat}
[prev:null, val:A, next:?]
{noformat}
Let's write A.next = B directly:
{noformat}
[prev:null, val:A, next:[prev:?, val:B, next:null]]
{noformat}
What should be written as B.prev = A? Writing it directly as before would lead
to infinite recursion. So a _handle_ (i.e. a reference to the already-written
bytes of A) is written instead. You can find more details about the _Binary
Object_ format in the following
[article|https://cwiki.apache.org/confluence/display/IGNITE/Binary+object+format].
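As an illustration of the idea only (this is not the actual Ignite writer; the class below and its textual output format are hypothetical), a handle-based writer can be sketched with an identity map from already-written objects to their offsets in the output:
{code:java}
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of handle-based serialization: instead of recursing
// into an object that was already written (which would loop forever on a
// cycle), we emit a back-reference ("handle") to its earlier position.
public class HandleSketch {
    static class Node {
        Node next, prev;
        String val;
    }

    // Maps each object already written to its offset in the output.
    private final Map<Object, Integer> handles = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    String write(Node n) {
        writeNode(n);
        return out.toString();
    }

    private void writeNode(Node n) {
        if (n == null) {
            out.append("null");
            return;
        }
        Integer handle = handles.get(n);
        if (handle != null) {
            // Already serialized: write a handle instead of recursing.
            out.append("handle@").append(handle);
            return;
        }
        handles.put(n, out.length());
        out.append("[val:").append(n.val).append(", prev:");
        writeNode(n.prev);
        out.append(", next:");
        writeNode(n.next);
        out.append("]");
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        a.val = "A"; b.val = "B";
        a.next = b; b.prev = a;
        // Terminates despite the A <-> B cycle.
        System.out.println(new HandleSketch().write(a));
    }
}
{code}
Note that an {{IdentityHashMap}} is used on purpose: handles must be based on object identity, not {{equals()}}, so two distinct but equal objects are not collapsed into one handle.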
Returning to the issue, we need to answer the question of why more bytes are
written during serialization than the specific object itself takes. If we need
to serialize node B from the example above, we need to serialize node A as
well. In the current implementation we simply take all bytes of the top-most
binary object; in this fashion we can serialize and transfer objects of any
structure safely.
Currently I do not see a simple way to fix this issue while keeping the
ability to serialize objects of any structure.
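To make the byte counts concrete, here is a back-of-the-envelope sketch (the sizes are illustrative, not taken from the reporter's heap dump) that mirrors the framing used by {{doWriteBinaryObject}}: 1 type byte, a 4-byte length, the full top-most array, and a 4-byte start offset per element, so the shared array is repeated once per serialized element:
{code:java}
// Illustrative arithmetic mirroring doWriteBinaryObject's framing:
// 1 type byte + 4-byte length + full top-most array + 4-byte start offset.
public class DuplicationSketch {
    /** Bytes written for one element that carries the whole top-most array. */
    static long bytesPerElement(long topMostArrayLen) {
        return 1 + 4 + topMostArrayLen + 4;
    }

    /** Total bytes when every element duplicates the same top-most array. */
    static long totalBytes(int elements, long topMostArrayLen) {
        return elements * bytesPerElement(topMostArrayLen);
    }

    public static void main(String[] args) {
        // 5 outer lists x 10 inner elements, top-most array of ~200 KB.
        int elements = 5 * 10;
        long topMost = 200 * 1024;
        long total = totalBytes(elements, topMost);
        System.out.printf("%d elements x ~%d KB each = ~%.1f MB%n",
            elements, bytesPerElement(topMost) / 1024, total / (1024.0 * 1024.0));
    }
}
{code}
The real blow-up compounds with nesting depth, since at each level the "top-most" array already contains the duplicates produced at the level below it.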
> When put List<List<SomeObject>>, the data was increased much larger.
> --------------------------------------------------------------------
>
> Key: IGNITE-12543
> URL: https://issues.apache.org/jira/browse/IGNITE-12543
> Project: Ignite
> Issue Type: Bug
> Components: thin client
> Affects Versions: 2.6
> Reporter: LEE PYUNG BEOM
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I use the Java thin client of Ignite 2.6.
>
> When I put data in the form List<List<SomeObject>>,
> the original 200KB of data grew to 50MB when read back from the
> Ignite servers.
> In the heap dump, the list elements were accumulated repeatedly, increasing
> the data size.
>
> When I checked the org.apache.ignite.internal.binary.BinaryWriterExImpl
> doWriteBinaryObject() method,
> {code:java}
> // org.apache.ignite.internal.binary.BinaryWriterExImpl.java
> public void doWriteBinaryObject(@Nullable BinaryObjectImpl po) {
>     if (po == null)
>         out.writeByte(GridBinaryMarshaller.NULL);
>     else {
>         byte[] poArr = po.array();
>
>         out.unsafeEnsure(1 + 4 + poArr.length + 4);
>         out.unsafeWriteByte(GridBinaryMarshaller.BINARY_OBJ);
>         out.unsafeWriteInt(poArr.length);
>         out.writeByteArray(poArr);
>         out.unsafeWriteInt(po.start());
>     }
> }
> {code}
>
> The current Ignite implementation for storing data in the form
> List<List<SomeObject>> works as follows:
> In the marshalling stage, for example, for data shaped like List(5
> members)<List(10 members)<SomeObject (size: 200KB)>>,
> as many as 10*5 list elements are duplicated.
> If each of the 50 stored elements duplicates the serialized bytes of its
> enclosing list, roughly 200KB*10*50 = 100MB of data is used for the cache
> and for network transfer.
> As a result of this increase in data size, it is confirmed that OOM errors,
> long GC pauses, etc. are caused by the occupied heap memory.
> Unnecessarily redundant data is used for cache storage and network transport.
> When looking up cache data, only a part of the whole array is read based on
> the stored start-offset information, so correct data is still retrieved.
> The current implementation is safe in terms of basic behavior, but it wastes
> memory and network bandwidth due to this inefficient algorithm.
> This can have very serious consequences. Please check.
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)