Re: BinaryObject pros/cons

Igor Sapego Mon, 31 Oct 2016 11:18:20 -0700

Valentin,

-1 was just an example. I've checked: currently we use the entire possible
range of offset values.
So if we are going to use the suggested approach, we need to reserve some
value and adjust the serialization/deserialization algorithms.
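
To make the idea concrete, here is a rough sketch of a reserved footer value.
It is not the actual Ignite binary format: the layout (a 4-byte body length,
then the field values, then a footer of 4-byte offsets), the class name
ReservedOffsetSketch and the constant OMITTED are all assumptions made up for
illustration. Offsets in this toy layout are never negative, so -1 is free to
reserve; in the real format a value would first have to be taken out of the
usable range.

```java
import java.nio.ByteBuffer;

/** Toy layout, NOT Ignite's format: [int bodyLen][field values][int offset per field]. */
final class ReservedOffsetSketch {
    /** Hypothetical reserved footer value: "field not written, use the type's default". */
    static final int OMITTED = -1;

    /** Serializes long fields, skipping zeros and marking them as OMITTED in the footer. */
    static byte[] write(long[] fields) {
        ByteBuffer body = ByteBuffer.allocate(fields.length * Long.BYTES);
        int[] footer = new int[fields.length];

        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == 0L) {
                footer[i] = OMITTED;          // default value: nothing goes into the body
            } else {
                footer[i] = body.position();  // offset relative to the start of the body
                body.putLong(fields[i]);
            }
        }

        int bodyLen = body.position();
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + bodyLen + footer.length * Integer.BYTES);
        out.putInt(bodyLen);
        out.put(body.array(), 0, bodyLen);
        for (int off : footer) {
            out.putInt(off);
        }
        return out.array();
    }

    /** Reads one field by index; an OMITTED offset yields the default value 0. */
    static long read(byte[] data, int fieldIdx) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int bodyLen = buf.getInt(0);
        int footerStart = Integer.BYTES + bodyLen;
        int off = buf.getInt(footerStart + fieldIdx * Integer.BYTES);
        return off == OMITTED ? 0L : buf.getLong(Integer.BYTES + off);
    }

    public static void main(String[] args) {
        byte[] data = write(new long[] {0L, 42L, 0L, 7L});
        System.out.println("serialized size: " + data.length);
        for (int i = 0; i < 4; i++) {
            System.out.println("field " + i + " = " + read(data, i));
        }
    }
}
```

Note that the footer still has one entry per field, so the schema (the set of
fields) does not depend on which values happen to be defaults.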


Best Regards,
Igor

On Mon, Oct 31, 2016 at 8:46 PM, Valentin Kulichenko <
[email protected]> wrote:

> Makes sense to me, but not sure about -1 in particular. Is this offset
> relative to object start position? What values can it have?
>
> -Val
>
> On Mon, Oct 31, 2016 at 10:38 AM, Igor Sapego <[email protected]>
> wrote:
>
>> Vladimir,
>>
>> How about some reserved value? I.e., a -1 offset means a default/null value
>> should be used?
>>
>> Best Regards,
>> Igor
>>
>> On Mon, Oct 31, 2016 at 5:05 PM, Vladimir Ozerov <[email protected]>
>> wrote:
>>
>>> Valya,
>>>
>>> Do you have any ideas how to implement this? We write field offsets in the
>>> footer. If a field is not written, then what should be used for its offset?
>>>
>>> On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <
>>> [email protected]> wrote:
>>>
>>> > Vladimir,
>>> >
>>> > These are good points, but I'm not suggesting to change the schema. If one
>>> > writes five fields, the schema should have five fields in any case,
>>> > regardless of values. I only suggest changing the internal representation
>>> > of the object and not saving fields with default values in the byte array,
>>> > as we don't really need them there.
>>> >
>>> > -Val
>>> >
>>> > On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <
>>> [email protected]>
>>> > wrote:
>>> >
>>> >> Valya,
>>> >>
>>> >> I have several concerns:
>>> >> 1) Correctness: hasField() will not work properly. But probably we can
>>> >> fix that by adding this info to schema.
>>> >> 2) Performance: we have lots of optimizations which depend on either a
>>> >> "stable" object schema or a low number of schemas. We will effectively
>>> >> turn them off.
>>> >> But what concerns me even more is that we may end up with an enormous
>>> >> number of schemas. E.g. consider an object with 10 numeric fields. If each
>>> >> of them could be zero, we may end up with something like 2^10 schemas.
>>> >>
>>> >> Vladimir.
>>> >>
>>> >> On Oct 29, 2016 at 0:37, "Valentin Kulichenko" <
>>> >> [email protected]> wrote:
>>> >>
>>> >> Vova,
>>> >>>
>>> >>> Why do we need to write zeros and nulls in the first place? What's
>>> the
>>> >>> value of having them in the byte array?
>>> >>>
>>> >>> -Val
>>> >>>
>>> >>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <
>>> [email protected]>
>>> >>> wrote:
>>> >>>
>>> >>>> Valya,
>>> >>>>
>>> >>>> Currently a null value is written as one byte, while a zero value of
>>> >>>> long type is written as 9 bytes. I want to improve that and write
>>> >>>> zeros as one byte as well.
>>> >>>>
>>> >>>> As for var-length encoding, I am strongly against it. It saves IO and
>>> >>>> memory at the cost of CPU. If we encode numbers this way, we will
>>> >>>> slow down SQL (which is already not very fast, to be honest), because
>>> >>>> instead of a single memory read we will have to perform multiple
>>> >>>> reads and then apply some mechanics to restore the original value. We
>>> already
>>> >>>> have such problem with Strings - Java stores them as UTF-16, but we
>>> encode
>>> >>>> them as UTF-8. As a result every read of a string field in SQL
>>> results in
>>> >>>> decoding overhead.
>>> >>>>
>>> >>>> Vladimir.
>>> >>>>
>>> >>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <
>>> >>>> [email protected]> wrote:
>>> >>>>
>>> >>>>> Cross-posting this to dev list.
>>> >>>>>
>>> >>>>> Vladimir,
>>> >>>>>
>>> >>>>> To be honest, I don't see much difference between null values for
>>> >>>>> objects and zero values for primitives. From BinaryObject semantics
>>> >>>>> standpoint, both are default values for corresponding types. These
>>> values
>>> >>>>> will be returned from the BinaryObject.field() method regardless
>>> of whether
>>> >>>>> we actually save them in the byte array or not. Having said that,
>>> why don't
>>> >>>>> we just skip them during write?
>>> >>>>>
>>> >>>>> Your optimization will still be useful though, because there are
>>> often
>>> >>>>> a lot of ints and longs that are not zeros, but still small and
>>> can fit 1-2
>>> >>>>> bytes. We already added such compaction in direct message
>>> marshaling and it
>>> >>>>> reduced overall traffic by around 30%.
>>> >>>>>
>>> >>>>> -Val
>>> >>>>>
>>> >>>>>
>>> >>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <
>>> [email protected]
>>> >>>>> > wrote:
>>> >>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I am not very concerned with null fields overhead, because
>>> usually it
>>> >>>>>> won't be significant. However, there is a problem with zeros.
>>> User object
>>> >>>>>> might have lots of int/long zeros, this is not uncommon. And each
>>> zero will
>>> >>>>>> consume 4-8 additional bytes. We will probably implement a special
>>> >>>>>> optimization which will write such fields in a special compact format.
>>> >>>>>>
>>> >>>>>> Vladimir.
>>> >>>>>>
>>> >>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <
>>> >>>>>> [email protected]> wrote:
>>> >>>>>>
>>> >>>>>>> Hi,
>>> >>>>>>>
>>> >>>>>>> Yes, null values consume memory. I believe this can be optimized,
>>> >>>>>>> but I
>>> >>>>>>> haven't seen issues with this so far. Unless you have hundreds of
>>> >>>>>>> fields
>>> >>>>>>> most of which are nulls (very rare case), the overhead is
>>> minimal.
>>> >>>>>>>
>>> >>>>>>> -Val
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> View this message in context:
>>> >>>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>> >>>>>>> Sent from the Apache Ignite Users mailing list archive at
>>> Nabble.com.
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >
>>>
>>
>>
>
