Valentin,

-1 was just an example. I've checked: we currently use the full range of offset values. So if we go with the suggested approach, we need to reserve some value and adjust the serialization/deserialization algorithms accordingly.
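To make the reserved-value idea concrete, here is a minimal sketch. It is not the actual Ignite binary format: the class name, the `ABSENT` constant, the one-byte offsets and the body-then-footer layout are all assumptions made for illustration. Zero fields are skipped in the body and get the reserved offset in the footer, and the writer has to reject any real offset that would collide with the reserved value.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the real Ignite serialization code.
class ReservedOffsetSketch {
    // Assumed sentinel: the largest 1-byte offset (0xFF, i.e. -1 as a signed byte)
    // is reserved to mean "field not written, use the default value".
    static final int ABSENT = 0xFF;

    // Writes long fields as [body][footer], one footer byte per field.
    // Zero-valued fields are not written to the body at all.
    static byte[] write(long[] fields) {
        ByteBuffer body = ByteBuffer.allocate(fields.length * Long.BYTES);
        byte[] footer = new byte[fields.length];

        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == 0L) {
                footer[i] = (byte) ABSENT;
                continue;
            }
            int off = body.position();
            // The reserved value shrinks the usable offset range by one.
            if (off >= ABSENT)
                throw new IllegalStateException("offset collides with reserved value");
            footer[i] = (byte) off;
            body.putLong(fields[i]);
        }

        byte[] out = new byte[body.position() + footer.length];
        System.arraycopy(body.array(), 0, out, 0, body.position());
        System.arraycopy(footer, 0, out, body.position(), footer.length);
        return out;
    }

    // Reads field idx; the field count is assumed to be known from the schema.
    static long read(byte[] data, int fieldCount, int idx) {
        int footerStart = data.length - fieldCount;
        int off = data[footerStart + idx] & 0xFF;
        if (off == ABSENT)
            return 0L; // default value, nothing was stored in the body
        return ByteBuffer.wrap(data).getLong(off);
    }

    public static void main(String[] args) {
        byte[] data = write(new long[] {0L, 42L, 0L, 7L});
        System.out.println(data.length + " bytes, field 1 = " + read(data, 4, 1));
    }
}
```

Note that the schema still lists all four fields in this sketch; only the byte array gets smaller.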
Best Regards,
Igor

On Mon, Oct 31, 2016 at 8:46 PM, Valentin Kulichenko <[email protected]> wrote:

> Makes sense to me, but not sure about -1 in particular. Is this offset
> relative to object start position? What values can it have?
>
> -Val
>
> On Mon, Oct 31, 2016 at 10:38 AM, Igor Sapego <[email protected]> wrote:
>
>> Vladimir,
>>
>> How about some reserved value? I.e. a -1 offset means a default/null value
>> should be used?
>>
>> Best Regards,
>> Igor
>>
>> On Mon, Oct 31, 2016 at 5:05 PM, Vladimir Ozerov <[email protected]> wrote:
>>
>>> Valya,
>>>
>>> Do you have any ideas how to implement this? We write field offsets in
>>> the footer. If a field is not written, then what should be used for its
>>> offset?
>>>
>>> On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <[email protected]> wrote:
>>>
>>> > Vladimir,
>>> >
>>> > These are good points, but I'm not suggesting to change the schema. If
>>> > one writes five fields, the schema should have five fields in any case,
>>> > regardless of values. I only suggest changing the internal
>>> > representation of the object so that fields with default values are not
>>> > saved in the byte array, as we don't really need them there.
>>> >
>>> > -Val
>>> >
>>> > On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <[email protected]> wrote:
>>> >
>>> >> Valya,
>>> >>
>>> >> I have several concerns:
>>> >> 1) Correctness: hasField() will not work properly. But probably we can
>>> >> fix that by adding this info to the schema.
>>> >> 2) Performance: we have lots of optimizations which depend on either a
>>> >> "stable" object schema or a low number of schemas. We will effectively
>>> >> turn them off.
>>> >> But what concerns me even more is that we may end up with an enormous
>>> >> number of schemas. E.g. consider an object with 10 numeric fields. If
>>> >> all fields could be zero, we may end up with something like 2^10
>>> >> schemas.
>>> >>
>>> >> Vladimir.
>>> >>
>>> >> On Oct 29, 2016 at 0:37, "Valentin Kulichenko" <[email protected]> wrote:
>>> >>
>>> >>> Vova,
>>> >>>
>>> >>> Why do we need to write zeros and nulls in the first place? What's
>>> >>> the value of having them in the byte array?
>>> >>>
>>> >>> -Val
>>> >>>
>>> >>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <[email protected]> wrote:
>>> >>>
>>> >>>> Valya,
>>> >>>>
>>> >>>> Currently a null value is written as one byte, while a zero value of
>>> >>>> long type is written as 9 bytes. I want to improve that and write
>>> >>>> zeros as one byte as well.
>>> >>>>
>>> >>>> As for var-length encoding, I am strongly against it. It saves IO
>>> >>>> and memory at the cost of CPU. If we encode numbers this way, we
>>> >>>> will slow down SQL (which is already not very fast, to be honest),
>>> >>>> because instead of a single memory read we will have to perform
>>> >>>> multiple reads and then apply some mechanics to restore the original
>>> >>>> value. We already have such a problem with Strings: Java stores them
>>> >>>> as UTF-16, but we encode them as UTF-8. As a result, every read of a
>>> >>>> string field in SQL results in decoding overhead.
>>> >>>>
>>> >>>> Vladimir.
>>> >>>>
>>> >>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <[email protected]> wrote:
>>> >>>>
>>> >>>>> Cross-posting this to dev list.
>>> >>>>>
>>> >>>>> Vladimir,
>>> >>>>>
>>> >>>>> To be honest, I don't see much difference between null values for
>>> >>>>> objects and zero values for primitives. From the BinaryObject
>>> >>>>> semantics standpoint, both are default values for the corresponding
>>> >>>>> types. These values will be returned from the BinaryObject.field()
>>> >>>>> method regardless of whether we actually save them in the byte
>>> >>>>> array or not. Having said that, why don't we just skip them during
>>> >>>>> write?
>>> >>>>>
>>> >>>>> Your optimization will still be useful though, because there are
>>> >>>>> often a lot of ints and longs that are not zeros but are still
>>> >>>>> small and can fit in 1-2 bytes. We already added such compaction in
>>> >>>>> direct message marshaling and it reduced overall traffic by around
>>> >>>>> 30%.
>>> >>>>>
>>> >>>>> -Val
>>> >>>>>
>>> >>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <[email protected]> wrote:
>>> >>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I am not very concerned with the overhead of null fields, because
>>> >>>>>> usually it won't be significant. However, there is a problem with
>>> >>>>>> zeros. A user object might have lots of int/long zeros; this is
>>> >>>>>> not uncommon. And each zero will consume 4-8 additional bytes. We
>>> >>>>>> will probably implement a special optimization which will write
>>> >>>>>> such fields in a special compact format.
>>> >>>>>>
>>> >>>>>> Vladimir.
>>> >>>>>>
>>> >>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>>> Hi,
>>> >>>>>>>
>>> >>>>>>> Yes, null values consume memory. I believe this can be optimized,
>>> >>>>>>> but I haven't seen issues with this so far. Unless you have
>>> >>>>>>> hundreds of fields most of which are nulls (a very rare case),
>>> >>>>>>> the overhead is minimal.
>>> >>>>>>>
>>> >>>>>>> -Val
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> View this message in context:
>>> >>>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
>>> >>>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
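
For reference, the var-length encoding debated in the quoted thread can be sketched as follows. This is a generic 7-bits-per-byte scheme written for illustration, not the format used by Ignite's direct message marshaling; the class and method names are hypothetical. It shows both sides of the argument: small values (including zero) shrink to one or two bytes, but decoding takes a loop of byte reads and shifts instead of a single fixed-width read.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; a generic varint, not Ignite's wire format.
class VarLongSketch {
    // Encodes v as 7 data bits per byte; the high bit marks "more bytes follow".
    static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(10);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Decodes one value: a byte-by-byte loop, which is the CPU cost in question.
    static long decode(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (long v : new long[] {0L, 127L, 128L, 300L, Long.MAX_VALUE}) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decode(enc));
        }
    }
}
```

In this scheme negative values always take the maximum number of bytes unless they are remapped first, so the savings depend on the value distribution.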
