Cool… We already have a workaround for our issue… I just wanted to make sure we are not doing something terribly wrong.
Regards, Igor > On Oct 18, 2018, at 11:07 AM, Vladimir Ozerov <voze...@gridgain.com> wrote: > > Hi Igor, > > What Ignite version do you use? Recent versions do not read real rows for > the simplest "SELECT COUNT(*) FROM table" query. Instead it goes over index > and simply count records, so no copying happens. But let me express again - > this is not the case in general. If any expression appears, we have to read > the row from offheap. In this case the whole key-value pair is read from > offheap including non-SQL fields. This is so because we do not know in > advance what parts of key or value will be needed. Even if specific field > is not marked as @QuerySqlField, it may be needed if the whole key or value > is requested ("SELECT _KEY FROM table"), or if it is used in some custom > function, etc.. > > In future versions we may re-approach this, and try to read only fields > required for query from offheap. But this will require significant changes > to both storage layer and query engine, so this is not an easy task. > > > On Thu, Oct 18, 2018 at 2:42 AM Igor Tanackovic <igor.tanacko...@gmail.com> > wrote: > >> Dmitriy, >> >> Correct me if I’m wrong, but the concept is to store everything off heap - >> which is perfectly fine :). So, the question is how SQL engine actually >> works. Analyzing profiler object generation and sizes on heap, I’ve learned >> that for each query entire rows in question are copied on heap which, I’d >> say is a little bit inefficient (more or less depending on domain object). >> Maybe an example will help to further elaborate on the issue. >> >> Lets say we have an object ORGANIZATION which has only two query enabled >> fields (annotated with @QuerySqlField) and 20 more value fields >> (text/blobs, …) which are not used in queries (not annotated with >> @QuerySqlField and thus not visible by SQL engine). So, how COUNT works? >> Will SQL engine copy on heap just a part of ORGANIZATION rows (2 fields >> that might be part of COUNT query) or entire rows (sql enabled fields + >> text/blob fields…)? >> >> From what I saw during profiling, I’d say entire rows will be copied on >> heap. Just to compare, this is our, real world example: >> >> Object A (~300k rows, each row has 5 columns, all sql enabled), COUNT(*) >> takes ~2MB on heap >> Object B (~25k rows, each row has 16 columns, 5 sql enabled fields and 11 >> text value fields), COUNT(*) takes ~60MB on heap - nearly 30 times more >> >> >> Regards, >> Igor >> >> >> >>> On Oct 18, 2018, at 12:24 AM, Dmitriy Setrakyan <dsetrak...@apache.org> >> wrote: >>> >>> I do not understand - why are we copying values on-heap? If we must copy >>> something, why not copy rows into some off-heap space and read what we >> have >>> from there? At least this way we will not burden the JVM with extra GC >>> pressure. >>> >>> D. >>> >>> On Wed, Oct 17, 2018 at 10:33 AM Igor Tanackovic < >> igor.tanacko...@gmail.com> >>> wrote: >>> >>>> Vladimir, >>>> >>>> Thanks for explanation... that’s true - ignite never deserialize values >>>> during query execution. The point here is why copying fields to heap >> that >>>> sql engine could not use (not annotated as QuerySqlField)? >>>> SQL count is a perfect example where you can benefit (heap space and gc) >>>> copying only field sql engine could use. >>>> >>>> Regards, >>>> Igor >>>> >>>> On Wed, Oct 17, 2018 at 18:47 Vladimir Ozerov <voze...@gridgain.com> >>>> wrote: >>>> >>>>> Hi Igor, >>>>> >>>>> We never deserialize values during query execution. Instead, we copy >> the >>>>> row to heap and extract fields as needed. In general case it is >>>> impossible >>>>> to avoid reading the whole row because we do not know whether this is >>>>> COUNT(*) or COUNT(*) WHERE <something> or COUNT(<something>) WHERE >>>>> <something>. >>>>> We do handle plain COUNT(*) as speical case and iterate over index >> only, >>>>> but event this simple query cannot avoid row reads in general case when >>>> new >>>>> snapshot mode is enabled, because an entry in the index may be not >>>> visible >>>>> to current transaction. >>>>> >>>>> On Wed, Oct 17, 2018 at 7:19 PM igor.tanackovic < >>>> igor.tanacko...@gmail.com >>>>>> >>>>> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> Seems that SQL engine always deserialize whole objects instead of >> using >>>>>> just >>>>>> SQL enabled fields (annotated with @QuerySqlField). This may have a >>>> huge >>>>>> impact on Ignite heap usage and GC overhead as well. >>>>>> >>>>>> For example, we have a cache holding big objects but with only two sql >>>>>> query >>>>>> fields which for each query execution (SELECT COUNT(*) FROM 'cache') >>>>>> consumes large amount on heap memory (~300MB). As a proof of concept, >>>> we >>>>>> divided the same cache to *index* cache with only sql query field and >> a >>>>>> *data* holding whole object for materialization. The same query >> (SELECT >>>>>> COUNT(*) FROM 'index-cache') consumes ~25 time less memory! The same >> is >>>>>> true >>>>>> for all other queries. >>>>>> >>>>>> The obvious workaround would be to always have separated regions for >>>>>> indexes >>>>>> (sql query enabled region) and a data/value region for >> materialization, >>>>> but >>>>>> it might be a good idea to fix this in a systematic way during off >> heap >>>>>> deserialization. >>>>>> >>>>>> Regards, >>>>>> Igor >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ >>>>>> >>>>> >>>> >> >>