So I had this thought - why not use arrays of primitive types to store UDF state instead of objects.
The background is that if one stores int [] intarray - then java uses 4 bytes for each additional element in the array (verified). Instead if one stores an array of objects that store an int - then there seems to be about 16-20 bytes of extra overhead per object (not sure precisely - this is what it seems on my limited experiments). So imagine that: - we maintained states for UDFs in primitive arrays (this is the UDFs responsibility) - we had a customized HashMap implementation that stored an index (int) as value for a key (keys are still objects - but values are just 4 byte ints) o looked at the jdk source - this seems straightforward - to update state - we give the index to the evaluator. The evaluator can then index into whatever arrays it maintains and do whatever it wants. If it allocates a new slot in the array - then it can return the allocated index back to the framework (to store against the key) this way - at least we can get rid of the object overhead from the value part of the hashmap. Somewhat hacky (getting Java to work like C) - but this can be made to work I think. There is the issue of managing the free slots on the array (which are created on a flush) - but I think we can overlap a free list on top of the primitive array (say every free slot stores the index of the next free slot. When slots are freed - we can chain the new free slots into the existing head of the free list). Thoughts?