[GitHub] [druid] clintropolis commented on pull request #11888: add 'TypeStrategy' to types

GitBox Mon, 29 Nov 2021 13:34:53 -0800


clintropolis commented on pull request #11888:
URL: https://github.com/apache/druid/pull/11888#issuecomment-982044151



   > 1) estimateSizeBytesNullable appears to be the only reference to 
nullability. The other methods say they deal with non-null data. That seems 
inconsistent. Here I think I'm in favor of what @cheddar is suggesting, that 
this interface should not be null-aware, and null handling should be left to 
the caller. (Callers shouldn't need to use this interface if they get a null.)
   
   Oops, this was a leftover from the first round of changes from @cheddar 's 
comment where I moved most of the null byte handling stuff out of 
`TypeStrategy` itself, but forgot to move this one out. Since then, I've moved 
all of the null byte handling into the special `NullableTypeStrategy` that 
should be used for callers that want to use the null byte from `NullHandling`, 
along with lots of javadocs to encourage other options if available since the 
byte per value is quite expensive.
   
   > 2) To simplify life for callers that don't want to care about null 
handling, we could make a NullableTypeStrategy that accepts nulls and stores a 
null byte. It'd be composable with all the other TypeStrategy implementations, 
and lets certain callers stay simple while also letting other callers be fancy.
   
   Did this, as well as adding a `getNullableStrategy` to `TypeSignature` that 
will automatically wrap the output of `getStrategy` with the 
`NullableTypeStrategy`
   
   > 3) I'm confused about the intent of estimateSizeBytes: is it meant to be a 
maximum size? Like, can the estimate be an underestimate? Some of the javadocs 
and comments seem to suggest that is the case. If it's meant to be a max, we 
should call it specifically "max" instead of "estimate".
   
   Before the most recent set of changes, estimate was _actually_ being used to 
get the exact size of some value in bytes, in order to check that variably 
sized values did not exceed this maximum. This has been reworked as suggested 
in 4) to instead pass `maxSizeBytes` directly into the `write` family of 
methods, and now has been relaxed to only needing to be an estimate. The only 
thing still using it in the current PR is the heap based 
`ExpressionLambdaAggregator`, which uses it to check that the aggregator does 
not explode the heap in an unbounded way.
   
   > 4) I'm not sure if anything could use it today, but, we should design this 
interface to support variable size serialization lengths, since we'll want it 
for various future data types. Those could work by having write take a max 
length, and returning -1 if it would need more than that. Then the caller could 
allocate more space and call write again. Or, maybe, write could return 
-the_amount_of_space_it_needs. In this case the "estimate" method really could 
return an estimate rather than a max.
   
   Variably sized serialization was previously already supported (but required 
externally verifying that there was enough space with the estimate method). 
But, I liked the ideas here, so I have modified the `write` methods to now 
accept a maxSizeBytes parameter and went with 
`-the_amount_of_extra_space_it_needs` approach if this value (or buffer size) 
is exceeded, so that callers could choose to explode if the value was not able 
to be completely written, as well as know how much additional space would be 
required to fully write the value.
   
   > 5) Even with variable size support like (4) suggest, we will still have 
callers that can't deal with that. So there will need to be some API that 
supports those callers. I'm not sure what that'd look like; maybe a getMaxBytes 
method that returns the max, or -1 if there really isn't a known max? And if 
the max is -1, then callers that only support constant size serialization can't 
use the API?
   
   Variably sized support is in place, I'm not sure we need a `getMaxBytes` 
method as attempting to call write now requires specifying a max bytes, even 
for constant sized values, callers with constant sized serialization can either 
specify the correct size or some size larger than the actual size, which is 
sort of ugly, but also primitive numerics shouldn't really be using 
`TypeStrategy` directly, instead they should be using the static methods to 
avoid converting the java primitives, so maybe it isn't a huge deal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis commented on pull request #11888: add 'TypeStrategy' to types

Reply via email to