[
https://issues.apache.org/jira/browse/KAFKA-17792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889545#comment-17889545
]
Martin Sillence edited comment on KAFKA-17792 at 10/15/24 8:32 AM:
-------------------------------------------------------------------
I feel there are a few options:
* make the schema explicit
* an exclude list
* a limit on the number of digits
The last is the least intrusive but possibly the worst in terms of surprises,
but to quantify it:
positive exponents:
{noformat}
1e+1 time 0.0 totalMemory 532676608
1e+10 time 0.0 totalMemory 532676608
1e+100 time 0.0 totalMemory 532676608
1e+1000 time 0.0 totalMemory 532676608
1e+10000 time 0.005 totalMemory 532676608
1e+100000 time 0.035 totalMemory 532676608
1e+1000000 time 0.228 totalMemory 532676608
1e+10000000 time 4.308 totalMemory 926941184
1e+100000000 time 117.119 totalMemory 3221225472
1e+1000000000 time 0.0 totalMemory 3221225472 BigInteger would overflow supported range
1e+10000000000 time 0.001 totalMemory 3221225472 Too many nonzero exponent digits.
1e+100000000000 time 0.0 totalMemory 3221225472 Too many nonzero exponent digits.
1e+1000000000000 time 0.0 totalMemory 3221225472 Too many nonzero exponent digits.
1e+10000000000000 time 0.0 totalMemory 3221225472 Too many nonzero exponent digits.
{noformat}
negative exponents:
{noformat}
1e-1 time 0.001 totalMemory 532676608
1e-10 time 0.0 totalMemory 532676608
1e-100 time 0.001 totalMemory 532676608
1e-1000 time 0.0 totalMemory 532676608
1e-10000 time 0.005 totalMemory 532676608
1e-100000 time 0.034 totalMemory 532676608
1e-1000000 time 0.242 totalMemory 532676608
1e-10000000 time 4.342 totalMemory 926941184
1e-100000000 time 121.199 totalMemory 3368026112
1e-1000000000 time 0.0 totalMemory 3368026112 BigInteger would overflow supported range
1e-10000000000 time 0.0 totalMemory 3368026112 Too many nonzero exponent digits.
1e-100000000000 time 0.0 totalMemory 3368026112 Too many nonzero exponent digits.
1e-1000000000000 time 0.0 totalMemory 3368026112 Too many nonzero exponent digits.
1e-10000000000000 time 0.0 totalMemory 3368026112 Too many nonzero exponent digits.
{noformat}
So 1e+1000000 and 1e-1000000 seem to be where things start to get more
expensive (negative numbers are the same; memory is
Runtime.getRuntime().totalMemory()).
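For anyone wanting to reproduce figures like these, a minimal sketch follows. It assumes the expensive step is BigDecimal.setScale(0, ...), as the stack trace in the issue description suggests. The class name ExponentCost, the measure helper and the choice of RoundingMode.FLOOR are illustrative only; this is not the harness used for the numbers above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ExponentCost {
    /** Parses s, forces an exact scale-0 representation, and returns a one-line report. */
    static String measure(String s) {
        long start = System.nanoTime();
        String outcome;
        try {
            // Parsing alone is cheap: only the unscaled value and the scale are stored.
            BigDecimal parsed = new BigDecimal(s);
            // Assumed expensive step: rescaling to 0 has to compute a power of ten
            // as large as the exponent.
            BigDecimal exact = parsed.setScale(0, RoundingMode.FLOOR);
            outcome = "ok precision=" + exact.precision();
        } catch (NumberFormatException | ArithmeticException e) {
            // Unparseable exponents and BigInteger overflow both end up here.
            outcome = "error " + e.getMessage();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return s + " time " + seconds
                + " totalMemory " + Runtime.getRuntime().totalMemory()
                + " " + outcome;
    }

    public static void main(String[] args) {
        // Small exponents only, so the sketch finishes quickly.
        for (String s : new String[] {"1e+1", "1e+1000", "1e-1000", "1e+100000"}) {
            System.out.println(measure(s));
        }
    }
}
```

Larger exponents can be added to the array in main, but expect the multi-second times and multi-gigabyte heaps shown in the tables.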
We have two choices then: either not process them as exact, or leave them as
strings.
Leaving them as strings seems likely to break things unexpectedly but quickly.
Not being exact sounds like it would lead to subtle errors. For us, we really
don't want our header to be rounded; it's not a number.
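To make the "limit on the number of digits" option concrete, here is a rough sketch of a guard that estimates how many digits an exact expansion would need before doing any expensive work. The class DigitLimitGuard, the methods digitEstimate and safeToExpand, and the MAX_DIGITS value are all hypothetical and are not part of Kafka Connect; the limit is picked only because the timings above start to climb around a million digits.

```java
import java.math.BigDecimal;

public class DigitLimitGuard {
    // Hypothetical cut-off; a real limit would need to be agreed on or made configurable.
    static final long MAX_DIGITS = 1_000_000;

    /**
     * Upper bound on the digits needed to write d exactly at scale 0.
     * Cheap to compute: precision() looks only at the small unscaled value.
     */
    static long digitEstimate(BigDecimal d) {
        return Math.abs((long) d.scale()) + d.precision();
    }

    /** True if the token is a decimal whose exact expansion stays under the limit. */
    static boolean safeToExpand(String token) {
        try {
            return digitEstimate(new BigDecimal(token)) <= MAX_DIGITS;
        } catch (NumberFormatException e) {
            return false; // not a number at all, so leave it as a string
        }
    }

    public static void main(String[] args) {
        // The span id from the issue: parses as 407127 x 10^212797209.
        String spanId = "407127e212797209";
        System.out.println(digitEstimate(new BigDecimal(spanId)));
        System.out.println(safeToExpand(spanId));
        System.out.println(safeToExpand("1e+10"));
    }
}
```

A token that fails the check would then fall back to being treated as a string, which is the "break quickly" behaviour rather than the "subtly rounded" one.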
> header parsing ends up timing out and using large quantities of memory if the
> string looks like a number
> --------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-17792
> URL: https://issues.apache.org/jira/browse/KAFKA-17792
> Project: Kafka
> Issue Type: Bug
> Components: connect
> Reporter: Martin Sillence
> Priority: Major
>
> We have trace headers such as:
> "X-B3-SpanId": "74320e6e26adc8f8"
> If, however, the value happens to be "407127e212797209",
> it is treated as a numeric value, and the parser tries to convert it
> to an exact numeric representation using BigDecimal.
> we end up with the trace:
> BigDecimal.setScale(int, RoundingMode) line: 2876
> Values$ValueParser.parseAsExactDecimal(BigDecimal) line: 1044
> Values$ValueParser.parseAsNumber(String) line: 1025
> Values$ValueParser.parseNextToken(boolean, String) line: 892
> Values$ValueParser.parse(boolean) line: 875
> Values.parseString(String) line: 415
> SimpleHeaderConverter.toConnectHeader(String, String, byte[]) line: 68
> WorkerSinkTask.convertHeadersFor(ConsumerRecord<byte[],byte[]>) line: 578
>
> this takes a long time to convert to an exact representation of a roughly
> 212-million-digit integer (the exponent is 212797209)