AMC-team created HADOOP-19673:
---------------------------------

             Summary: BloomMapFile: invalid io.mapfile.bloom.error.rate (≤0 or ≥1) causes NaN/zero vector size and writer construction failure
                 Key: HADOOP-19673
                 URL: https://issues.apache.org/jira/browse/HADOOP-19673
             Project: Hadoop Common
          Issue Type: Bug
          Components: common, io
    Affects Versions: 2.8.5
            Reporter: AMC-team

{{BloomMapFile.Writer#initBloomFilter(Configuration)}} computes the Bloom filter vector size as:

{code:java}
int numKeys = conf.getInt(IO_MAPFILE_BLOOM_SIZE_KEY, IO_MAPFILE_BLOOM_SIZE_DEFAULT);
float errorRate = conf.getFloat(IO_MAPFILE_BLOOM_ERROR_RATE_KEY, IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT);
// vectorSize = ceil( -k * n / ln(1 - p^(1/k)) )
int vectorSize = (int) Math.ceil(
    (double)(-HASH_COUNT * numKeys)
        / Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT))
);
{code}

When {{io.mapfile.bloom.error.rate}} is *≤ 0* or *≥ 1*:
 * for a negative rate, {{Math.pow(errorRate, 1.0 / HASH_COUNT)}} returns *NaN* (negative base with a non-integer exponent); for a rate above 1, the argument of {{Math.log}} is negative, which also yields *NaN*;
 * {{Math.ceil(NaN)}} cast to {{int}} yields *0*, so {{vectorSize == 0}};
 * at the boundaries the result is equally unusable: a rate of exactly 1 gives {{Math.log(0.0) == -Infinity}} and therefore {{vectorSize == 0}}, while a rate of exactly 0 gives a division by {{Math.log(1.0) == 0.0}}, and the resulting {{-Infinity}} is cast to {{Integer.MIN_VALUE}};
 * constructing {{DynamicBloomFilter}} with such a vector size subsequently fails, and so does construction of {{BloomMapFile.Writer}} (observed as an assertion failure in tests).

The code does not validate {{io.mapfile.bloom.error.rate}}, which must lie strictly within {{(0, 1)}}. With an invalid value the arithmetic silently degrades to NaN or infinity, and the failure only surfaces later at runtime instead of as a clear configuration error.
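To make the failure mode and a possible guard concrete, here is a minimal standalone sketch. It is not the Hadoop code or a committed patch: the class {{BloomVectorSizeSketch}} and the methods {{rawVectorSize}} and {{checkedVectorSize}} are hypothetical names introduced for illustration, and the hash count and key count are passed as plain parameters rather than read from a {{Configuration}}.

```java
// Hypothetical helper, not part of Hadoop. It reproduces the arithmetic
// from the report and shows one way an error-rate guard could look.
final class BloomVectorSizeSketch {

  private BloomVectorSizeSketch() {
  }

  /** Same expression as in the report, with no input validation. */
  static int rawVectorSize(int numKeys, int hashCount, float errorRate) {
    return (int) Math.ceil(
        (double) (-hashCount * numKeys)
            / Math.log(1.0 - Math.pow(errorRate, 1.0 / hashCount)));
  }

  /** Rejects error rates outside the open interval (0, 1) up front. */
  static int checkedVectorSize(int numKeys, int hashCount, float errorRate) {
    // The negated form also rejects NaN, which fails every comparison.
    if (!(errorRate > 0.0f && errorRate < 1.0f)) {
      throw new IllegalArgumentException(
          "io.mapfile.bloom.error.rate must be in (0, 1), got " + errorRate);
    }
    return rawVectorSize(numKeys, hashCount, errorRate);
  }
}
```

With 1000 keys and 5 hash functions, {{rawVectorSize}} returns 0 for an error rate of -0.1 or 1.5, whereas {{checkedVectorSize}} throws {{IllegalArgumentException}} for the same inputs. Whether a real fix should throw or fall back to the default error rate is a design decision for the maintainers; the sketch only shows the fail-fast variant.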