AMC-team created HADOOP-19673:
---------------------------------

             Summary: BloomMapFile: invalid io.mapfile.bloom.error.rate (≤0 or 
≥1) causes NaN/zero vector size and writer construction failure
                 Key: HADOOP-19673
                 URL: https://issues.apache.org/jira/browse/HADOOP-19673
             Project: Hadoop Common
          Issue Type: Bug
          Components: common, io
    Affects Versions: 2.8.5
            Reporter: AMC-team


{{BloomMapFile.Writer#initBloomFilter(Configuration)}} computes the Bloom 
filter vector size as:

 
{code:java}
int numKeys = conf.getInt(IO_MAPFILE_BLOOM_SIZE_KEY,
    IO_MAPFILE_BLOOM_SIZE_DEFAULT);
float errorRate = conf.getFloat(IO_MAPFILE_BLOOM_ERROR_RATE_KEY,
    IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT);
// vectorSize = ceil( -k * n / ln(1 - p^(1/k)) )
int vectorSize = (int) Math.ceil(
  (double)(-HASH_COUNT * numKeys) /
  Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT))
);
{code}
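When {{io.mapfile.bloom.error.rate}} is *≤ 0* or *≥ 1*, the expression degenerates:
 * for a negative rate, {{Math.pow(errorRate, 1.0 / HASH_COUNT)}} returns *NaN* (negative base with a non-integer exponent); for a rate greater than 1 the power exceeds 1, so {{Math.log}} receives a negative argument and also returns *NaN*;
 * the NaN propagates through the division, and {{Math.ceil(NaN)}} cast to {{int}} yields *0*, so {{vectorSize == 0}};
 * at exactly 1 the logarithm is {{-Infinity}}, the quotient is 0, and {{vectorSize}} is again *0*; at exactly 0 the logarithm is 0, the quotient is {{-Infinity}}, and the cast yields {{Integer.MIN_VALUE}};
 * constructing {{DynamicBloomFilter}} subsequently fails, and {{BloomMapFile.Writer}} construction fails (observed as an assertion failure in tests).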
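
The degenerate cases can be reproduced without Hadoop on the classpath. The sketch below copies only the arithmetic quoted above; the class name {{BloomVectorSizeDemo}} is hypothetical, and {{HASH_COUNT = 5}} is an assumption about the constant used by {{BloomMapFile}}.

```java
// Standalone reproduction of the vector-size arithmetic; no Hadoop classes involved.
final class BloomVectorSizeDemo {
    // Assumption: stands in for BloomMapFile's HASH_COUNT constant.
    static final int HASH_COUNT = 5;

    // Same expression as in the report, with the configuration lookups
    // replaced by plain parameters.
    static int vectorSize(int numKeys, float errorRate) {
        return (int) Math.ceil(
            (double) (-HASH_COUNT * numKeys)
                / Math.log(1.0 - Math.pow(errorRate, 1.0 / HASH_COUNT)));
    }

    public static void main(String[] args) {
        int numKeys = 1024 * 1024;
        // One valid rate followed by four invalid ones.
        float[] rates = {0.005f, -0.5f, 0.0f, 1.0f, 1.5f};
        for (float rate : rates) {
            System.out.println(
                "errorRate=" + rate + " -> vectorSize=" + vectorSize(numKeys, rate));
        }
    }
}
```

Only the first rate gives a positive size. The rates -0.5, 1.0 and 1.5 give 0, and 0.0 gives {{Integer.MIN_VALUE}}.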

The code lacks input validation for {{io.mapfile.bloom.error.rate}}, which
must lie strictly within {{(0, 1)}}. With invalid values, the math
silently degrades to NaN/0 and only fails later at runtime.
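
One possible shape for the missing check is sketched below. It is not taken from any Hadoop patch: the class {{BloomErrorRateCheck}} and the method {{checkErrorRate}} are hypothetical names, and rejecting the value with an exception (rather than falling back to the default) is only one of the options a fix could choose.

```java
// Hypothetical range check for the configured Bloom filter error rate.
final class BloomErrorRateCheck {
    // Returns errorRate unchanged if it lies strictly inside (0, 1);
    // otherwise throws, naming the offending configuration key.
    static float checkErrorRate(String key, float errorRate) {
        // Written as a negated conjunction so that NaN is rejected too:
        // every comparison involving NaN evaluates to false.
        if (!(errorRate > 0.0f && errorRate < 1.0f)) {
            throw new IllegalArgumentException(
                key + " must be in the open interval (0, 1), but was " + errorRate);
        }
        return errorRate;
    }
}
```

Called right after the {{conf.getFloat(...)}} lookup, a check like this would surface the misconfiguration with the key name in the message, before the size computation runs.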

 



