Paul Rogers created IMPALA-8145:
-----------------------------------

             Summary: Partition metadata key muddle
                 Key: IMPALA-8145
                 URL: https://issues.apache.org/jira/browse/IMPALA-8145
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers


Impala stores metadata, including metadata about HDFS partitions. A partition is 
identified by a collection of keys expressed as {{(column, value)}} pairs, for 
example "year=2018/month=1". The columns are defined in HMS with a name and a 
type; the values are defined as part of the partition definition.

Impala performs partition pruning: a query that says {{WHERE month=2}} will skip 
the partition above but will scan the one for "year=2018/month=2". To perform 
the pruning, the value of each partition key must be converted from text (as 
used to name the directory) to the type of the corresponding column, say TINYINT 
here.

Conversion is done in the catalog server when loading a partition. Given the 
type of the column, the catalog parses the string value of the key, in this 
case into a NumericLiteral of type TINYINT. The resulting object is then 
converted into a Thrift TExpr node, sent over the network to the Coordinator, 
where it is deserialized back into a NumericLiteral.
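The round trip can be sketched in plain Java (illustrative only; the real code goes through {{NumericLiteral}} and a Thrift {{TExpr}} node, not these hypothetical local variables):

```java
import java.math.BigDecimal;

public class RoundTrip {
    public static void main(String[] args) {
        // Catalog side: parse the partition-key string into an exact decimal.
        BigDecimal parsed = new BigDecimal("0.1");        // exactly 0.1

        // On the wire: a float/double literal travels as a native double,
        // which holds only the nearest binary approximation of 0.1.
        double onTheWire = parsed.doubleValue();

        // Coordinator side: the double is wrapped back into a BigDecimal,
        // which faithfully records the approximation, not the original 0.1.
        BigDecimal restored = new BigDecimal(onTheWire);

        System.out.println(parsed);                       // 0.1
        System.out.println(restored);                     // the long approximation
        System.out.println(parsed.compareTo(restored));   // non-zero: values differ
    }
}
```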

All of this works fine for String and integer keys. It fails, however, for 
float and double keys. (Let's set aside the fact that partitioning on floating 
point numbers is a very bad idea for a number of reasons. Impala supports this 
bad choice. Our job here is just to deal with that decision.)

NumericLiteral stores its value as a Java BigDecimal. BigDecimal stores values 
in decimal and so can easily represent, say, 0.1, if that is the partition key. 
Unfortunately, floating-point numbers are binary and can exactly represent only 
sums of binary fractions. The value 0.1 is a repeating fraction in binary.

Storing 0.1 as a double still renders "0.1" when printed because Java's 
{{Double.toString()}} produces the shortest decimal string that uniquely 
identifies the double value, so the printed form parses back to the identical 
bits.
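A minimal sketch of that shortest-round-trip behavior:

```java
public class ShortestDecimal {
    public static void main(String[] args) {
        double d = 0.1;  // stored internally as the nearest binary fraction

        // Double.toString emits the shortest decimal that round-trips.
        System.out.println(Double.toString(d));                    // 0.1

        // Parsing the printed form recovers the identical double bits.
        System.out.println(Double.parseDouble(Double.toString(d)) == d);
    }
}
```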

But, when the process above occurs, the double value is converted to a 
BigDecimal upon deserialization from Thrift. The result is the value 
0.1000000000000000055511151231257827021181583404541015625. That is, BigDecimal 
is more precise than double, and can represent (in decimal) the exact sum of 
binary fractions used to approximate 0.1.

This issue is fully described in the [BigDecimal 
javadoc|https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#BigDecimal-double-],
 using, as it happens, the very value of 0.1 discussed above.

The result is that, if a query is planned in local catalog mode, partition 
pruning for "WHERE float_col=0.1" works, because the parser converts "0.1" 
directly from its string form to a BigDecimal, preserving the exact decimal 
value.

But, if the same query is planned in traditional catalog mode, the extra Thrift 
conversion causes the bogus value shown above to be used in the comparison, 
resulting in a failed partition match.
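The difference between the two modes comes down to which BigDecimal constructor the partition-key value passes through. A hedged sketch (the names {{localKey}} and {{thriftKey}} are illustrative, not Impala identifiers):

```java
import java.math.BigDecimal;

public class PruneCompare {
    public static void main(String[] args) {
        // The literal from the query text "WHERE float_col=0.1".
        BigDecimal predicate = new BigDecimal("0.1");

        // Local catalog mode: the key is parsed from its string form.
        BigDecimal localKey = new BigDecimal("0.1");
        System.out.println(predicate.compareTo(localKey) == 0);   // true: match

        // Traditional mode: the key passed through a Thrift double.
        BigDecimal thriftKey = new BigDecimal(0.1);
        System.out.println(predicate.compareTo(thriftKey) == 0);  // false: no match
    }
}
```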

The temporary solution is to convert the double from Thrift back to a string, 
and then the string to a BigDecimal. This is, obviously, quite silly.
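The silly-but-working conversion amounts to the following; note that {{BigDecimal.valueOf(double)}} is the standard library's built-in shorthand for exactly this double-to-string-to-BigDecimal path:

```java
import java.math.BigDecimal;

public class Workaround {
    public static void main(String[] args) {
        double fromThrift = 0.1;  // the double as it arrives off the wire

        // Route through the shortest-round-trip string form.
        BigDecimal viaString = new BigDecimal(Double.toString(fromThrift));

        // Equivalent library shorthand for the same conversion.
        BigDecimal viaValueOf = BigDecimal.valueOf(fromThrift);

        System.out.println(viaString);   // 0.1
        System.out.println(viaValueOf);  // 0.1
    }
}
```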

The bigger issue is that there is no good reason for the catalog server to 
parse partition keys into literal expressions only for them to be converted to 
Thrift. Better to leave the partition keys as strings and let the coordinator 
do any required parsing into literal expressions.

Note that, in the current design (with code before a recent revision), the 
catalog server must analyze each literal expression, but there is no 
coordinator available to provide the analyzer, so special code was needed to 
allow analysis with a null analyzer, needlessly complicating the logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
