Paul Rogers created IMPALA-8145:
-----------------------------------

             Summary: Partition metadata key muddle
                 Key: IMPALA-8145
                 URL: https://issues.apache.org/jira/browse/IMPALA-8145
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers


Impala stores metadata, including metadata about HDFS partitions. A partition is 
identified by a collection of keys expressed as {{(column, value)}} pairs, for 
example "year=2018/month=1". The columns are defined in HMS with a name and a 
type; the values are defined as part of the partition definition.

Impala performs partition pruning: a query that says {{WHERE month=2}} will skip 
the partition above but will scan the one for "year=2018/month=2". To perform 
the pruning, the value of each partition key must be converted from text (as 
used to name the directory) to the type of the corresponding column, say TINYINT 
here.

Conversion is done in the catalog server when loading a partition. Given the 
type of the column, the catalog parses the string value of the key, in this 
case into a NumericLiteral of type TINYINT. The resulting object is then 
converted into a Thrift TExpr node, sent over the network to the Coordinator, 
where it is deserialized back into a NumericLiteral.
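The round trip can be sketched in plain Java (illustrative only; the real code goes through {{NumericLiteral}} and a Thrift {{TExpr}} node, not these hypothetical local variables):

```java
import java.math.BigDecimal;

public class RoundTrip {
    public static void main(String[] args) {
        // Catalog side: parse the partition-key string into an exact decimal.
        BigDecimal parsed = new BigDecimal("0.1");        // exactly 0.1

        // On the wire: a float/double literal travels as a native double,
        // which holds only the nearest binary approximation of 0.1.
        double onTheWire = parsed.doubleValue();

        // Coordinator side: the double is wrapped back into a BigDecimal,
        // which faithfully records the approximation, not the original 0.1.
        BigDecimal restored = new BigDecimal(onTheWire);

        System.out.println(parsed);                       // 0.1
        System.out.println(restored);                     // the long approximation
        System.out.println(parsed.compareTo(restored));   // non-zero: values differ
    }
}
```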

All of this works fine for String and integer keys. It fails, however, for 
float and double keys. (Let's set aside the fact that partitioning on floating 
point numbers is a very bad idea for a number of reasons. Impala supports this 
bad choice. Our job here is just to deal with that decision.)

NumericLiteral stores its value as a Java BigDecimal. BigDecimal stores values 
in decimal and so can easily represent, say, 0.1, if that is the partition key. 
Unfortunately, floating-point numbers are binary and can exactly represent only 
sums of binary fractions. The value 0.1 is a repeating fraction in binary.

Storing 0.1 as a double still renders "0.1" when printed because Java's 
{{Double.toString()}} produces the shortest decimal string that uniquely 
identifies the double value, so the printed form parses back to the identical 
bits.
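A minimal sketch of that shortest-round-trip behavior:

```java
public class ShortestDecimal {
    public static void main(String[] args) {
        double d = 0.1;  // stored internally as the nearest binary fraction

        // Double.toString emits the shortest decimal that round-trips.
        System.out.println(Double.toString(d));                    // 0.1

        // Parsing the printed form recovers the identical double bits.
        System.out.println(Double.parseDouble(Double.toString(d)) == d);
    }
}
```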

But, when the process above occurs, the double value is converted to a 
BigDecimal upon deserialization from Thrift. The result is the value 
0.1000000000000000055511151231257827021181583404541015625. That is, BigDecimal 
is more precise than double, and can represent (in decimal) the exact sum of 
binary fractions used to approximate 0.1.

This issue is fully described in the [BigDecimal 
javadoc|https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#BigDecimal-double-],
 using, as it happens, the very value of 0.1 discussed above.

The result is that, if a query is planned in local catalog mode, partition 
pruning for "WHERE float_col=0.1" works, because the parser converts "0.1" 
directly from its string form to a BigDecimal, preserving the exact decimal 
value.

But, if the same query is planned in traditional catalog mode, the extra Thrift 
conversion causes the bogus value shown above to be used in the comparison, 
resulting in a failed partition match.
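The difference between the two modes comes down to which BigDecimal constructor the partition-key value passes through. A hedged sketch (the names {{localKey}} and {{thriftKey}} are illustrative, not Impala identifiers):

```java
import java.math.BigDecimal;

public class PruneCompare {
    public static void main(String[] args) {
        // The literal from the query text "WHERE float_col=0.1".
        BigDecimal predicate = new BigDecimal("0.1");

        // Local catalog mode: the key is parsed from its string form.
        BigDecimal localKey = new BigDecimal("0.1");
        System.out.println(predicate.compareTo(localKey) == 0);   // true: match

        // Traditional mode: the key passed through a Thrift double.
        BigDecimal thriftKey = new BigDecimal(0.1);
        System.out.println(predicate.compareTo(thriftKey) == 0);  // false: no match
    }
}
```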

The temporary solution is to convert the double from Thrift back to a string, 
and then the string to a BigDecimal. This is, obviously, quite silly.
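The silly-but-working conversion amounts to the following; note that {{BigDecimal.valueOf(double)}} is the standard library's built-in shorthand for exactly this double-to-string-to-BigDecimal path:

```java
import java.math.BigDecimal;

public class Workaround {
    public static void main(String[] args) {
        double fromThrift = 0.1;  // the double as it arrives off the wire

        // Route through the shortest-round-trip string form.
        BigDecimal viaString = new BigDecimal(Double.toString(fromThrift));

        // Equivalent library shorthand for the same conversion.
        BigDecimal viaValueOf = BigDecimal.valueOf(fromThrift);

        System.out.println(viaString);   // 0.1
        System.out.println(viaValueOf);  // 0.1
    }
}
```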

The bigger issue is that there is no good reason for the catalog server to 
parse partition keys into literal expressions only for them to be converted to 
Thrift. Better to leave the partition keys as strings and let the coordinator 
do any required parsing into literal expressions.

Note that, in the current design (with code before a recent revision), the 
catalog server must analyze each literal expression, but there is no 
coordinator available to provide the analyzer, so special code was needed to 
allow analysis with a null analyzer, needlessly complicating the logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
