For the types in the ORC footer, we have the following:

    // the maximum length of the type for varchar or char in UTF-8 characters
    optional uint32 maximumLength = 4;
    // the precision and scale for decimal
    optional uint32 precision = 5;
    optional uint32 scale = 6;

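
To make the questions that follow concrete, here is a minimal Python sketch of the two invariants being asked about. It is only an illustration: the function names are hypothetical, it does not use any ORC library, and it assumes "UTF-8 characters" means Unicode code points (which is what Python's len() counts for a str).

```python
def exceeds_max_length(values, maximum_length):
    """Return the values longer than maximum_length, counted in code points.

    Assumption: the footer's maximumLength counts Unicode code points.
    """
    return [v for v in values if len(v) > maximum_length]


def mismatched_scales(value_scales, footer_scale):
    """Return the indices of values whose per-value scale (as read from the
    SECONDARY stream) differs from the scale declared in the footer type."""
    return [i for i, s in enumerate(value_scales) if s != footer_scale]
```

If both bounds are strict guarantees, each function should return an empty list for every column in every file, including concatenated ones.
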
If maximumLength is set to N, can I be confident that no value for that column in the file will contain more than N UTF-8 characters? Is this still true for concatenated ORC files?

I have a similar question about DECIMAL. Decimal encoding currently uses the SECONDARY stream to encode the "scale". Is this scale guaranteed to be the same as the type scale in the footer?

Thanks,

-dain

----
Dain Sundstrom
Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)
