For the types in the ORC footer, we have the following:

 // the maximum length of the type for varchar or char in UTF-8 characters
 optional uint32 maximumLength = 4;
 // the precision and scale for decimal
 optional uint32 precision = 5;
 optional uint32 scale = 6;

If maximumLength is set to N, can I be confident that no value for that 
column in the file will contain more than N UTF-8 characters?  Is this still 
true for concatenated ORC files?
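
To make the question concrete, here is a minimal sketch of the per-value 
check a reader would have to do if the guarantee does not hold.  It assumes 
"UTF-8 characters" means Unicode code points; the function names are 
hypothetical and not part of any ORC reader.

```python
def utf8_code_point_count(raw: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation bytes."""
    # Continuation bytes have the bit pattern 10xxxxxx; every other byte
    # starts a new code point.
    return sum(1 for b in raw if (b & 0xC0) != 0x80)


def fits_maximum_length(raw: bytes, maximum_length: int) -> bool:
    """Return True if the value has at most maximum_length code points."""
    # Assumption: the footer's maximumLength is a limit on code points,
    # not on bytes.
    return utf8_code_point_count(raw) <= maximum_length
```

If the footer value can be trusted, a reader could skip this check and size 
its buffers from maximumLength directly.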

I have a similar question about DECIMAL.  Decimal encoding currently uses the 
SECONDARY stream to encode the "scale".  Is this scale guaranteed to be the 
same scale as the type scale in the footer?
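
As an illustration of what I mean, here is a minimal sketch that treats each 
decimal as an unscaled integer plus a per-value scale taken from the 
SECONDARY stream, and compares that scale with the footer scale.  The 
function names and the in-memory representation are hypothetical, not the 
actual reader code.

```python
from decimal import Decimal


def decode_decimal(unscaled: int, stream_scale: int) -> Decimal:
    """Combine an unscaled integer with its per-value scale."""
    # scaleb(-s) shifts the decimal point s places to the left.
    return Decimal(unscaled).scaleb(-stream_scale)


def scales_match_footer(stream_scales, footer_scale: int) -> bool:
    """Return True if every per-value scale equals the footer's type scale."""
    # Assumption: stream_scales is the already-decoded SECONDARY stream.
    return all(s == footer_scale for s in stream_scales)
```

If scales_match_footer can ever be false for a valid file, a reader has to 
rescale each value to the type scale instead of ignoring the stream.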

Thanks,

-dain


----
Dain Sundstrom
Co-founder @ Presto Software Foundation, Co-creator of Presto 
(https://prestosql.io)
