asfimport opened a new issue, #404: URL: https://github.com/apache/parquet-format/issues/404
The spec for DICTIONARY_ENCODING states that:

> If the dictionary grows too big, whether in size or number of distinct values, the encoding will fall back to the plain encoding.

https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8

However, the parquet-mr implementation was deliberately changed to a different fallback mechanism in https://issues.apache.org/jira/browse/PARQUET-52

I'm assuming the parquet-mr implementation is authoritative here. But then the spec is incorrect and should be fixed to reflect expected behavior.

**Reporter**: [Antoine Pitrou](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=apitrou) / @pitrou

<sub>**Note**: *This issue was originally created as [PARQUET-2221](https://issues.apache.org/jira/browse/PARQUET-2221). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*</sub>
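
For illustration only, the fallback that the quoted spec sentence describes could be sketched roughly as below. This is not the parquet-mr behavior introduced by PARQUET-52, which the issue says differs. The function name `encode_column_chunk`, the limits `max_entries` and `max_bytes`, and the choice to make the decision once per page are all assumptions made for this sketch; the spec sentence names no concrete thresholds.

```python
from typing import Dict, List, Sequence, Tuple


def encode_column_chunk(
    pages: Sequence[Sequence[bytes]],
    max_entries: int = 3,   # hypothetical limit on distinct values
    max_bytes: int = 64,    # hypothetical limit on total dictionary size
) -> Tuple[List[Tuple[str, list]], List[bytes]]:
    """Toy model of "fall back to plain once the dictionary grows too big".

    Returns (encoded_pages, dictionary_values). Each encoded page is a pair
    (encoding_name, payload): dictionary indices or the raw values.
    """
    dictionary: Dict[bytes, int] = {}
    dict_bytes = 0
    fell_back = False
    encoded: List[Tuple[str, list]] = []

    for page in pages:
        if not fell_back:
            # Distinct values in this page that the dictionary lacks,
            # kept in first-seen order.
            new_values = [v for v in dict.fromkeys(page) if v not in dictionary]
            new_bytes = sum(len(v) for v in new_values)
            too_many = len(dictionary) + len(new_values) > max_entries
            too_large = dict_bytes + new_bytes > max_bytes
            if too_many or too_large:
                # Dictionary would grow too big: stop using it from here on.
                fell_back = True
            else:
                for v in new_values:
                    dictionary[v] = len(dictionary)
                dict_bytes += new_bytes
                encoded.append(("RLE_DICTIONARY", [dictionary[v] for v in page]))
                continue
        # Fallback path: emit the values themselves.
        encoded.append(("PLAIN", list(page)))

    return encoded, list(dictionary)


pages = [[b"a", b"b", b"a"], [b"c", b"d", b"e"], [b"a"]]
encoded_pages, dict_values = encode_column_chunk(pages)
```

In this toy model, once one page trips a limit, every later page in the chunk is written as plain values, even if its values are already in the dictionary.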
