mapleFU commented on issue #15145: URL: https://github.com/apache/arrow/issues/15145#issuecomment-1369430274
It's also weird that our `DictEncoderImpl` only supports encoding as `RLE_DICTIONARY`, and its `WriteIndices` uses RLE to encode the indices. But if the Parquet format version is v1, the page is labeled `PLAIN_DICTIONARY` even though what gets written is `RLE_DICTIONARY`. The related code is:

```c++
int WriteIndices(uint8_t* buffer, int buffer_len) override { ... }

inline Encoding::type dictionary_page_encoding() const {
  if (parquet_version_ == ParquetVersion::PARQUET_1_0) {
    return Encoding::PLAIN_DICTIONARY;
  } else {
    return Encoding::PLAIN;
  }
}

void WriteDictionaryPage() override {
  ...
  DictionaryPage page(buffer, current_dict_encoder_->num_entries(),
                      properties_->dictionary_page_encoding());
  total_bytes_written_ += pager_->WriteDictionaryPage(page);
}
```

If we only support writing `RLE_DICTIONARY`, then it's fine that `DictEncoderImpl` uses `RLE_DICTIONARY`. @pitrou
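To make the naming difference concrete, here is a minimal standalone sketch. The `Version` and `Encoding` enums and both functions are hypothetical stand-ins, not the real `parquet::` types. It assumes, as described above, that the encoder always reports `RLE_DICTIONARY`, and it copies the version switch from the quoted `dictionary_page_encoding()`.

```c++
// Hypothetical stand-ins for ParquetVersion and Encoding::type (illustration only).
enum class Version { V1, V2 };
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY };

// Mirrors the quoted dictionary_page_encoding(): the label depends on the format version.
constexpr Encoding DictionaryPageEncoding(Version v) {
  return v == Version::V1 ? Encoding::PLAIN_DICTIONARY : Encoding::PLAIN;
}

// Assumption taken from the comment: the dictionary encoder always reports RLE_DICTIONARY.
constexpr Encoding EncoderEncoding() { return Encoding::RLE_DICTIONARY; }

// Under v1 the page label is PLAIN_DICTIONARY, while the encoder still says RLE_DICTIONARY.
static_assert(DictionaryPageEncoding(Version::V1) == Encoding::PLAIN_DICTIONARY,
              "v1 labels the page PLAIN_DICTIONARY");
static_assert(DictionaryPageEncoding(Version::V1) != EncoderEncoding(),
              "v1 page label differs from the encoder's reported encoding");
```

Both functions are `constexpr`, so the checks run at compile time and the sketch needs no headers.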