mapleFU commented on issue #15145:
URL: https://github.com/apache/arrow/issues/15145#issuecomment-1369430274

   Also, it's weird that our `DictEncoderImpl` only supports the 
`RLE_DICTIONARY` encoding, and its `WriteIndices` uses RLE to encode the 
indices. But if the Parquet format version is v1, the page encoding is 
reported as `PLAIN_DICTIONARY` while the data is actually written as 
`RLE_DICTIONARY`.
   
   The related code is:
   
   ```c++
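     // DictEncoderImpl: encodes the buffered dictionary indices with RLE.
     int WriteIndices(uint8_t* buffer, int buffer_len) override { ... }
   
     // Writer properties: encoding recorded for the dictionary page,
     // chosen by Parquet format version.
     inline Encoding::type dictionary_page_encoding() const {
       if (parquet_version_ == ParquetVersion::PARQUET_1_0) {
         return Encoding::PLAIN_DICTIONARY;
       } else {
         return Encoding::PLAIN;
       }
     }
   
     // Column writer: builds the dictionary page with that encoding.
     void WriteDictionaryPage() override {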
       ...
       DictionaryPage page(buffer, current_dict_encoder_->num_entries(),
                           properties_->dictionary_page_encoding());
       total_bytes_written_ += pager_->WriteDictionaryPage(page);
     }
   ```
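
For comparison, here is a minimal sketch of the labels a writer records on each side, as a function of the format version. It assumes, following the Parquet format spec's recommendation for 2.0+ files, that a v2 writer records `PLAIN` for the dictionary page and `RLE_DICTIONARY` for the data pages holding the indices, while a v1 writer records `PLAIN_DICTIONARY` for both. `FormatVersion`, `EncodingLabel`, `DictionaryPageLabel` and `DataPageIndexLabel` are hypothetical stand-ins, not Arrow's actual types or methods.

```cpp
// Hypothetical stand-ins for parquet::ParquetVersion and parquet::Encoding;
// only the values relevant to dictionary encoding are mirrored here.
enum class FormatVersion { kV1_0, kV2_x };
enum class EncodingLabel { kPlain, kPlainDictionary, kRleDictionary };

// Label recorded in the dictionary page header (mirrors the snippet above).
EncodingLabel DictionaryPageLabel(FormatVersion v) {
  return v == FormatVersion::kV1_0 ? EncodingLabel::kPlainDictionary
                                   : EncodingLabel::kPlain;
}

// Assumed label recorded in the data page header for the dictionary indices.
EncodingLabel DataPageIndexLabel(FormatVersion v) {
  return v == FormatVersion::kV1_0 ? EncodingLabel::kPlainDictionary
                                   : EncodingLabel::kRleDictionary;
}
```

Under v1, neither label mentions RLE, even though `WriteIndices` RLE-encodes the indices under either version.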
   
   If we only support writing `RLE_DICTIONARY`, then it's fine for 
`DictEncoderImpl` to use `RLE_DICTIONARY`.
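
To make "uses RLE to encode the indices" concrete, here is a simplified sketch of the index layout that the Parquet format spec describes for dictionary-encoded data pages: one byte holding the bit width, followed by RLE/bit-packed hybrid runs. For brevity it emits only RLE runs (a ULEB128 header of `run_length << 1`, then the repeated value in ceil(bit_width / 8) little-endian bytes) and never bit-packed runs, which a real encoder such as Arrow's would also produce. `EncodeIndicesRleOnly`, `BitWidth` and `PutUleb128` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to represent any value in [0, max_value].
int BitWidth(uint32_t max_value) {
  int width = 0;
  while (max_value != 0) {
    ++width;
    max_value >>= 1;
  }
  return width;
}

// Appends v as an unsigned LEB128 varint (the format used for run headers).
void PutUleb128(std::vector<uint8_t>* out, uint32_t v) {
  while (v >= 0x80) {
    out->push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<uint8_t>(v));
}

// Encodes dictionary indices as: 1 byte bit width, then RLE runs only.
std::vector<uint8_t> EncodeIndicesRleOnly(const std::vector<uint32_t>& indices,
                                          uint32_t num_entries) {
  std::vector<uint8_t> out;
  // Indices range over [0, num_entries - 1]; size the bit width for that.
  const int bit_width = BitWidth(num_entries > 0 ? num_entries - 1 : 0);
  out.push_back(static_cast<uint8_t>(bit_width));
  const int value_bytes = (bit_width + 7) / 8;
  std::size_t i = 0;
  while (i < indices.size()) {
    // Find the end of the run of identical indices starting at i.
    std::size_t j = i;
    while (j < indices.size() && indices[j] == indices[i]) ++j;
    const uint32_t run_length = static_cast<uint32_t>(j - i);
    PutUleb128(&out, run_length << 1);  // low bit 0 marks an RLE run
    // Repeated value, little-endian, padded to whole bytes.
    for (int b = 0; b < value_bytes; ++b) {
      out.push_back(static_cast<uint8_t>(indices[i] >> (8 * b)));
    }
    i = j;
  }
  return out;
}
```

For example, the indices `{0, 0, 0, 1, 1, 2}` with a 3-entry dictionary encode as the bytes `2, 6, 0, 4, 1, 2, 2`: bit width 2, then three runs of lengths 3, 2 and 1.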
   
   @pitrou 