mapleFU commented on PR #34632:
URL: https://github.com/apache/arrow/pull/34632#issuecomment-1483365469

   It seems Java has a config, but users would not change it:
   
   ```java
   /**
    * Config for delta binary packing
    */
   class DeltaBinaryPackingConfig {
     final int blockSizeInValues;
     final int miniBlockNumInABlock;
     final int miniBlockSizeInValues;
   
     public DeltaBinaryPackingConfig(int blockSizeInValues, int miniBlockNumInABlock) {
       this.blockSizeInValues = blockSizeInValues;
       this.miniBlockNumInABlock = miniBlockNumInABlock;
       double miniSize = (double) blockSizeInValues / miniBlockNumInABlock;
       Preconditions.checkArgument(miniSize % 8 == 0,
           "miniBlockSize must be multiple of 8, but it's " + miniSize);
       this.miniBlockSizeInValues = (int) miniSize;
     }
   
     public static DeltaBinaryPackingConfig readConfig(InputStream in) throws IOException {
       return new DeltaBinaryPackingConfig(BytesUtils.readUnsignedVarInt(in),
               BytesUtils.readUnsignedVarInt(in));
     }
   
     public BytesInput toBytesInput() {
       return BytesInput.concat(
               BytesInput.fromUnsignedVarInt(blockSizeInValues),
               BytesInput.fromUnsignedVarInt(miniBlockNumInABlock));
     }
   }
   ```
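
To make the constraint in the quoted Java class concrete, here is a minimal Rust sketch of the same check and of the two unsigned varints that `toBytesInput` writes. `DeltaConfig` and `write_unsigned_varint` are hypothetical names used only for illustration; they are not part of parquet-mr or arrow-rs.

```rust
/// Hypothetical Rust mirror of the Java `DeltaBinaryPackingConfig` quoted above.
#[derive(Debug, PartialEq)]
pub struct DeltaConfig {
    pub block_size: u32,
    pub num_mini_blocks: u32,
    pub mini_block_size: u32,
}

impl DeltaConfig {
    /// Rejects layouts where the miniblock size is not a whole multiple of 8,
    /// which is the condition the Java constructor checks.
    pub fn new(block_size: u32, num_mini_blocks: u32) -> Result<Self, String> {
        if num_mini_blocks == 0 || block_size % num_mini_blocks != 0 {
            return Err(format!(
                "block size {block_size} is not divisible into {num_mini_blocks} miniblocks"
            ));
        }
        let mini_block_size = block_size / num_mini_blocks;
        if mini_block_size % 8 != 0 {
            return Err(format!(
                "miniBlockSize must be multiple of 8, but it's {mini_block_size}"
            ));
        }
        Ok(Self { block_size, num_mini_blocks, mini_block_size })
    }

    /// Serializes block size and miniblock count as two unsigned varints,
    /// in the same order as the Java `toBytesInput`.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        write_unsigned_varint(self.block_size, &mut out);
        write_unsigned_varint(self.num_mini_blocks, &mut out);
        out
    }
}

/// ULEB128: 7 payload bits per byte, high bit set on every byte but the last.
fn write_unsigned_varint(mut v: u32, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7F) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}
```

With a block size of 128 and 4 miniblocks, the miniblock size is 32 and the serialized config is three bytes: `0x80 0x01` for 128, then `0x04`.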
   
   arrow-rs uses a fixed-size writer:
   
   ```rust
   impl<T: DataType> DeltaBitPackEncoder<T> {
       /// Creates new delta bit packed encoder.
       pub fn new() -> Self {
           Self::assert_supported_type();
   
           // Size miniblocks so that they can be efficiently decoded
           let mini_block_size = match T::T::PHYSICAL_TYPE {
               Type::INT32 => 32,
               Type::INT64 => 64,
               _ => unreachable!(),
           };
   
           let num_mini_blocks = DEFAULT_NUM_MINI_BLOCKS;
           let block_size = mini_block_size * num_mini_blocks;
           assert_eq!(block_size % 128, 0);
   
           DeltaBitPackEncoder {
               page_header_writer: BitWriter::new(MAX_PAGE_HEADER_WRITER_SIZE),
               bit_writer: BitWriter::new(MAX_BIT_WRITER_SIZE),
               total_values: 0,
               first_value: 0,
               current_value: 0, // current value to keep adding deltas
               block_size,       // can write fewer values than block size for last block
               mini_block_size,
               num_mini_blocks,
               values_in_block: 0, // will be at most block_size
               deltas: vec![0; block_size],
               _phantom: PhantomData,
           }
       }
   ```
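
As a rough illustration of what "fixed size" means here, the sketch below reproduces only the size arithmetic from the quoted constructor. The value 4 for the miniblock count is an assumption, since `DEFAULT_NUM_MINI_BLOCKS` is not shown in the snippet; `PhysicalType` and `fixed_block_layout` are hypothetical names.

```rust
/// Assumed value: the quoted snippet uses DEFAULT_NUM_MINI_BLOCKS without showing it.
const ASSUMED_NUM_MINI_BLOCKS: usize = 4;

/// Hypothetical stand-in for the two physical types the encoder supports.
#[derive(Clone, Copy, Debug)]
pub enum PhysicalType {
    Int32,
    Int64,
}

/// Returns (block_size, num_mini_blocks, mini_block_size), following the
/// arithmetic in the quoted `DeltaBitPackEncoder::new`.
pub fn fixed_block_layout(t: PhysicalType) -> (usize, usize, usize) {
    // Miniblock size matches the value width in bits, as in the quoted match.
    let mini_block_size = match t {
        PhysicalType::Int32 => 32,
        PhysicalType::Int64 => 64,
    };
    let block_size = mini_block_size * ASSUMED_NUM_MINI_BLOCKS;
    // Same sanity check as the quoted code.
    assert_eq!(block_size % 128, 0);
    (block_size, ASSUMED_NUM_MINI_BLOCKS, mini_block_size)
}
```

Under that assumption the layout is 128 values per block for INT32 and 256 for INT64, with no caller-facing knob to change either.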
   
   I think I can get this merged first, and then open an issue about making that
   behavior configurable.

