Re: [PR] [SYSTEMDS-3779] Add ColGroupDDCLZW with LZW-compressed MapToData [systemds]

via GitHub Fri, 30 Jan 2026 06:20:27 -0800


florian-jobs commented on PR #2398:
URL: https://github.com/apache/systemds/pull/2398#issuecomment-3824021872


   > Okay, cool progress on the results!
   > 
   > However, I'm a bit skeptical about your byte estimates for the sizes. Do 
you do extra packing based on the number of bits in your implementation?
   > 
   > The ideal values for the current DDC implementation are 2, 256, and 65,536 
unique values to avoid bit manipulations on lookup (see `AMapToData` 
specializations). Please explicitly compare against these cases and 
double-check your memory calculations.
   > 
   > I'd love to see some results with your idealized input to get a range of 
what to expect vs. what you get.
   > 
   > A recipe for X unique values at length L could be:
   > 
   > 1. Use all X unique values once in sequence
   >    (e.g., for X=4: `1,2,3,4`)
   > 2. Double repeatedly until you reach length L
   >    
   >    * Round 1: `1,2,3,4` → `1,2,3,4,1,2,3,4` (length 8)
   >    * Round 2: → `1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4` (length 16)
   >    * Round 3: → length 32
   >    * ...and so on
   > 
   > I don't know if it's exactly optimal, but it should be pretty good.
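
   The recipe quoted above can be sketched in a few lines of Java. This is a 
minimal illustration, not code from the PR: the class name `IdealizedLzwInput` 
and the method `generate` are hypothetical, and truncating the last doubling 
round when L is not of the form X * 2^k is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SystemDS: builds the "doubling" input.
class IdealizedLzwInput {

    /**
     * Writes the values 1..x once, then repeatedly appends a copy of
     * everything written so far until len entries exist.
     * Assumption: the final round is truncated if it would exceed len.
     */
    static int[] generate(int x, int len) {
        if (x <= 0 || len < 0)
            throw new IllegalArgumentException("x must be > 0 and len >= 0");
        int[] out = new int[len];
        int filled = Math.min(x, len);
        for (int i = 0; i < filled; i++)
            out[i] = i + 1; // step 1: each unique value once, in sequence
        while (filled < len) {
            // step 2: double the filled prefix (or copy what still fits)
            int n = Math.min(filled, len - filled);
            System.arraycopy(out, 0, out, filled, n);
            filled += n;
        }
        return out;
    }

    public static void main(String[] args) {
        // X=4, L=16 corresponds to "Round 2" in the recipe
        System.out.println(Arrays.toString(generate(4, 16)));
    }
}
```

   Calling `generate` with X = 2, 256, and 65,536 would give one input per 
ideal DDC case mentioned in the review.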
   
   Good question! No, not yet: at the moment the LZW logic still stores each 
code as a full int value, but I’m in the process of changing the storage 
representation.
   
   Instead of storing one code per array element, I’m implementing a bit-packed 
stream of long words in which every code occupies the same fixed bit width 
(derived from the maximum emitted code). This can be extended to a growing 
bit-width policy later if needed.
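
   To make the fixed-width idea concrete, here is a rough sketch of packing 
int codes into a `long[]`. It is not the code in the PR: the class 
`PackedCodes` and its methods are hypothetical names, and the layout 
(little-endian bit order within each word, codes allowed to straddle two 
words) is an assumption.

```java
// Hypothetical sketch, not the PR implementation: fixed-width bit packing.
final class PackedCodes {
    final long[] words;  // packed payload
    final int bitWidth;  // bits per code, derived from the maximum code
    final int size;      // number of codes stored

    private PackedCodes(long[] words, int bitWidth, int size) {
        this.words = words;
        this.bitWidth = bitWidth;
        this.size = size;
    }

    static PackedCodes pack(int[] codes) {
        int max = 0;
        for (int c : codes) {
            if (c < 0)
                throw new IllegalArgumentException("codes must be non-negative");
            max = Math.max(max, c);
        }
        // Bits needed for max; at least 1 so that an all-zero input still works.
        int w = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
        long totalBits = (long) codes.length * w;
        long[] words = new long[(int) ((totalBits + 63) >>> 6)];
        long bitPos = 0;
        for (int c : codes) {
            int idx = (int) (bitPos >>> 6);
            int off = (int) (bitPos & 63);
            words[idx] |= ((long) c) << off;          // low part of the code
            if (off + w > 64)                          // code straddles two words
                words[idx + 1] |= ((long) c) >>> (64 - off);
            bitPos += w;
        }
        return new PackedCodes(words, w, codes.length);
    }

    int get(int i) {
        long bitPos = (long) i * bitWidth;
        int idx = (int) (bitPos >>> 6);
        int off = (int) (bitPos & 63);
        long v = words[idx] >>> off;
        if (off + bitWidth > 64)                       // fetch the spilled high bits
            v |= words[idx + 1] << (64 - off);
        return (int) (v & ((1L << bitWidth) - 1));
    }
}
```

   With a maximum code of 300 this uses 9 bits per code, so 10 codes need 
90 bits, i.e. two longs (16 bytes of payload) instead of 40 bytes as an 
`int[]`, not counting object headers.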


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
