Baunsgaard commented on PR #2398:
URL: https://github.com/apache/systemds/pull/2398#issuecomment-3823936897

   Okay, cool progress on the results!
   
   However, I'm a bit skeptical about your size estimates in bytes. Does your 
implementation do extra packing based on the number of bits?
   
   The ideal cases for the current DDC implementation are 2, 256, and 65,536 
unique values, since these avoid bit manipulation on lookup (see the 
`AMapToData` specializations). Please compare explicitly against these cases 
and double-check your memory calculations.
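
To make the comparison concrete, here is a rough sketch of the two size models in question. It is an idealized estimate and not SystemDS's actual accounting: the function names are hypothetical, object headers and the dictionary itself are ignored, and the 4-byte fallback above 65,536 unique values is an assumption.

```python
def aligned_index_bytes(num_unique: int, length: int) -> int:
    """Index-map size if each entry uses a lookup-friendly width.

    Tiers follow the 2 / 256 / 65,536 cases named above; the 4-byte
    fallback for larger cardinalities is an assumption.
    """
    if num_unique < 1 or length < 0:
        raise ValueError("need num_unique >= 1 and length >= 0")
    if num_unique <= 2:
        return (length + 7) // 8      # 1 bit per entry, rounded up to bytes
    if num_unique <= 256:
        return length                 # 1 byte per entry
    if num_unique <= 65536:
        return 2 * length             # 2 bytes per entry
    return 4 * length                 # assumed 4-byte entries


def bit_packed_bytes(num_unique: int, length: int) -> int:
    """Index-map size if entries are packed at ceil(log2(num_unique)) bits."""
    if num_unique < 1 or length < 0:
        raise ValueError("need num_unique >= 1 and length >= 0")
    bits = max(1, (num_unique - 1).bit_length())
    return (length * bits + 7) // 8


if __name__ == "__main__":
    for x in (2, 16, 256, 1000, 65536):
        print(x, aligned_index_bytes(x, 1_000_000), bit_packed_bytes(x, 1_000_000))
```

The two models agree at exactly 2, 256, and 65,536 unique values and diverge in between (for example at 16 unique values), which is where an estimate that assumes bit packing would look smaller than a byte-aligned map.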
   
   I'd love to see some results on your idealized input, to get a range of 
what to expect compared with what you actually get.
   
   A recipe for X unique values at length L could be:
   
   1. Use all X unique values once in sequence  
      (e.g., for X=4: `1,2,3,4`)
   
   2. Double repeatedly until you reach length L
      - Round 1: `1,2,3,4` → `1,2,3,4,1,2,3,4` (length 8)
      - Round 2: → `1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4` (length 16)
      - Round 3: → length 32
      - ...and so on
   
   I don't know if it's exactly optimal, but it should be pretty good.
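
The recipe above could be sketched roughly as follows. The function name is hypothetical, and truncating the final round when L is not exactly X times a power of two is an assumption the recipe does not spell out.

```python
def idealized_input(num_unique: int, length: int) -> list:
    """Build a sequence of `length` values containing `num_unique` distinct values.

    Step 1: use every value once, in order (1..num_unique).
    Step 2: double the sequence until it is at least `length` long.
    The final truncation to exactly `length` is an assumption.
    """
    if num_unique < 1 or length < num_unique:
        raise ValueError("need 1 <= num_unique <= length")
    data = list(range(1, num_unique + 1))
    while len(data) < length:
        data = data + data            # one doubling round
    return data[:length]


if __name__ == "__main__":
    print(idealized_input(4, 16))
```

Because every round appends an exact copy, the result is the same as cycling through the X values, so each value occurs equally often whenever L is a multiple of X.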


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

