Light-City opened a new issue, #36910:
URL: https://github.com/apache/arrow/issues/36910

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   ## Describe the enhancement requested
   
   key_map.cc search_block function.
   
   document or code may be wrong?
   
   If I follow the parameters in the documentation, I get inconsistent results.
   
   The reference is as follows:
   ```
   Example will use input stamp 0x5E and a 64-bit status bytes word with one 
empty slot:
   0x 4B17 5E3A 5E2B 1180.
   
   [1 instruction] Replicate stamp to all bytes by multiplying it by 0x 0101 
0101 0101 0101.
   
   We obtain: 0x 5E5E 5E5E 5E5E 5E5E.
   
   [1 instruction] XOR replicated stamp with status bytes word. Bytes 
corresponding to a matching stamp will be 0, bytes corresponding to empty slots 
will have a value between 128 and 255, bytes corresponding to non-matching 
non-empty slots will have a value between 1 and 127.
   
   We obtain: 0x 1549 0064 0075 4FDE.
   
   [2 instructions] In the next step we want to have information about a match 
in the highest bit of each byte. We can ignore here empty slot bytes, because 
they will be taken care of at a later step. Set the highest bit in each byte 
(OR with 0x 8080 8080 8080 8080) and then subtract 1 from each byte (subtract 
0x 0101 0101 0101 0101 from 64-bit word). Now if a byte corresponds to a 
non-empty slot then the highest bit 0 indicates a match and 1 indicates a miss.
   
   We obtain: 0x 95C9 80E4 80F5 CFDE, then 0x 94C8 7FE3 7FF4 CEDD.
   
   [3 instructions] In the next step we want to obtain in each byte one of two 
values: 0x80 if it is either an empty slot or a match, 0x00 otherwise. We do it 
in three steps: NOT the result of the previous step to change the meaning of 
the highest bit; OR with the original status word to set highest bit in a byte 
to 1 for empty slots; mask out everything other than the highest bits in all 
bytes (AND with 0x 8080 8080 8080 8080).
   
   We obtain: 6B37 801C 800B 3122,
   then 6B37 DE3E DE2B 31A2,
   finally 0x0000 8000 8000 0080.
   
   [2 instructions] Finally, use leading zero bits count and divide it by 8 to 
find an index of the last byte that corresponds either to a match or an empty 
slot. If the leading zero count intrinsic returns 64 for a 64-bit input zero, 
then after dividing by 8 we will also get the desired answer in case of a full 
block without any matches.
   
   We obtain: 16,
   then 2 (index of the first slot within the block that matches the stamp).
   ```
   
   input stamp 0x5E
   block is 0x 4B17 5E3A 5E2B 1180
   but i got some diffenet result.
   
   step 1:
   ```
   // document result
   0x 5E5E 5E5E 5E5E 5E5E
   
   // acutal result
   0x5e5e5e5e5e5e5e00
   ```
   step 2:
   ```
   // document result
   0x 1549 0064 0075 4FDE
   
   // acutal result
   0x1549006400754f80
   ```
   
   The results of the following steps are not the same, so I won’t go into 
details here.
   
   Because, I guess there is a problem with the implementation of 
search_block?Especially step1 calculation.
   
   ```
   // may be wrong???
   uint64_t stamp_pattern = stamp * ((block_high_bits ^ kHighBitOfEachByte) >> 
7);
   ```
   
   ## Component(s)
   C++
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to