chaokunyang opened a new issue, #1936:
URL: https://github.com/apache/fury/issues/1936

   ### Feature Request
   
   Implement chunk based map predictive protocol, which is 2x faster than 
current one.
   
   > All Map serializers must extend `AbstractMapSerializer`.
   
   Format:
   
   ```
   | length(unsigned varint) | key value chunk data | ... | key value chunk 
data |
   ```
   
   #### map key-value chunk data
   
   Map iteration is too expensive, Fury won't compute the header like for list 
since it introduce
   [considerable overhead](https://github.com/apache/fury/issues/925).
   Users can use `MapFieldInfo` annotation to provide the header in advance. 
Otherwise Fury will use first key-value pair
   to predict header optimistically, and update the chunk header if the 
prediction failed at some pair.
   
   Fury will serialize the map chunk by chunk, every chunk has 255 pairs at 
most.
   
   ```
   |    1 byte      |     1 byte     | variable bytes  |
   +----------------+----------------+-----------------+
   | chunk size: N  |    KV header   |   N*2 objects   |
   ```
   
   KV header:
   
   - If track key ref, use the first bit `0b1` of the header to flag it.
   - If the key has null, use the second bit `0b10` of the header to flag it. 
If ref tracking is enabled for this
     key type, this flag is invalid.
   - If the key types of map are different, use the 3rd bit `0b100` of the 
header to flag it.
   - If the actual key type of the map is not the declared key type, use the 
4rd bit `0b1000` of the header to flag it.
   - If track value ref, use the 5th bit `0b10000` of the header to flag it.
   - If the value has null, use the 6th bit `0b100000` of the header to flag 
it. If ref tracking is enabled for this
     value type, this flag is invalid.
   - If the value types of the map are different, use the 7rd bit `0b1000000` 
header to flag it.
   - If the value type of map is not the declared value type, use the 8rd bit 
`0b10000000` of the header to flag it.
   
   If streaming write is enabled, which means Fury can't update written `chunk 
size`. In such cases, map key-value data
   format will be:
   
   ```
   |    1 byte      | variable bytes  |
   +----------------+-----------------+
   |    KV header   |   N*2 objects   |
   ```
   
   `KV header` will be a header marked by `MapFieldInfo` in java. For languages 
such as golang, this can be computed in
   advance for non-interface types most times. The implementation can generate 
different deserialization code based read
   header, and look up the generated code from a linear map/list.
   
   #### Why serialize chunk by chunk?
   
   When fury will use first key-value pair to predict header optimistically, it 
can't know how many pairs have same
   meta(tracking kef ref, key has null and so on). If we don't write chunk by 
chunk with max chunk size, we must write at
   least `X` bytes to take up a place for later to update the number which has 
same elements, `X` is the num_bytes for
   encoding varint encoding of map size.
   
   And most map size are smaller than 255, if all pairs have same data, the 
chunk will be 1. This is common in golang/rust,
   which object are not reference by default.
   
   Also, if only one or two keys have different meta, we can make it into a 
different chunk, so that most pairs can share
   meta.
   
   The implementation can accumulate read count with map size to decide whether 
to read more chunks.
   
   ### Is your feature request related to a problem? Please describe
   
   _No response_
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   #925


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to