chenbaggio commented on issue #13701:
URL: https://github.com/apache/arrow/issues/13701#issuecomment-1196422212
Hi all,
Here is an example of why a function that slices binary data in byte
units is needed.
In a distributed database, data has a distribution policy and a
corresponding hash algorithm for each data type.
Considering only string-like and binary types here, the hash
algorithm needs to split a string-like or binary value
into bytes for its calculation. For example: take bytes 1-4, cast them
to an integer, and shift left by 16 bits; then take bytes 5-6, cast them to
an integer, and combine it with the result of the previous step; and so on. The
'utf8_slice_codeunits' function partly meets this requirement if all
characters are ASCII. But if the string contains Chinese, one
Chinese character may occupy three bytes, so slicing from start 1 to end 3
yields three UTF-8 characters, which may take nine bytes. That does not meet
the needs of the hash algorithm, which needs only 3 bytes. So I suggest
providing a function (rather than a cast) that takes the same
arguments as 'utf8_slice_codeunits'; it could be
called 'binary_slice_byteunit'.
At 2022-07-27 11:23:12, "Eduardo Ponce Mojica" ***@***.***> wrote:
Hi @chenbaggio, could you expand on what you refer to as a "byte unit"? If
you are referring to char (a signed integral type), you should be able to use it (via casting)
with the current int64_t type for start/end/step. I am probably
misunderstanding your request, so could you give an example?
Also, are there other language implementations that have a similar operation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]