bkietz opened a new pull request, #37526:
URL: https://github.com/apache/arrow/pull/37526

   String view (and the equivalent non-UTF-8 binary view) is an alternative
   representation for variable-length strings which offers greater efficiency
   for several common operations. This representation is in use by UmbraDB,
   DuckDB, and Velox. Where those databases use a raw pointer to out-of-line
   strings, this PR uses a pair of 32-bit integers as a buffer index and
   offset, which
   
   -   makes explicit the guarantee that the lifetime of all character data is
       equal to that of the array which views it, which is critical for
       confident consumption across an interface boundary
   -   makes the arrays meaningfully serializable and venue-agnostic: directly
       usable in shared memory without modification
   -   allows easy validation
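
A minimal sketch of the idea, for illustration only. The description above says only that a view carries a buffer index and an offset as 32-bit integers. The rest of the layout used here (a 16-byte view with a 4-byte length, a 4-byte prefix, and an inline form for strings of at most 12 bytes) is an assumption taken from the string view layout in the Arrow columnar format document. The helper names `make_view`, `read_view`, and `is_valid` are hypothetical and are not part of any Arrow API.

```python
import struct

# Assumed layout (not stated in the PR text above): each view is 16 bytes.
#   short string: int32 length | up to 12 bytes of data, zero padded
#   long string:  int32 length | 4-byte prefix | int32 buffer index | int32 offset
INLINE_LIMIT = 12


def make_view(data: bytes, buffers: list) -> bytes:
    """Encode one view, appending long strings to the last data buffer."""
    if len(data) <= INLINE_LIMIT:
        # "12s" pads the payload with zero bytes up to 12 bytes.
        return struct.pack("<i12s", len(data), data)
    if not buffers:
        buffers.append(bytearray())
    buffer_index = len(buffers) - 1
    offset = len(buffers[buffer_index])
    buffers[buffer_index] += data
    return struct.pack("<i4sii", len(data), data[:4], buffer_index, offset)


def read_view(view: bytes, buffers: list) -> bytes:
    """Resolve a view to its bytes using only the array's own buffers."""
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_LIMIT:
        return view[4:4 + length]
    buffer_index, offset = struct.unpack_from("<ii", view, 8)
    return bytes(buffers[buffer_index][offset:offset + length])


def is_valid(view: bytes, buffers: list) -> bool:
    """Bounds-check a view; this is not possible with a raw pointer."""
    if len(view) != 16:
        return False
    (length,) = struct.unpack_from("<i", view, 0)
    if length < 0:
        return False
    if length <= INLINE_LIMIT:
        return True
    buffer_index, offset = struct.unpack_from("<ii", view, 8)
    if not 0 <= buffer_index < len(buffers):
        return False
    if offset < 0 or offset + length > len(buffers[buffer_index]):
        return False
    # The stored prefix must match the first four bytes of the referenced data.
    return bytes(buffers[buffer_index][offset:offset + 4]) == view[4:8]


buffers = []
short_view = make_view(b"hello", buffers)
long_view = make_view(b"a much longer string value", buffers)
assert read_view(short_view, buffers) == b"hello"
assert read_view(long_view, buffers) == b"a much longer string value"
assert is_valid(long_view, buffers)
```

Because a view names a buffer by index instead of by address, the same bytes remain meaningful after the buffers are copied, serialized, or mapped into another process, and a consumer can reject an out-of-range view before dereferencing anything.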
   
   This PR is extracted from https://github.com/apache/arrow/pull/35628 to
   unblock independent PRs now that the vote has passed. It includes:
   
   -   New types added to Schema.fbs
   -   Message.fbs amended to support variable buffer counts between string
       view chunks
   -   datagen.py extended to produce integration JSON for string view arrays
   -   Columnar.rst amended with a description of the string view format
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

