bkietz opened a new pull request, #35628:
URL: https://github.com/apache/arrow/pull/35628

   String view (and the equivalent non-UTF-8 binary view) is an alternative
   representation for variable-length strings which offers greater efficiency
   for several common operations. This representation is in use by UmbraDB,
   DuckDB, and Velox. Where those databases use a raw pointer to out-of-line
   strings, this PR uses a pair of 32-bit integers as a buffer index and
   offset, which
   - makes explicit the guarantee that the lifetime of all character data is
     equal to that of the array which views it, which is critical for
     confident consumption across an interface boundary
   - makes the arrays meaningfully serializable and venue-agnostic: they are
     directly usable in shared memory without modification
   - allows easy validation
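
The index/offset idea can be illustrated with a minimal sketch. The struct name, field names, and the use of `std::string` as a stand-in for character buffers are all assumptions made for illustration; this is not the layout or API defined by the PR. The sketch shows why validation is easy: a view can be bounds-checked against the buffers it refers to without dereferencing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified view. The names and field order are illustrative
// assumptions, not the layout introduced by the PR.
struct IndexOffsetView {
  int32_t size;          // length of the string in bytes
  int32_t buffer_index;  // which character buffer holds the bytes
  int32_t offset;        // byte offset of the string within that buffer
};

// A view is valid if it names an existing buffer and stays inside its bounds.
// No pointer is dereferenced to decide this.
inline bool IsValid(const IndexOffsetView& v,
                    const std::vector<std::string>& buffers) {
  if (v.size < 0 || v.buffer_index < 0 || v.offset < 0) return false;
  if (static_cast<std::size_t>(v.buffer_index) >= buffers.size()) return false;
  const std::string& buf = buffers[static_cast<std::size_t>(v.buffer_index)];
  // Widen to 64 bits so offset + size cannot overflow.
  return static_cast<int64_t>(v.offset) + v.size <=
         static_cast<int64_t>(buf.size());
}

// Resolve a view to the characters it refers to. The result is only usable
// while `buffers` is alive and unmodified.
inline std::string_view Resolve(const IndexOffsetView& v,
                                const std::vector<std::string>& buffers) {
  assert(IsValid(v, buffers));
  const std::string& buf = buffers[static_cast<std::size_t>(v.buffer_index)];
  return std::string_view(buf).substr(static_cast<std::size_t>(v.offset),
                                      static_cast<std::size_t>(v.size));
}
```

Because the view holds only integers, copying the buffers elsewhere (another process, a file, shared memory) leaves every view meaningful, which a raw pointer would not survive.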
   
   Changes outside the C++ implementation:
   - New types added to `Schema.fbs`
   - `Message.fbs` amended to support variable buffer counts between string
     view chunks
   - `datagen.py` extended to produce integration JSON for string view arrays
   - `Columnar.rst` amended with a description of the string view format
   
   Changes to the C++ implementation:
   - The new types are available with new subclasses of DataType, Array,
     ArrayBuilder, ...
   - The values of string view arrays can be visited as `std::string_view`,
     as with StringArray
   - String view arrays can be round-tripped through IPC, Parquet, and
     integration JSON
   - A variant of the string view type, `utf8_view(/*has_raw_pointers=*/true)`,
     is supported which uses raw pointer views. This enables zero-copy interop
     with code which uses raw pointer views.
   - Conversions are provided between index/offset view arrays, raw pointer
     view arrays, and regular string arrays.
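
As a rough illustration of what converting between the two kinds of view involves, here is a minimal sketch. All names (`IndexOffsetView`, `RawPointerView`, `ToRawPointer`, `ToIndexOffset`) are hypothetical, and `std::string` again stands in for character buffers; the conversions actually provided by the PR may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified views, for illustration only.
struct IndexOffsetView {
  int32_t size;
  int32_t buffer_index;
  int32_t offset;
};
struct RawPointerView {
  int32_t size;
  const char* data;
};

// Index/offset -> raw pointer: one buffer lookup plus an addition.
// Assumes the view has already been validated against `buffers`.
inline RawPointerView ToRawPointer(const IndexOffsetView& v,
                                   const std::vector<std::string>& buffers) {
  assert(v.buffer_index >= 0 &&
         static_cast<std::size_t>(v.buffer_index) < buffers.size());
  const std::string& buf = buffers[static_cast<std::size_t>(v.buffer_index)];
  return {v.size, buf.data() + v.offset};
}

// Raw pointer -> index/offset: search for the buffer containing the bytes.
// Returns std::nullopt if the pointer does not fall inside any known buffer.
inline std::optional<IndexOffsetView> ToIndexOffset(
    const RawPointerView& v, const std::vector<std::string>& buffers) {
  // std::less_equal gives a total order even for unrelated pointers.
  std::less_equal<const char*> le;
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    const char* begin = buffers[i].data();
    const char* end = begin + buffers[i].size();
    if (le(begin, v.data) && le(v.data + v.size, end)) {
      return IndexOffsetView{v.size, static_cast<int32_t>(i),
                             static_cast<int32_t>(v.data - begin)};
    }
  }
  return std::nullopt;
}
```

The asymmetry is the point of the sketch: going to raw pointers is cheap, while going back requires knowing the full set of buffers the pointers may refer to.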
   