[elixir-core:7104] [Proposal] strict binary parts split API

christhekeele Mon, 01 May 2017 21:51:07 -0700

Hey all!

The binary split functions (in Regex and String) return a list, following 
the underlying :binary.split/3 (global) strategy, and support a parts: n 
option to limit the number of parts produced by the split:


https://hexdocs.pm/elixir/Regex.html#split/3
https://hexdocs.pm/elixir/String.html#split/3

Additionally, Regex, String, and Path have default split patterns 
pre-programmed:

https://hexdocs.pm/elixir/Regex.html#split/1
https://hexdocs.pm/elixir/String.html#split/1
https://hexdocs.pm/elixir/Path.html#split/1

All of these return lists, even the parts variants. The parts variants 
return lists guaranteed to be at most as long as, but not necessarily as 
long as, the part count requested.
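
To illustrate the current behaviour, here is a small sketch using only the 
existing String.split/3. The input strings are made up for the example; the 
point is that parts: n caps the length of the result but does not guarantee 
it.

```elixir
# parts: 3 with enough delimiters: the list has exactly 3 elements,
# and the remainder is left unsplit in the last one.
full = String.split("a,b,c,d", ",", parts: 3)
# full == ["a", "b", "c,d"]

# parts: 3 with too few delimiters: the list is silently shorter.
short = String.split("a,b", ",", parts: 3)
# short == ["a", "b"]

IO.inspect({length(full), length(short)})
```

The caller has no way to tell these two cases apart short of checking the 
list length afterwards.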

What I'm wishing for right now is a more assertive series of binary 
splitting functions, in-between pure strict-length binary pattern matching 
and fast-and-loose list results: I'd like a function that guarantees that 
exactly the requested number of parts is returned, or else fails.

My current use-case is splitting apart a binary format that embeds 
null-byte-delimited headers where a lack of the precise number of headers 
indicates a corrupted data block. Re-asserting the returned list has the 
requested length feels like an unnecessary handshake with the split-parts 
API. I make this proposal with the dim recollection of wanting this several 
times before, though.
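
For concreteness, this is roughly what the handshake looks like today. It is 
a sketch only: the HeaderExample module, the count of three headers, and the 
{:error, :corrupted} return value are all assumptions made up for this 
example, not details of my actual format.

```elixir
defmodule HeaderExample do
  # Hypothetical format: three null-byte-delimited headers, then the payload.
  @header_count 3

  def parse(block) when is_binary(block) do
    # Ask for one part more than the header count so the payload stays whole.
    case String.split(block, <<0>>, parts: @header_count + 1) do
      # The list pattern is the length re-assertion described above.
      [h1, h2, h3, payload] -> {:ok, {h1, h2, h3}, payload}
      _too_few -> {:error, :corrupted}
    end
  end
end
```

The match on a four-element list is doing the work that a strict split 
could do on its own.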

If I can garner some support from core, I'd like to propose a &split!/3 
API for Regex and String.

These functions would have a split!(binary, pattern, parts) signature that 
raises if the requested parts cannot be generated.

To help enforce the parts-length requirement, they could be returned as 
tuples instead, similar to the rest of the two-tuple split functions 
mentioned at the end of this post.

There are two reasons why I can conceive they might merit a place in the 
stdlib:

   1. This is a common-enough desire that people amongst core can empathize 
   with it.
   2. This can be easily optimized beyond a dumb split/3, list length 
   validation, &List.to_tuple/1 implementation.
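
The "dumb" implementation from point 2 might look like the sketch below. 
StrictSplit is a hypothetical module name, and raising ArgumentError is my 
assumption about what "raises" would mean; neither exists in the stdlib. It 
only uses String.split/3, length/1, and List.to_tuple/1.

```elixir
defmodule StrictSplit do
  # Hypothetical module; not part of Elixir's standard library.
  def split!(binary, pattern, parts)
      when is_binary(binary) and is_integer(parts) and parts > 0 do
    list = String.split(binary, pattern, parts: parts)

    # Walks the list once to validate its length, then converts to a tuple.
    if length(list) == parts do
      List.to_tuple(list)
    else
      raise ArgumentError,
            "expected #{parts} parts, got #{length(list)}"
    end
  end
end
```

With that in place, a call site could destructure directly, for example 
{key, value} = StrictSplit.split!("k=v", "=", 2). The extra list traversal 
and tuple conversion are what an optimized version would hope to avoid.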

However, I'm not confident in either of the two points, so I thought I'd 
float this here before investigating a PR.

An additional capability would be to support a split!(binary, parts) 
implementation for Regex and String that leverages the underlying &split/1 
pattern default present in both modules.

Also, if it proves to make sense, Path.split!/{2,3} could be a part of such 
a feature. However, I doubt the underlying :filename.split/1 call would 
respond to the optimizations in point 2 I propose, should they exist, 
although I imagine those variants could be implemented by hand fairly 
efficiently.

(Key-value split functions have two-tuple results, but they don't apply to 
this discussion.)

https://hexdocs.pm/elixir/Dict.html#split/2
https://hexdocs.pm/elixir/HashDict.html#split/2
https://hexdocs.pm/elixir/Keyword.html#split/2
https://hexdocs.pm/elixir/Map.html#split/2

(Enumerable splitting has two-tuple results, but with much simpler 
intentions than common binary splitting strategies. However, converging 
around an Enum.split(enumerable, item, parts: n) version might find a place 
there too, alongside an Enum.split!/{2,3} implementation. But lacking a 
default split strategy, it would have to omit a default all-parts strategy 
of Enum.split(enumerable, item) to avoid clashing with the existing 
Enum.split(enumerable, count).)

https://hexdocs.pm/elixir/Enum.html#split/2
