Hi,

This is a D port of a Go package implementing Content-Defined Chunking:

https://github.com/CyberShadow/chunker

The package contains the following modules:

- chunker.polynomials - implements Pol, a type which represents a polynomial from F_2[X]. I'm not quite sure what those are, but they seem to be very useful.

- chunker.rabin - implements RabinHash, which calculates a rolling Rabin Fingerprint.

- chunker - implements Chunker, an adapter range which accepts chunks of bytes (such as from File.byChunk) and emits variable-size content-defined chunks, which are split when the local Rabin Fingerprint reaches a certain value.
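For readers equally unsure about F_2[X]: these are polynomials whose coefficients are 0 or 1, with coefficients added modulo 2. A Rabin fingerprint treats the data as one such polynomial and keeps its remainder modulo a fixed polynomial. The following is a minimal sketch of that arithmetic, not the code of chunker.polynomials; the alias Poly and the functions degree, add, mul and mod are hypothetical names chosen for illustration, and a polynomial is assumed to fit in the bits of a ulong.

```d
import std.stdio : writefln;

/// Hypothetical stand-in for a polynomial over F_2 (not the package's Pol):
/// bit i holds the coefficient of X^i.
alias Poly = ulong;

/// Degree of the polynomial, or -1 for the zero polynomial.
int degree(Poly p)
{
    int d = -1;
    while (p != 0)
    {
        p >>= 1;
        d++;
    }
    return d;
}

/// Addition in F_2[X]: coefficients are added modulo 2, which is XOR.
Poly add(Poly a, Poly b)
{
    return a ^ b;
}

/// Carry-less multiplication. Assumes the product fits in 64 bits.
Poly mul(Poly a, Poly b)
{
    Poly result = 0;
    while (b != 0)
    {
        if (b & 1)
            result ^= a; // add a shifted copy of a, without carries
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/// Remainder of a divided by m, by long division with XOR as subtraction.
Poly mod(Poly a, Poly m)
{
    assert(m != 0, "division by the zero polynomial");
    immutable dm = degree(m);
    while (degree(a) >= dm)
        a ^= m << (degree(a) - dm); // cancel the leading term of a
    return a;
}

void main()
{
    // (X + 1) + (X + 1): every coefficient cancels.
    writefln("%b", add(0b11, 0b11));
    // (X + 1) * (X + 1): the two X cross terms cancel.
    writefln("%b", mul(0b11, 0b11));
    // (X^3 + X + 1) mod (X^2 + 1)
    writefln("%b", mod(0b1011, 0b101));
}
```

Because addition is XOR and multiplication needs no carries, all of this maps onto shifts and XORs, which is part of why such fingerprints are cheap to update one byte at a time.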

Links
-----

- Wikipedia: https://en.wikipedia.org/wiki/Rolling_hash#Rabin_fingerprint

- Original Go version: https://github.com/restic/chunker

- Dub package: https://code.dlang.org/packages/chunker

- Documentation: https://chunker.dpldocs.info/chunker.html (courtesy of Adam Ruppe's dpldocs service)

- Example: https://github.com/cybershadow/chunker/blob/master/src/chunker/example.d

Differences from the Go version
-------------------------------

- Chunker was adapted to be a D range and accept D ranges as input.

- The Rabin Fingerprint implementation was extracted out of Chunker and into its own module. It is usable stand-alone.

- The implementation was significantly refactored and simplified. The original code sacrificed some readability to work around limitations of the language and of the compiler's optimizer in order to achieve reasonable performance.

- The D version is 20% faster than the Go version (when built with LDC in release mode).

- Improved test coverage and symbol documentation.
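To illustrate what "content-defined chunks as a D range" means, here is a toy sketch. It is not the package's Chunker and does not use its API: ToyChunker, cutOffsets, the window size and the mask are all hypothetical, the input is a single byte array rather than a range of chunks, and a plain sum of the last few bytes stands in for the Rabin fingerprint. The point is only that cut positions depend on nearby content, not on absolute offsets.

```d
import std.range.primitives : isInputRange;
import std.stdio : writeln;

/// Toy content-defined chunker (hypothetical; not the package's Chunker).
/// Cuts after every byte at which the sum of the last `window` bytes
/// has its low bits all zero.
struct ToyChunker
{
    enum size_t window = 16; // bytes covered by the rolling sum
    enum uint mask = 0x3F;   // cut when (sum & mask) == 0

    private const(ubyte)[] data;
    private size_t start; // offset of the current chunk
    private size_t end;   // one past the end of the current chunk
    private uint sum;     // sum of up to `window` bytes before `end`

    this(const(ubyte)[] data)
    {
        this.data = data;
        popFront(); // locate the first chunk
    }

    @property bool empty() const { return start == data.length; }
    @property const(ubyte)[] front() const { return data[start .. end]; }

    void popFront()
    {
        start = end;
        while (end < data.length)
        {
            sum += data[end];              // byte entering the window
            if (end >= window)
                sum -= data[end - window]; // byte leaving the window
            end++;
            if ((sum & mask) == 0)
                break;                     // content-defined cut point
        }
    }
}

static assert(isInputRange!ToyChunker);

/// Offsets at which ToyChunker ends a chunk.
size_t[] cutOffsets(const(ubyte)[] data)
{
    size_t[] offsets;
    size_t pos = 0;
    foreach (chunk; ToyChunker(data))
    {
        pos += chunk.length;
        offsets ~= pos;
    }
    return offsets;
}

void main()
{
    // Deterministic pseudo-random test data.
    auto data = new ubyte[](2000);
    uint x = 12345;
    foreach (ref b; data)
    {
        x = x * 1103515245 + 12345;
        b = cast(ubyte)(x >> 16);
    }

    ubyte[] prefix = [1, 2, 3];
    writeln(cutOffsets(data));
    // Once the window has moved past the inserted bytes, the same cuts
    // appear again, shifted by prefix.length.
    writeln(cutOffsets(prefix ~ data));
}
```

With fixed-size chunks, inserting three bytes at the front would change every chunk that follows. Here only the chunks near the insertion differ, which is the property that makes this kind of chunking attractive for deduplicating backups.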

The original package was written by Alexander Neumann and is used in the restic backup program.

  • Chunker - Content-Defined Ch... Vladimir Panteleev via Digitalmars-d-announce
