@Scotpip - while all the @treeform/@planetis advice may be relevant if you 
need to serialize/deserialize to strings, it sounds like you are doing so **_to 
a binary file_**. Given that as a hard constraint, you can skip a great deal of 
unnecessary work - the string route can cost dozens to hundreds of times more 
than you need. (Well, formally it can even be a different big-O scaling 
depending upon what you are doing, since files are random access...).

You can even just **_run right off of the binary file directly_** via memory 
mapped files as supported in Nim's `std/memfiles`. 
[suggest](https://github.com/c-blake/suggest) has an example of how to do this 
in a complex spell check database context (maybe start at `proc open(string): 
Suggestor` - really this code is easier to "run" than "read"), but your data 
is so much simpler.

You could just A) make your timestamp either an `int64` as you suggest or B) 
more generally an integer offset into a file of fixed length records. Either 
way your object becomes a simple, fixed size thing with very unspecial CPU-like 
field types. So, you can then just **_cast the memory address_** to a `ptr 
UncheckedArray[Bar]`. If you want to grow/write the file "outside the memory 
map" then you can either append records with `writeBuffer` or you can expand 
the map and write via array indexing. Reading is just `myFile[i]`, etc.
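To make that concrete, here is a minimal, self-contained sketch (the `Bar` fields and the "bars.bin" filename are made up for illustration): append fixed-size records with `writeBuffer`, then run right off the file through a memory map.

```nim
import std/memfiles

type
  Bar {.packed.} = object   # fixed size; plain CPU-native field types
    stamp: int64            # e.g. a Unix timestamp, as you suggest
    price: float64

# Append records "outside the memory map".  `system.open` is qualified to
# avoid ambiguity with `memfiles.open` once std/memfiles is imported.
var f = system.open("bars.bin", fmWrite)
for i in 0 ..< 2:
  var b = Bar(stamp: int64(i), price: 1.5 * float64(i))
  discard f.writeBuffer(addr b, sizeof(Bar))
f.close()

# Map the file and index it like an in-memory array.
var mf = memfiles.open("bars.bin")            # fmRead is the default
let bars = cast[ptr UncheckedArray[Bar]](mf.mem)
let n = mf.size div sizeof(Bar)               # record count from file size alone
echo bars[n - 1].stamp                        # newest record, read in place
mf.close()
```

The compiler's field access does all the "parsing"; there is no decode step at all.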

Ideas along these lines are developed into a more general purpose thing over at 
[nio](https://github.com/c-blake/nio) (but there is much functionality to add 
there to be fully capable), which may be simpler example code to read 
than `suggest` (or not!). I just wrote that up recently and it hasn't been 
thoroughly tested.

For a case as simple as yours with just one datatype, though, you can just do 
your own little `Bar` API on top of `memfiles`. This should be do-able even for 
little brain bear in my estimation. :-) This idea is **_extremely simple, but 
often overlooked_**.

Arguably, **_nothing can be much faster_**. You just use the Nim/C compiler 
"field access" as your "binary data interpreter", just as you would always do 
with an in-memory object. The file just becomes a new "persistent allocation 
region". If you access your records "by record index" you hardly notice any 
difference - except speed from not having to re-do whatever work was required 
to create the data. If you want just the most recent 10000 records then you can 
compute where they start from the file size & be done. You do not even need to 
**_iterate_** over any earlier data, page it in off of persistent devices, or 
anything.
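The "most recent 10000" arithmetic is literally just this (the record size and file size here are hypothetical numbers, not from your data):

```nim
# With 16-byte records and a 1 MiB file, where do the newest 10000 start?
const recSize = 16                   # sizeof(Bar) for a packed int64+float64
let n = 1_048_576 div recSize        # 65536 records in the whole file
let first = max(0, n - 10_000)       # index of the oldest record we care about
echo first                           # everything before this is never touched
```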

If, as @planetis mentions, you care about **_memory safety_**, which can indeed 
be critical, then you probably want to implement a bounds checked dynamic array 
on top of the `UncheckedArray` with your own `[]` and `[]=` operators in your 
little `Bar` API. While there is a `std/rtarray`, it was always very 
preliminary and almost no one uses it.
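Those bounds checked operators are only a few lines. A sketch, with a made-up `BarFile` wrapper type (here demoed against a stack array rather than a real map, so it is self-contained):

```nim
type
  Bar = object
    stamp: int64
    price: float64

  BarFile = object                   # hypothetical wrapper; not std library
    data: ptr UncheckedArray[Bar]    # e.g. cast from MemFile.mem
    len: int                         # e.g. MemFile.size div sizeof(Bar)

proc `[]`(bf: BarFile; i: int): Bar =
  if i < 0 or i >= bf.len:           # the bounds check UncheckedArray lacks
    raise newException(IndexDefect, "record " & $i & " out of range")
  bf.data[i]

proc `[]=`(bf: var BarFile; i: int; v: Bar) =
  if i < 0 or i >= bf.len:
    raise newException(IndexDefect, "record " & $i & " out of range")
  bf.data[i] = v

# Demo on an ordinary array standing in for the mapped memory:
var arr: array[4, Bar]
var bf = BarFile(data: cast[ptr UncheckedArray[Bar]](addr arr[0]), len: 4)
bf[0] = Bar(stamp: 42, price: 0.0)
echo bf[0].stamp
```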

The only catch is **_binary format portability_**. Using the `{.packed.}` 
pragma on your `object` can usually limit that to CPUs of actually differing 
byte order. This is often not a problem because little endian has become so 
dominant with Intel/AMD/ARM. It would certainly be easy to write a little byte 
swapping converter program for occasional translation needs, but I've yet to 
ever come across that need personally in over 20 years of this kind of binary 
data management.
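If that byte swapping converter ever did become necessary, `std/endians` makes it short. A sketch, assuming the same made-up two-field `Bar` (swapping twice round-trips back to the original, which is a handy sanity check):

```nim
import std/endians

type
  Bar {.packed.} = object
    stamp: int64
    price: float64

proc swapBar(b: var Bar) =
  var t = b                          # copy so source & destination differ
  swapEndian64(addr b.stamp, addr t.stamp)
  swapEndian64(addr b.price, addr t.price)

var b = Bar(stamp: 1, price: 0.0)
swapBar(b)                           # now the "other" byte order
swapBar(b)                           # swap back: a round trip is the identity
echo b.stamp
```

A converter program would just map the file as above, call something like `swapBar` on each record, and write the result out.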
