I saw some activity on my ancient sharding merge request. I've been dusting that off and am actually trying to tackle the problem I've been stalled on all this time, which I'll describe here.
The naive sharding approach in the current MR assumes the same shards will always exist, which is too naive to be practical. When you get that brand new hard drive, possibly because another one bit the dust, you want a permanent data repository to be able to handle replicating and migrating data. So at the very least, I'm working on a generational configuration that allows a single blob to map onto multiple shards and allows the set of available shards to change over time. The default selector will still map blobs to shards deterministically based on their hash and size.

That said, my true goal is to manage these storage generations more or less automatically. Basically, I want to be able to throw disks into the configuration and have data automatically shuffled to take advantage of the new disk. I also want to configure a redundancy threshold and be alerted if my data grows too large or if usable disk space grows too small to provide the level of redundancy requested.

Additional gravy: monitor the performance and S.M.A.R.T. data from the drives to make inferences about disk health and warn ahead of time when redundancy may fall. Even further gravy: use that performance data to shuffle data onto disks that are fast at reading and keep space open on disks that are fast at writing.

Anyway, I'm not sure how much of this (if any) I'll actually finish by the next release cycle, but I realize I've been quiet here and wanted to chime in with what I've been tinkering with.

Thanks for the thread! Good idea!

On Friday, January 5, 2018 at 8:25:30 PM UTC-5, Eric Drechsel wrote:
>
> Many open source projects have a tradition to kick off a new dev cycle
> with a thread where contributors list tasks/issues they are planning to
> work on in the coming months.
>
> It seems like PerKeep has been building some momentum over the Winter, and
> I'm really curious to hear what people have on their personal roadmaps,
> both core committers and new/sometime contribs.
>
> I'll kick it off :D While I haven't been very involved in the past year, I
> am interested to make more contributions. Here are the things I'm keen to
> work on:
>
>    - Update this old change list exposing permanode IDs as a FUSE xattr
>    [1].
>    - Have build infrastructure produce Synology packages, or figure out
>    best practices for deploying to Synology using Docker etc [2].
>    - Write a client side example app/guide using the GopherJS data access
>    layer but without using GopherJS for the app code.
>       - If this isn't currently possible I would like to work on making
>       sure the GopherJS blob exports a nice data access API to JS.
>    - Write additional content importers:
>       - Git repository fetcher
>       - Website crawler
>    - Work on maintenance/new features in the Web UI
>
> [1] https://camlistore-review.googlesource.com/c/camlistore/+/2869
> [2] https://github.com/camlistore/camlistore/issues/986
>
> Of course I'd be happy if anyone else wants to work on any of these
> features too :D
>
> Also, are there any PerKeep meetups/hack sessions planned? LinuxFestNW CFP
> is open till the end of the month... :)
>
> --
> best, Eric
> eric.pdxhub.org <http://pdxhub.org/people/eric>
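
A rough sketch, in Go, of the generational mapping described at the top of this message. This is not the code in the MR: every name here (`Generation`, `Config`, `Select`, `WriteTargets`, `ReadCandidates`, the disk names) is made up for illustration. It assumes rendezvous (highest-random-weight) hashing as one possible way to pick shards deterministically from a blob's hash and size. New blobs go to the newest generation, while reads consider the shards any generation would have chosen.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// Generation is a hypothetical snapshot of the shards available at one time.
type Generation struct {
	ID     int
	Shards []string
}

// Config holds generations ordered oldest to newest, plus the number of
// copies wanted for each blob. Both fields are assumptions of this sketch.
type Config struct {
	Generations []Generation
	Replicas    int
}

// score ranks a shard for a blob using only the shard name, the blob ref
// (which embeds the blob's hash), and the blob size, so it is deterministic.
func score(shard, blobRef string, size int64) uint64 {
	h := sha256.New()
	h.Write([]byte(shard))
	h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" differ
	h.Write([]byte(blobRef))
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(size))
	h.Write(buf[:])
	return binary.BigEndian.Uint64(h.Sum(nil)[:8])
}

// Select returns up to n shards of this generation, highest score first.
func (g Generation) Select(blobRef string, size int64, n int) []string {
	shards := append([]string(nil), g.Shards...) // copy; do not reorder g.Shards
	sort.Slice(shards, func(i, j int) bool {
		si, sj := score(shards[i], blobRef, size), score(shards[j], blobRef, size)
		if si != sj {
			return si > sj
		}
		return shards[i] < shards[j] // stable tie-break by name
	})
	if n < 0 {
		n = 0
	}
	if n > len(shards) {
		n = len(shards)
	}
	return shards[:n]
}

// WriteTargets picks the shards for a new blob from the newest generation.
func (c Config) WriteTargets(blobRef string, size int64) []string {
	if len(c.Generations) == 0 {
		return nil
	}
	newest := c.Generations[len(c.Generations)-1]
	return newest.Select(blobRef, size, c.Replicas)
}

// ReadCandidates lists every shard that may hold the blob, newest generation
// first, without duplicates. Older generations matter until data is migrated.
func (c Config) ReadCandidates(blobRef string, size int64) []string {
	seen := map[string]bool{}
	var out []string
	for i := len(c.Generations) - 1; i >= 0; i-- {
		for _, s := range c.Generations[i].Select(blobRef, size, c.Replicas) {
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	return out
}

func main() {
	cfg := Config{
		Replicas: 2,
		Generations: []Generation{
			{ID: 1, Shards: []string{"disk-a", "disk-b"}},
			{ID: 2, Shards: []string{"disk-a", "disk-b", "disk-c"}}, // new disk added
		},
	}
	ref := "sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33" // example blob ref
	fmt.Println("write to: ", cfg.WriteTargets(ref, 3))
	fmt.Println("read from:", cfg.ReadCandidates(ref, 3))
}
```

With rendezvous hashing, adding a disk in a new generation only changes the placement of blobs for which the new disk scores among the top `Replicas`, which keeps the amount of data to shuffle small. Whether that trade-off suits the real selector is an open question; the sketch is only meant to make the idea concrete.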