Re: Per-contributor focuses for next major release, hack sessions

Stephen Searles Sat, 13 Jan 2018 13:18:35 -0800

I saw some activity on my ancient sharding merge request. I've been dusting 
that off and am actually trying to tackle the problem I've been stalled on 
all this time, which I'll describe here.

The naive sharding approach in the current MR assumes the same shards will 
always exist. This sounds too naive to be practical. When you get that 
brand new hard drive, possibly because another one bit the dust, you want a 
permanent data repository to be able to handle replicating and migrating 
data. So at the very least, I'm working on a generational configuration 
that allows a single blob to map onto multiple shards and allows the set 
of available shards to change over time. The default selector will still 
map blobs to shards deterministically based on their hash and size.
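
To make the generational idea concrete, here is a rough sketch in Go. None 
of these names (Generation, shardsFor, locate) come from the MR; they are 
made up for illustration. It assumes a generation is just a list of shard 
names plus a replica count, and that hashing the blob ref together with 
its size is an acceptable way to pick shards deterministically.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Generation is a hypothetical snapshot of the shard set that was
// valid for some period of time.
type Generation struct {
	ID       int
	Shards   []string // shard names available in this generation
	Replicas int      // how many shards each blob should map onto
}

// shardsFor deterministically picks distinct shards for a blob,
// based only on the blob's hash (ref) and size.
func shardsFor(g Generation, blobRef string, size uint64) []string {
	n := len(g.Shards)
	if n == 0 {
		return nil
	}
	r := g.Replicas
	if r < 1 {
		r = 1
	}
	if r > n {
		r = n
	}
	var sz [8]byte
	binary.BigEndian.PutUint64(sz[:], size)
	sum := sha256.Sum256(append([]byte(blobRef), sz[:]...))
	start := int(binary.BigEndian.Uint64(sum[:8]) % uint64(n))
	out := make([]string, 0, r)
	for i := 0; i < r; i++ {
		// Consecutive slots modulo n are distinct because r <= n.
		out = append(out, g.Shards[(start+i)%n])
	}
	return out
}

// locate lists every shard that might hold the blob, newest
// generation first. gens is assumed to be ordered oldest to newest.
func locate(gens []Generation, blobRef string, size uint64) []string {
	seen := make(map[string]bool)
	var out []string
	for i := len(gens) - 1; i >= 0; i-- {
		for _, s := range shardsFor(gens[i], blobRef, size) {
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	return out
}

func main() {
	gens := []Generation{
		{ID: 1, Shards: []string{"disk-a", "disk-b"}, Replicas: 1},
		{ID: 2, Shards: []string{"disk-a", "disk-b", "disk-c"}, Replicas: 2},
	}
	fmt.Println(locate(gens, "example-blobref", 4096))
}
```

In this sketch, writes would go to the newest generation's shards, and 
reads would fall back through older generations until a migration has 
copied everything forward.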

That said, my true goal is to manage these storage generations more or 
less automatically. Basically, I want to be able to throw disks into the 
configuration and have data automatically shuffled to take advantage of the 
new disk. I also want to configure a redundancy threshold and be alerted if 
my data grows too large or if usable disk space grows too small to provide 
the level of redundancy requested. Additional gravy: monitor performance 
and S.M.A.R.T. data from the drives to infer disk health and warn ahead 
of time when redundancy may fall. Even further gravy: use that 
performance data to shuffle data onto disks that are fast at reading and 
keep open space on disks that are fast at writing.
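
The redundancy alert could start from a check like the one below. This is 
only a sketch under my own assumptions: the Disk type and redundancyOK are 
hypothetical names, every copy of a blob must live on a different disk, 
and blob granularity and filesystem overhead are ignored. Under those 
assumptions, a disk can never usefully hold more than one full copy of 
the data, so each disk's contribution is capped at the data size.

```go
package main

import "fmt"

// Disk is a hypothetical description of one shard's backing disk.
type Disk struct {
	Name        string
	UsableBytes uint64
}

// redundancyOK reports whether `replicas` copies of dataBytes could
// fit with each copy of a blob on a different disk. It is an
// approximation: it ignores blob sizes and per-disk overhead.
func redundancyOK(disks []Disk, dataBytes uint64, replicas int) bool {
	if replicas < 1 || len(disks) < replicas {
		return false
	}
	var effective uint64
	for _, d := range disks {
		c := d.UsableBytes
		if c > dataBytes {
			// Space beyond one full copy cannot add redundancy.
			c = dataBytes
		}
		effective += c
	}
	return effective >= dataBytes*uint64(replicas)
}

func main() {
	disks := []Disk{
		{Name: "disk-a", UsableBytes: 4000},
		{Name: "disk-b", UsableBytes: 4000},
		{Name: "disk-c", UsableBytes: 1000},
	}
	if !redundancyOK(disks, 5000, 2) {
		fmt.Println("warning: not enough space for 2 copies")
	}
}
```

The same function can be re-run with a failing disk removed from the list 
to ask "would I still be redundant if this drive died?".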

Anyway, I'm not sure how much of this (if any) I'll actually finish by the next 
release cycle, but I realize I'm quiet here and wanted to chime in with 
what I've been tinkering with.

Thanks for the thread! Good idea!

On Friday, January 5, 2018 at 8:25:30 PM UTC-5, Eric Drechsel wrote:
>
> Many open source projects have a tradition to kick off a new dev cycle 
> with a thread where contributors list tasks/issues they are planning to 
> work on in the coming months. 
>
> It seems like PerKeep has been building some momentum over the Winter, and 
> I'm really curious to hear what people have on their personal roadmaps, 
> both core committers and new/sometime contribs.
>
> I'll kick it off :D While I haven't been very involved in the past year, I 
> am interested to make more contributions. Here are the things I'm keen to 
> work on:
>
>    - Update this old change list exposing permanode IDs as a FUSE xattr 
>    [1].
>    - Have build infrastructure produce Synology packages, or figure out 
>    best practices for deploying to Synology using Docker etc [2].
>    - Write a client side example app/guide using the GopherJS data access 
>    layer but without using GopherJS for the app code.
>       - If this isn't currently possible I would like to work on making 
>       sure the GopherJS blob exports a nice data access API to JS.
>    - Write additional content importers:
>       - Git repository fetcher
>       - Website crawler
>    - Work on maintenance/new features in the Web UI
>
>  [1] https://camlistore-review.googlesource.com/c/camlistore/+/2869
>  [2] https://github.com/camlistore/camlistore/issues/986
>
> Of course I'd be happy if anyone else wants to work on any of these 
> features too :D
>
> Also, are there any PerKeep meetups/hack sessions planned? LinuxFestNW CFP 
> is open till the end of the month... :)
>
> -- 
> best, Eric
> eric.pdxhub.org <http://pdxhub.org/people/eric>
>
>

