Re: what is your process for getting files into Camlistore?

Michael Farr Mon, 16 Oct 2017 06:02:42 -0700

I think you are asking for examples of my existing archive processes.  I'll 
try a few of those; to me they are all variants on the same theme, but you 
might see them differently.  In places I have outlined parts of the 
scenario that sit outside what I expect Camlistore or any archiving 
system to be involved in - however, I figure it is better to give too much 
information than not enough.


1. Consultant with long-term system support obligations to clientX on 
behalf of OrgX

One organisation (OrgX), for better or worse, structures its files in a 
hierarchy based on client and then project.  Each project has at least 3 
subfolders (requirements, design and technical) and each of those contains 
its own hierarchy of files and folders.  The design folder has a very 
high turnover rate of data: if you were to snapshot the design folder every 
second you would easily generate 20GB of data in a single day.  In the 
whole project, there will be maybe 60-400MB of archive-quality material.

Some designers create a number of manual snapshots (files with different 
names but only slightly different content) throughout the day.  The vast 
majority of those files should not be archived and can be discarded within 
a few days.  The rule of thumb is that the last file that the client 
approves is archived.  The files with rejected "design directions" should 
also be kept.  Files that contain spelling mistakes and other silliness 
should be removed.

The technical folder contains 5-15 documents and some of those will contain 
detailed sensitive information.   All the content needs to be archived, but 
the documents with sensitive information must be encrypted before they go 
into the archive.  I get a lot of value out of referring to good technical 
solutions in other projects, so it is really useful to have good quality 
metadata for later search.

The requirements folder is synced with a large number of people and it 
tends to get filled with extraneous information.  Sometimes it will have 
information from other clients or projects and a lot of nonsense discussion 
files.  In reality, there are only a few true requirements documents per 
project, but I'll often archive most of the information in here anyway.  
Some documents, however, must not be archived in the context of that 
project.

Summary:
/home/mike/OrgX/clientX/projectX:
    requirements
        extraneous information
    design
        lots of files that shouldn't be archived
    technical
        some sensitive information
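
The per-folder rules in this scenario can be sketched in Python as a small 
decision function.  This is only an illustration, not something Camlistore 
provides: the filename markers ("approved", "rejected-direction", 
"sensitive", "other-client") are hypothetical, since in practice the 
decision is made by hand and is not encoded in file names.

```python
from pathlib import PurePosixPath

# Hypothetical filename markers, assumed purely for illustration.
KEEP_DESIGN_MARKERS = ("approved", "rejected-direction")
SENSITIVE_MARKER = "sensitive"
EXCLUDED_MARKER = "other-client"


def archive_action(relative_path: str) -> str:
    """Return 'archive', 'encrypt-then-archive' or 'skip' for a path
    given relative to the project folder (e.g. 'design/logo.psd')."""
    path = PurePosixPath(relative_path)
    if not path.parts:
        return "skip"
    top = path.parts[0]
    name = path.name.lower()
    if top == "design":
        # Keep only the client-approved file and rejected design directions.
        keep = any(marker in name for marker in KEEP_DESIGN_MARKERS)
        return "archive" if keep else "skip"
    if top == "technical":
        # Everything is archived; sensitive documents are encrypted first.
        return "encrypt-then-archive" if SENSITIVE_MARKER in name else "archive"
    if top == "requirements":
        # Most things are archived, except material from other contexts.
        return "skip" if EXCLUDED_MARKER in name else "archive"
    return "skip"
```

A script could call archive_action() on each path under projectX and only 
hand the "archive" results to camput, after encrypting the 
"encrypt-then-archive" ones with an external tool.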


2. Canon Raw Images
I store a fair number of images from my Canon DSLR in raw format.  Raw image 
files are a container of metadata, thumbnail image, raw image data and a 
"recipe".  You can edit the file with a bunch of different programs, but if 
you use Canon's recommended application, you only edit the recipe in 
the file; the actual raw data is not touched. 

Typically I am only interested in archiving the original raw 
file; the recipe changes are not valuable from an archiving point of view.  
But sometimes I invest a few minutes of time to add touchups to the image.  
Those are valuable, and I want to capture them and generate a new raw file 
(removing the old content).  Each file is about 20-40MB, but I might save 
each file 20 different times, putting different effects and crops in, before 
I am finished.  I have many thousands of images, so it just isn't sensible 
to keep each and every variant.  In reality I only spend time tweaking 1 in 
10 of the images.

Note: in case you are wondering, I do not expect Camlistore to know about how 
Canon raw files work or anything about any other specific filetype - beyond 
extracting the EXIF metadata.  Right now, I am just interested in how/where 
Camlistore can help with the situation.
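
One way to avoid archiving every intermediate save is to wait until a file 
has stopped changing.  The Python sketch below illustrates that idea only; 
it is not a Camlistore feature.  The three-day quiet period and the ".cr2" 
suffix are assumptions chosen for the example.

```python
import time
from pathlib import Path

# Assumed for illustration: a file untouched for three days is "finished".
QUIET_PERIOD_SECONDS = 3 * 24 * 60 * 60


def settled_files(folder, now=None, quiet_period=QUIET_PERIOD_SECONDS,
                  suffix=".cr2"):
    """Yield files under folder (recursively) that have the given suffix
    and were last modified at least quiet_period seconds ago."""
    now = time.time() if now is None else now
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() != suffix:
            continue
        if now - path.stat().st_mtime >= quiet_period:
            yield path
```

The paths it yields could then be passed to camput, so that only the final 
saved variant of each image is uploaded rather than all 20 saves.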

3. Movie Production
Each movie is created using a bunch of different source videos, images and 
audio files.  The output video might be cut up in a few different ways for 
different purposes.  So there might be one shortish cut for YouTube that 
acts as a reference for people to quickly scan over the content.  Another 
cut might be the executive summary that just shows the main outcome (and 
none of the technology or educational facts) and will be only shared for a 
short time after the work is complete.

If the video is being hosted in another system (such as YouTube) and I have 
the uncut movie, I only want to store the reference to the YouTube copy - I 
don't want to archive the cut data.  I keep a lot of cuts for a period 
because they are a pain to reproduce for each person that missed the 
original (even if I keep the project file).  However, after some time has 
passed and I need to recoup space, those cuts would be the first to go.  
It would be nice if there was a way to keep the associations between the 
source videos, cuts, full videos and the project files.
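
The associations described here can be sketched as a small data model in 
Python.  This is not Camlistore's schema and uses no Camlistore API; the 
Asset class, its field names and the "kind" values are all hypothetical.  
It only shows the links I would like to keep and the order in which cuts 
could be dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str  # assumed values: "source", "project", "full" or "cut"
    derived_from: list = field(default_factory=list)  # names of parent assets
    external_url: str = ""  # where a hosted copy lives, if anywhere


def space_recovery_order(assets):
    """Return the cuts in the order they could be dropped to recoup space:
    cuts that are hosted elsewhere first, then the remaining cuts."""
    cuts = [a for a in assets if a.kind == "cut"]
    # False sorts before True, so cuts with an external_url come first.
    return sorted(cuts, key=lambda a: (a.external_url == "", a.name))
```

For a cut that is hosted elsewhere, only the Asset record (with its 
external_url and derived_from links) would need to be archived, not the 
cut's video data.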

4. Ideas for other examples if you want more:
 - software projects
 - the "secure connect" folder (groan)

Mike

On Monday, October 16, 2017 at 3:20:13 AM UTC+10, mpl wrote:
>
> On 15 October 2017 at 01:05, Michael Farr <[email protected]> wrote: 
> > Thanks for answers so far :-) 
> > 
> > In its shortest form, my end goal is best described using the 
> introduction 
> > from https://camlistore.org/: to store, model, search and synchronize 
> data 
> > in the post-PC era.  Here are some more criteria: 
> > 
> > The main goal is to archive all my files to a futureproofed file storage 
> > solution.  Easily transferable between any particular device, storage 
> format 
> > or cloud service technology.  Camlistore itself does introduce its own 
> > storage format, but the blobs are relatively easy to understand and 
> re-index 
> > if I ever needed to. 
> > Files that are "important" should be stored redundantly, and I want some 
> > level of control over that redundancy.  Currently, I can only really 
> control 
> > redundancy in Camlistore by changing the configuration at the server 
> level - 
> > I am OK with that, but I suspect that there are better solutions. 
> > I want some easy/automated way to apply the redundancy - generally 
> achieved 
> > in Camlistore by syncing files to multiple blob stores.  I should be 
> able to 
> > sync to any type of attached storage or cloud storage location, I should 
> > also be able to switch cloud blob storage locations easily. 
> > I want to be able to tag my files with things like location, project, 
> > client, subject and general quality of the file - so I can find them 
> more 
> > easily later 
> > I would also like to be able to apply other policies like encryption.  I 
> > know that Camlistore itself doesn't come with built-in encryption 
> > features right now (thanks for your clarification), but it seems like 
> > something it could have in the future. 
> > 
> > Hopefully, the above description is a reasonable application of 
> Camlistore's 
> > technologies - even though some of it is aspirational. 
>
> ok, makes sense. 
>
> > With respect to the "In particular, if your Camlistore is not on a server 
> > accessible from anywhere anytime, why do you want to store things in 
> > Camlistore at all" part of the question: Camlistore is one of only a few 
> > products that use metadata instead of relying solely on some folder 
> > hierarchy or another (or requiring me to maintain a single hierarchy). 
> > I am slightly troubled by the fact that Camlistore doesn't have a 
> > serverless mode - but I can start a cloud-hosted server if I really need 
> > immediate access across multiple devices. 
>
> true, we rely on index+search instead of "organizing" things, and that 
> can be an advantage in itself, even if only available locally. fair 
> enough. 
>
> > I didn't think that the immediacy thing was something inherent or 
> necessary 
> > to Camlistore; that part of your question makes me worry I have 
> > misunderstood the capabilities of Camlistore somehow. Shouldn't I just 
> be 
> > able to turn server nodes off and on as I need them?  My current 
> interaction 
> > with Camlistore is that I am away from my home and any sort of internet 
> > connectivity and collect files etc then.  Then when I get home (and have 
> > low-cost internet access) I just want those files to be safely archived 
> for 
> > future use.  I am looking forward to using the Camlistore WebUI for some 
> > practical purpose, but I just haven't needed it yet (it is cool to play 
> > with) 
>
> Sure, you should be able to shut down your Camlistore instance 
> whenever you want, restart it, and continue doing whatever you were 
> doing with it. Reindexing shouldn't even be needed most of the time. 
> In a case similar to your scenario, what I sometimes do is store new 
> files on an ephemeral (devcam server) instance on my laptop, and when 
> I get home I sync the blobs from the laptop instance to the instance 
> on my server. 
>
> > I hope that I have answered your question. 
>
> Yes. Now I can try to answer better the problems you have exactly, if 
> you give me concrete examples. 
>
> > Mike 
> > 
> > 
> > On Sunday, October 15, 2017 at 3:05:47 AM UTC+10, mpl wrote: 
> >> 
> >> Hi. 
> >> 
> >> short answers/remarks first, until I understand better what you're 
> >> talking about. 
> >> 
> >> On 14 October 2017 at 12:32, Michael Farr <[email protected]> wrote: 
> >> > 
> >> > I am having some difficulty with getting all my files into Camlistore 
> >> > - mainly because I don't have a reliable way of sorting through them. 
> >> > My files are currently in a few different folder hierarchies that are 
> >> > all undergoing regular change.  I have a few reasons for wanting to 
> >> > pre-sort through the files before going into Camlistore: 
> >> > 
> >> > Some of the files are large and only relevant for a short period of 
> >> > time. 
> >> > These files are associated with and sit near other files in the 
> >> > hierarchy that do need to be synced with Camlistore.  So I manually 
> >> > camput files 
> >> > in these folders to save space and bandwidth*^ 
> >> > Some files need to be encrypted with an external tool before being 
> >> > synced 
> >> > *~.  Given the current lack of delete** I am keen to avoid creating 
> an 
> >> > unencrypted blob that I can’t easily remove later. 
> >> > Some files are being changed frequently (many times per minute) and I 
> do 
> >> > not 
> >> > want too many copies of them. (large design files, media renderings, 
> >> > software project binaries etc) 
> >> > Some files are already encrypted/zipped or otherwise packaged in a 
> >> > format 
> >> > that Camlistore will not be able to extract meaningful metadata from. 
> >> > The 
> >> > metadata for that file exists within another file.  If the sync is 
> >> > completely automatic, it will be hard to detect when this situation 
> >> > arises 
> >> > to know when to attach metadata to files like this. 
> >> > 
> >> > My current pre-camput process is quite cumbersome and I am not 
> >> > particularly reliable at it, but I have only taken some very basic 
> >> > initial steps with Camlistore as a whole, so there is a lot I can 
> >> > learn.  Are there any steps I can take to avoid the aforementioned 
> >> > pre-processing problems neatly within Camlistore?  I only have one 
> >> > internet-connected laptop and my local USB drives are almost at 
> >> > capacity.  I would prefer to use B2 or Wasabi for my future blob 
> >> > storage needs. 
> >> 
> >> I'm not sure I understand exactly what you want to achieve in the end. 
> >> In particular, if your Camlistore is not on a server accessible from 
> >> anywhere anytime, why do you want to store things in Camlistore at 
> >> all? 
> >> What is your end goal? 
> >> 
> >> > These are some ideas that might help with the problem, though I am 
> not 
> >> > sure 
> >> > if any are sensible: 
> >> > 
> >> > Buy another USB disk to act as the local Camlistore blob store. 
>  Perform 
> >> > all 
> >> > my preparation there, and then selectively sync that blob store to 
> the 
> >> > cloud 
> >> > blob store with "cond".  I do not like the sound of this solution for 
> >> > lots 
> >> > of reasons, but maybe it is the most straightforward way. 
> >> > Use a separate tool that lets me apply metadata to the files before 
> they 
> >> > go 
> >> > into Camlistore (tmsu).  I can then build scripts that sweep the file 
> >> > system 
> >> > looking for files that are ready to be "camput"ted. (this idea 
> actually 
> >> > sounds quite terrible, but I thought i'd mention it anyway). 
> >> > Import the file metadata into Camlistore without actually importing 
> the 
> >> > file 
> >> > data itself (I’m not sure if this is either possible or sensible). 
>  In 
> >> > theory, I could then manipulate metadata directly within Camlistore 
> and 
> >> > synchronize only blobs that meet certain criteria. 
> >> > Wait for gc.Collect to be fully implemented. 
> >> > 
> >> > 
> >> > 
> >> > If you also have this kind of issue, how do you work with Camlistore 
> >> > now? 
> >> > Maybe one or more of the following: 
> >> > 
> >> > Import everything as soon as it is found and just suck up the extra 
> data 
> >> > costs. 
> >> > 
> >> > Manually pre-process and upload files/folders on their individual 
> merit 
> >> > (ie: 
> >> > as I do) 
> >> 
> >> Without knowing exactly what you want to do, I can't say that I have 
> >> the same issue you do, but yes, when it comes to e.g. my pictures I 
> >> first sort through them before uploading them with camput. 
> >> 
> >> > Custom scripts for running imports on various schedules. 
> >> > Multiple Camlistore servers catering to different storage problems 
> >> > Manually deleting blobs from the cloud storage systems. 
> >> > Use other non-Camlistore systems for tricky situations (google drive 
> >> > etc) 
> >> > All of the above 
> >> > 
> >> > Notes: 
> >> > 
> >> > 1. ** My reading of this issue is that gc.Collect doesn't currently 
> do 
> >> > anything: https://github.com/camlistore/camlistore/issues/792. 
> >> 
> >> yes, the garbage collector is not fully implemented and therefore not 
> >> usable atm. 
> >> 
> >> > 2. *^ right now I have a very small local blob store for evaluation, 
> but 
> >> > I 
> >> > am planning on switching to b2/wasabi for blob store once I get 
> >> > everything 
> >> > working. 
> >> > 
> >> > 3. *~ Possibly I do not understand Camlistore’s approach to 
> >> > encryption key management, so I don't know enough to use it 
> >> > properly.  My solution is to encrypt the most sensitive files before 
> >> > they go into Camlistore even though they are all encrypted again in 
> >> > the blobstore. 
> >> 
> >> Fyi, by default, nothing is encrypted. you'd have to specifically use 
> >> the encrypt blobserver (which is still a "NO GUARANTEES" in terms of 
> >> crypto afair). 
> >> 
> >> > My local systems are not 
> >> > physically secure, so I need to encrypt data with a key hosted on a 
> >> > remote 
> >> > system or one that is derived from a password that I remember.  The 
> >> > local 
> >> > key will protect all my files in the cloud store.  The other keys 
> >> > protect my 
> >> > sensitive files from both internal and external attacks. 
> >> > 
> >> > 4.  I do not have a local permanent server with internet access.  I 
> can 
> >> > only 
> >> > run Camlistore on my laptop. 
> >> > 
> >> > -- 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "Camlistore" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send 
> >> > an 
> >> > email to [email protected]. 
> >> > For more options, visit https://groups.google.com/d/optout. 
> > 
>
