[Discuss] IPFS plug (was: Version Control/Workflow Management for Data)

W. Trevor King Thu, 03 Mar 2016 13:02:02 -0800

On Thu, Mar 03, 2016 at 01:38:43PM -0700, Davide Del Vento wrote:
> I know this is suboptimal, but I think that's the best you can do at
> the moment (and that assumes that at least one dataset would fit in
> your disk, which for climate datasets could be a generous
> assumption).


Depending on how you organize/access your data, IPFS [1] might be a
good solution for distributing your data over multiple machines while
still being able to easily access the subset you need from a single
host.  For example, if your huge dataset is set up like

  .
  |-- 2014
  |   `-- …
  |-- 2015
  |   `-- …
  `-- 2016
      `-- …

IPFS would be good if you only needed one year at a time on the local
disk.  It wouldn't be good if you needed January data across a range
of years, unless someone had also set up an index by month:

  .
  |-- 01
  |   `-- …
  |-- 02
  |   `-- …
  |-- 03
  …   `-- …

The data is content-addressable, so 2014/01/some-data (via the first
indexing scheme) and 01/2014/some-data (via the second indexing
scheme) would both use the same local object for the ‘some-data’ leaf.
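
As a rough illustration of why both paths end up at the same local
object, here is a toy content-addressed store.  It is only a sketch:
the `store` dict, the `put` helper, and the use of a bare SHA-256 hex
digest as the key are assumptions made for the example, not IPFS's
actual object format or API.

```python
import hashlib

# Toy content-addressed store (hypothetical; not the IPFS object format).
store = {}

def put(data: bytes) -> str:
    """Store a blob under the SHA-256 hex digest of its content."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data  # adding identical content again changes nothing
    return key

leaf = put(b"some-data for January 2014")

# Two hypothetical indexes that reach the same leaf by different paths.
by_year = {"2014/01/some-data": leaf}
by_month = {"01/2014/some-data": put(b"some-data for January 2014")}

# Both paths resolve to the same key, and only one blob is stored.
assert by_year["2014/01/some-data"] == by_month["01/2014/some-data"]
assert len(store) == 1
```

Because the key depends only on the bytes, adding a second index costs
extra directory entries but no extra copies of the leaf data.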

And while there are plans to build Git-like version control onto IPFS,
I don't think anyone has gotten around to that yet.  With the current
version, you get immutable Merkle hashes that uniquely identify your
data [2], but you don't have commit objects linking those snapshots
together.
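
To make the distinction concrete, the sketch below hashes two
directory snapshots Merkle-style and then shows what a Git-like commit
record would add on top.  Everything here is hypothetical: the
`merkle_hash` and `commit_id` helpers, the JSON encoding, and the
commit layout are assumptions for illustration, not anything IPFS or
Git actually uses.

```python
import hashlib
import json

def merkle_hash(node) -> str:
    """Hash a leaf (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"leaf:" + node).hexdigest()
    # A directory's hash covers its children's names and their hashes.
    children = {name: merkle_hash(child) for name, child in node.items()}
    payload = json.dumps(children, sort_keys=True).encode()
    return hashlib.sha256(b"dir:" + payload).hexdigest()

def commit_id(commit: dict) -> str:
    """Hash a (hypothetical) commit record."""
    return hashlib.sha256(json.dumps(commit, sort_keys=True).encode()).hexdigest()

snapshot_1 = {"2014": {"01": b"january"}}
snapshot_2 = {"2014": {"01": b"january"}, "2015": {"01": b"january again"}}

root_1 = merkle_hash(snapshot_1)
root_2 = merkle_hash(snapshot_2)

# The unchanged 2014 subtree keeps its hash; the root hash changes.
assert merkle_hash(snapshot_1["2014"]) == merkle_hash(snapshot_2["2014"])
assert root_1 != root_2

# Nothing in root_1 or root_2 says which snapshot came first.  A
# Git-like commit record would add that link via a parent pointer.
commit_1 = {"tree": root_1, "parent": None}
commit_2 = {"tree": root_2, "parent": commit_id(commit_1)}
```

The two root hashes identify their snapshots exactly, but only the
extra commit records carry the history between them.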

Anyhow, IPFS is still pretty new and fluxy, so I wouldn't trust it as
the sole location of important data, but folks who are bumping up
against data management issues might want to give it a spin.

Cheers,
Trevor

[1]: https://ipfs.io/
[2]: https://en.wikipedia.org/wiki/Merkle_tree

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


