On Thu, Jan 14, 2016 at 2:13 PM, Stephan Hoyer <sho...@gmail.com> wrote:
> On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant <tra...@continuum.io>
> wrote:
>>
>> I don't know enough about xray to know whether it supports this kind of
>> general labeling to be able to build your entire data-structure as an x-ray
>> object. Dask could definitely be used to process your data in an easy to
>> describe manner (creating a dask.bag of dask.arrays would work though I'm
>> not sure there are any methods that would buy you from just having a
>> standard dictionary of dask.arrays). You can definitely use dask
>> imperative to parallelize your data-manipulation algorithms.
>
>
> Indeed, xray's data model is not flexible enough to represent this sort of
> data -- it's designed around cases where multiple arrays use shared axes.
>
> However, I would indeed recommend dask.array (coupled with some sort of
> on-disk storage) as a possible solution for this problem, if you need to be
> able manipulate these arrays with an API that looks like NumPy. That said,
> the fact that your data consists of ragged arrays suggests that the
> dask.array API may be less useful for you.
>
> Tools like dask.imperative, coupled with HDF5 for storage, could still be
> very useful, though.
The reason I didn't suggest dask is that I had the impression that
dask's model is better suited to bulk/streaming computations with
vectorized semantics ("do the same thing to lots of data" kinds of
problems, basically), whereas it sounded like the OP's algorithm
needed lots of one-off, unpredictable random access.

Obviously, even if this is true, it's still useful to point out both
options, because the OP's problem might turn out to be a better fit
for dask's model than they indicated -- the post is somewhat vague :-).
But I just wanted to check: is the above a good characterization of
dask's strengths/applicability?

-n

--
Nathaniel J. Smith -- http://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion