On Thu, Jan 14, 2016 at 2:13 PM, Stephan Hoyer <sho...@gmail.com> wrote:
> On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant <tra...@continuum.io>
> wrote:
>>
>> I don't know enough about xray to know whether it supports this kind of
>> general labeling to be able to build your entire data-structure as an xray
>> object.   Dask could definitely be used to process your data in an easy to
>> describe manner (creating a dask.bag of dask.arrays would work though I'm
>> not sure there are any methods that would buy you much over just having a
>> standard dictionary of dask.arrays).   You can definitely use
>> dask.imperative to parallelize your data-manipulation algorithms.
>
>
> Indeed, xray's data model is not flexible enough to represent this sort of
> data -- it's designed around cases where multiple arrays use shared axes.
>
> However, I would indeed recommend dask.array (coupled with some sort of
> on-disk storage) as a possible solution for this problem, if you need to be
> able to manipulate these arrays with an API that looks like NumPy. That said,
> the fact that your data consists of ragged arrays suggests that the
> dask.array API may be less useful for you.
>
> Tools like dask.imperative, coupled with HDF5 for storage, could still be
> very useful, though.

The reason I didn't suggest dask is that I had the impression that
dask's model is better suited to bulk/streaming computations with
vectorized semantics ("do the same thing to lots of data" kinds of
problems, basically), whereas it sounded like the OP's algorithm
needed lots of one-off unpredictable random access.

Obviously, even if this is true, it's still useful to point out both
options, because the OP's problem might turn out to be a better fit for
dask's model than they indicated -- the post is somewhat vague :-).

But I just wanted to check: is the above a good characterization of
dask's strengths and applicability?
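
To make the distinction concrete, here is a minimal sketch of the two
access patterns I have in mind. It deliberately uses only the Python
standard library (a ThreadPoolExecutor standing in for a scheduler), not
dask itself, and the data and the names ragged, total, bulk_totals and
chase are made up for illustration, not taken from the OP's problem:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ragged data set: labels mapped to sequences of different
# lengths (stand-ins for arrays that might really live in HDF5).
ragged = {
    "a": [2, 0, 1],
    "b": [1, 1, 0, 2, 2],
    "c": [0, 2],
}


def total(values):
    """The same reduction, applied to every array."""
    return sum(values)


def bulk_totals(data):
    """Bulk pattern: the full list of tasks is known before anything
    runs, so it can be handed to a scheduler (here a thread pool) at once."""
    keys = sorted(data)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(total, (data[k] for k in keys)))
    return dict(zip(keys, results))


def chase(data, start, steps):
    """Random-access pattern: each lookup depends on the value just
    read, so the sequence of accesses cannot be planned ahead of time."""
    keys = sorted(data)
    label, position, visited = start, 0, []
    for _ in range(steps):
        value = data[label][position % len(data[label])]
        visited.append((label, value))
        label = keys[value % len(keys)]  # next array is chosen by the data
        position += value + 1            # next offset is chosen by the data
    return visited


print(bulk_totals(ragged))
print(chase(ragged, "a", 4))
```

In the first function all of the work can be described up front, which is
the kind of problem I assume a task scheduler handles well. In the second,
every step has to finish before the next one is even known.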

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion