I'd try storing the data in hdf5 (probably via h5py, which is a more
basic interface without all the bells-and-whistles that pytables
adds), though any method you use is going to be limited by the need to
do a seek before each read. Storing the data on SSD will probably help
a lot if you can
Hi,
I have a very large dictionary that must be shared across processes and does
not fit in RAM. I need access to this object to be fast. The key is an integer
ID and the value is a list containing two elements, both of them numpy arrays
(one has ints, the other has floats). The key is
Well, maybe something like a simple class emulating a dictionary that
stores a key-value on disk would be more than enough. Then you can use
whatever persistence layer that you want (even HDF5, but not necessarily).
As a demonstration I did a quick and dirty implementation for such a
persistent
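
The message is cut off above, so the author's own implementation is not shown here. As a rough illustration of the idea only, here is a minimal sketch of a class that emulates a dictionary while keeping its key-value pairs on disk. The name DiskDict, the table layout, and the choice of sqlite3 plus pickle as the persistence layer are all assumptions made for this example, not anything taken from the thread.

```python
import os
import pickle
import sqlite3
import tempfile
from collections.abc import MutableMapping


class DiskDict(MutableMapping):
    """Hypothetical dict-like store: integer keys, pickled values in SQLite."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key INTEGER PRIMARY KEY, value BLOB)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, blob)
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        for (key,) in self._conn.execute("SELECT key FROM kv ORDER BY key"):
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()


# Usage: the value is a pair of sequences, standing in for the two arrays.
path = os.path.join(tempfile.mkdtemp(), "store.sqlite")
store = DiskDict(path)
store[42] = ([1, 2, 3], [0.5, 1.5])
ints, floats = store[42]
```

Anything picklable can be stored this way, including numpy arrays. Each process would open its own connection to the same file. Every lookup still costs a disk read, so this shows the interface rather than a tuned solution.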
From what I know this would be the use case that Dask seems to solve.
I think this blog post can help:
https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python
Notice that I haven't used any of these projects myself.
On Thu, Jan 14, 2016 at 11:48 AM, Francesc Alted
On Thu, Jan 14, 2016 at 8:16 AM, Edison Gustavo Muenz <
edisongust...@gmail.com> wrote:
> From what I know this would be the use case that Dask seems to solve.
>
> I think this blog post can help:
> https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python
>
> Notice that I
On Thu, Jan 14, 2016 at 2:30 PM, Nathaniel Smith wrote:
> The reason I didn't suggest dask is that I had the impression that
> dask's model is better suited to bulk/streaming computations with
> vectorized semantics ("do the same thing to lots of data" kinds of
> problems,
On Thu, Jan 14, 2016 at 2:13 PM, Stephan Hoyer wrote:
> On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant
> wrote:
>>
>> I don't know enough about xray to know whether it supports this kind of
>> general labeling to be able to build your entire
Hi Ryan,
Did you consider packing the arrays into one (or two) giant arrays stored with mmap?
That way you only need to store the start & end offsets, and there is
no need to use a dictionary.
It may allow you to simplify some numerical operations as well.
To be more specific,
start : numpy.intp
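
The specification above is cut off, so what follows is only a sketch of the packing idea under stated assumptions: all float arrays are concatenated into one file-backed array via numpy.memmap, and a small in-memory table of start and end offsets replaces the dictionary. The sample data, the file name, the sorted ids table, and the lookup helper are hypothetical and not part of the original message. The integer arrays would be packed the same way into a second file.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: integer ID -> 1-D float array of varying length.
arrays = {
    10: np.array([1.0, 2.0, 3.0]),
    11: np.array([4.0]),
    12: np.array([5.0, 6.0]),
}

# Small index kept in RAM: sorted IDs plus start/end offsets per ID.
ids = np.array(sorted(arrays), dtype=np.int64)
lengths = np.array([len(arrays[i]) for i in ids], dtype=np.intp)
end = np.cumsum(lengths)
start = end - lengths
total = int(end[-1])

# Writer: pack every array back to back into one file-backed array.
path = os.path.join(tempfile.mkdtemp(), "packed.dat")
packed = np.memmap(path, dtype=np.float64, mode="w+", shape=(total,))
for s, e, i in zip(start, end, ids):
    packed[s:e] = arrays[i]
packed.flush()
del packed

# Reader: any process can map the same file read-only.
data = np.memmap(path, dtype=np.float64, mode="r", shape=(total,))


def lookup(key):
    """Return the slice for `key`; pages are read from disk on access."""
    pos = np.searchsorted(ids, key)
    if pos == len(ids) or ids[pos] != key:
        raise KeyError(key)
    return data[start[pos]:end[pos]]


values = lookup(12)
```

The binary search over ids is one way to map an ID to its offsets. If the IDs are dense, they could index the offset arrays directly instead.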
A warning about HDF5. It is not a database format, so you have to be
extremely careful if the data is getting updated while it is open for
reading by anybody else. If it is strictly read-only, and nobody else is
updating it, then have at it!
Cheers!
Ben Root
On Thu, Jan 14, 2016 at 9:16 AM,
On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant
wrote:
> I don't know enough about xray to know whether it supports this kind of
> general labeling to be able to build your entire data-structure as an x-ray
> object. Dask could definitely be used to process your data in