I'd try storing the data in HDF5 (probably via h5py, which is a more
basic interface without all the bells and whistles that PyTables
adds), though any method you use is going to be limited by the need to
do a seek before each read. Storing the data on an SSD will probably
help a lot if you can afford one for your data size.
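
A rough sketch of one way to lay this out with h5py, not a tested
recipe for your data: concatenate all the int arrays into one flat
dataset, all the float arrays into another, and keep an offsets array
so that key i maps to the slice offsets[i]:offsets[i+1]. The dataset
names ("offsets", "ints", "floats") and the helpers write_ragged and
read_record are made up for illustration, and the sketch assumes the
two arrays for a given key always have the same length, as in your
example.

```python
import os
import tempfile

import h5py
import numpy as np


def write_ragged(path, records):
    """Write (int_array, float_array) pairs as two flat datasets plus offsets.

    Assumption: both arrays in a pair have the same length.
    """
    lengths = [len(ints) for ints, _ in records]
    # offsets[i]:offsets[i + 1] is the slice belonging to key i
    offsets = np.concatenate(([0], np.cumsum(lengths))).astype(np.int64)
    all_ints = np.concatenate([ints for ints, _ in records]).astype(np.int64)
    all_floats = np.concatenate([flts for _, flts in records]).astype(np.float64)
    with h5py.File(path, "w") as f:
        f.create_dataset("offsets", data=offsets)
        f.create_dataset("ints", data=all_ints)
        f.create_dataset("floats", data=all_floats)


def read_record(f, key):
    """Read the pair of arrays for one integer key from an open file."""
    start, stop = (int(x) for x in f["offsets"][key:key + 2])
    return f["ints"][start:stop], f["floats"][start:stop]


# Tiny demo using the shape of data from the question.
demo_path = os.path.join(tempfile.mkdtemp(), "ragged_demo.h5")
write_ragged(demo_path, [
    (np.array([1, 8, 15]), np.array([0.1, 0.1, 0.1])),
    (np.array([5, 6]), np.array([0.5, 0.5])),
])
with h5py.File(demo_path, "r") as f:
    ints, floats = read_record(f, 1)
    print(ints, floats)
```

Each worker process would open the file itself in read-only mode
rather than inheriting a handle. If the offsets array fits in RAM you
could load it once per process, which leaves the two data reads as the
only disk accesses per lookup.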

On Thu, Jan 14, 2016 at 1:15 AM, Ryan R. Rosario <r...@bytemining.com> wrote:
> Hi,
>
> I have a very large dictionary that must be shared across processes and does 
> not fit in RAM. I need access to this object to be fast. The key is an 
> integer ID and the value is a list containing two elements, both of them 
> numpy arrays (one has ints, the other has floats). The key is sequential, 
> starts at 0, and there are no gaps, so the “outer” layer of this data 
> structure could really just be a list with the key actually being the index. 
> The lengths of each pair of arrays may differ across keys.
>
> For a visual:
>
> {
> key=0:
>         [
>                 numpy.array([1,8,15,…, 16000]),
>                 numpy.array([0.1,0.1,0.1,…,0.1])
>         ],
> key=1:
>         [
>                 numpy.array([5,6]),
>                 numpy.array([0.5,0.5])
>         ],
> …
> }
>
> I’ve tried:
> -       manager proxy objects, but the object was so big that low-level code 
> threw an exception due to format and monkey-patching wasn’t successful.
> -       Redis, which was far too slow due to setting up connections and data 
> conversion etc.
> -       Numpy rec arrays + memory mapping, but there is a restriction that 
> the numpy arrays in each “column” must be of fixed and same size.
> -       I looked at PyTables, which may be a solution, but seems to have a 
> very steep learning curve.
> -       I haven’t tried SQLite3, but I am worried about the time it takes to 
> query the DB for a sequential ID, and then translate byte arrays.
>
> Any ideas? I greatly appreciate any guidance you can provide.
>
> Thanks,
> Ryan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



-- 
Nathaniel J. Smith -- http://vorpus.org