Thanks. This is so embarrassing, but I wasn't able to create a new matrix because I forgot to delete the original massive matrix. I was testing how big it could get in terms of rows/columns before hitting the limit, and forgot to delete the last object before creating a new one. Sadly, that memory usage was not reflected in the task manager for the VM instance.
On Tue, Mar 24, 2020, 6:44 PM <numpy-discussion-requ...@python.org> wrote:

> Today's Topics:
>
>    1. Re: Numpy doesn't use RAM (Sebastian Berg)
>    2. Re: Numpy doesn't use RAM (Stanley Seibert)
>    3. Re: Numpy doesn't use RAM (Benjamin Root)
>    4. Re: Put type annotations in NumPy proper? (Joshua Wilson)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 24 Mar 2020 13:15:47 -0500
> From: Sebastian Berg <sebast...@sipsolutions.net>
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
>
> On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> > Hi NumPy dev community,
> >
> > I'm Keyvis, a statistical data scientist.
> >
> > I'm currently using NumPy in Python 3.8.2 64-bit for a clustering
> > problem, on a machine with 1.9 TB of RAM. When I try using np.zeros
> > to create a 600,000 by 600,000 matrix of dtype=np.float32, it says
> > "Unable to allocate 1.31 TiB for an array with shape (600000, 600000)
> > and data type float32".
>
> If this error happens, allocating the memory failed. This should be
> pretty much a simple `malloc` call in C, so this is the kernel
> complaining, not Python/NumPy.
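For reference, the 1.31 TiB in the error message matches the requested allocation exactly; a quick back-of-the-envelope check (pure Python, no NumPy needed):

```python
# Size of a 600,000 x 600,000 float32 array.
rows = cols = 600_000
itemsize = 4  # a float32 element is 4 bytes

total_bytes = rows * cols * itemsize
print(total_bytes)          # 1.44e12 bytes, i.e. 1.44 TB (decimal units)
print(total_bytes / 2**40)  # ~1.31 TiB (binary units), as in the error message
```

So the kernel is being asked for roughly three quarters of the machine's 1.9 TB in a single contiguous allocation.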
> I am not quite sure, but maybe memory fragmentation plays its part,
> or you are simply actually out of memory for that process; 1.44 TB is
> a significant portion of the total memory, after all.
>
> Not sure what to say, but I think you should probably look into other
> solutions, maybe using HDF5, zarr, or memory mapping (although I am
> not sure the last actually helps). It will be tricky to work with
> arrays of a size that is close to the available total memory.
>
> Maybe someone who works more with such data here can give you tips on
> what projects can help you or what solutions to look into.
>
> - Sebastian
>
> > I used psutil to determine how much RAM Python thinks it has access
> > to, and it returned approximately 1.8 TB.
> >
> > Is there some way I can fix NumPy to create these large arrays?
> > Thanks for your time and consideration.
>
> ------------------------------
>
> Message: 2
> Date: Tue, 24 Mar 2020 13:35:49 -0500
> From: Stanley Seibert <sseib...@anaconda.com>
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
>
> In addition to what Sebastian said about memory fragmentation and OS
> limits on memory allocations, I do think it will be hard to work with
> an array that close to the memory limit in NumPy regardless.
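The memory-mapping route Sebastian mentions can be sketched with `np.memmap`, which backs the array with a file on disk so the OS only pages touched regions into RAM (the filename and small shape here are placeholders for illustration; whether this helps depends heavily on the access pattern):

```python
import numpy as np

# Disk-backed array: only pages that are actually touched get read into RAM.
# A small shape is used here -- the real one would be (600_000, 600_000),
# which needs ~1.31 TiB of *disk* space instead of RAM.
mm = np.memmap("big_matrix.dat", dtype=np.float32, mode="w+",
               shape=(1_000, 1_000))

mm[0, :] = 1.0  # writes go through to the mapped file
mm.flush()      # make sure changes reach the disk

# Later, reopen read-only without loading the whole array:
ro = np.memmap("big_matrix.dat", dtype=np.float32, mode="r",
               shape=(1_000, 1_000))
print(ro[0, :5])
```

HDF5 (via h5py) and zarr offer similar on-disk arrays with the added benefit of chunked, compressed storage.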
> Almost any operation will need to make a temporary array and exceed
> your memory limit. You might want to look at Dask Array for a
> NumPy-like API for working with chunked arrays that can be staged in
> and out of memory:
>
>     https://docs.dask.org/en/latest/array.html
>
> As a bonus, Dask will also let you make better use of the large number
> of CPU cores that you likely have in your 1.9 TB RAM system. :)
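The chunking idea behind Stanley's Dask suggestion can be illustrated in plain NumPy by streaming over row blocks so that only one block is resident at a time (a hand-rolled sketch of the concept, not Dask's actual API; `get_block` is a hypothetical loader standing in for an HDF5/zarr/memmap read):

```python
import numpy as np

def chunked_row_sums(get_block, n_rows, chunk=100_000):
    """Accumulate per-row sums while holding only `chunk` rows in memory."""
    out = np.empty(n_rows, dtype=np.float64)
    for start in range(0, n_rows, chunk):
        stop = min(start + chunk, n_rows)
        block = get_block(start, stop)   # e.g. a read from HDF5/zarr/memmap
        out[start:stop] = block.sum(axis=1)
        del block                        # release the chunk before the next one
    return out

# Toy demonstration, with an in-memory array standing in for on-disk storage:
data = np.arange(12, dtype=np.float32).reshape(4, 3)
sums = chunked_row_sums(lambda a, b: data[a:b], n_rows=4, chunk=2)
print(sums)  # matches data.sum(axis=1)
```

Dask does this scheduling (plus parallelism across cores) automatically once the array is declared with a chunk shape.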
> ------------------------------
>
> Message: 3
> Date: Tue, 24 Mar 2020 14:36:45 -0400
> From: Benjamin Root <ben.v.r...@gmail.com>
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
>
> Another thing to point out about having an array take up that large a
> share of the available memory is that it severely restricts what you
> can do with it. Since you are above 50% of the available memory, you
> won't be able to create another array that would be the result of
> computing something with that array. So, you are restricted to
> querying (which you could do without having everything in memory) or
> to in-place operations.
>
> Dask arrays might be what you are really looking for.
>
> Ben Root
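On Ben's in-place point: most NumPy ufuncs accept an `out=` argument, so results can be written back into the existing buffer instead of allocating a temporary the same (huge) size as the operand. A small sketch:

```python
import numpy as np

a = np.ones((1_000, 1_000), dtype=np.float32)

# `a * 2 + 1` would allocate two temporaries, each the same size as `a`.
# With `out=`, everything is written back into `a`'s own buffer:
np.multiply(a, 2, out=a)
np.add(a, 1, out=a)
print(a[0, 0])  # 3.0, with no extra array-sized allocations

a += 1          # augmented assignment is also in-place
print(a[0, 0])  # 4.0
```

For an array above 50% of RAM, this is the difference between an operation succeeding and another failed allocation.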
> ------------------------------
>
> Message: 4
> Date: Tue, 24 Mar 2020 15:42:27 -0700
> From: Joshua Wilson <josh.craig.wil...@gmail.com>
> Subject: Re: [Numpy-discussion] Put type annotations in NumPy proper?
>
> > That is, is this an all-or-nothing thing where as soon as we start,
> > numpy-stubs becomes unusable?
>
> Until NumPy is made PEP 561 compatible by adding a `py.typed` file,
> type checkers will ignore the types in the repo, so in theory you can
> avoid the all-or-nothing.
> In practice it's maybe trickier because currently people can use the
> stubs, but they won't be able to use the types in the repo until the
> PEP 561 switch is flipped. So e.g. currently SciPy pulls the stubs
> from `numpy-stubs` master, allowing for a short
>
>     find place where NumPy stubs are lacking -> improve stubs ->
>     improve SciPy types
>
> loop. If all development moves into the main repo, then SciPy is
> blocked on it becoming PEP 561 compatible before moving forward. But
> you could complain that I put the cart before the horse by
> introducing typing in the SciPy repo before the NumPy types were more
> settled, and that's probably a fair complaint.
>
> > Anyone interested in taking the lead on this?
>
> Not that I am a core developer or anything, but I am interested in
> helping to improve typing in NumPy.
>
> On Tue, Mar 24, 2020 at 11:15 AM Eric Wieser
> <wieser.eric+nu...@gmail.com> wrote:
> >
> > > Putting aside ndarray, as more challenging, even annotations for
> > > numpy functions and method parameters with built-in types would
> > > help, as a start.
> >
> > This is a good idea in principle, but one thing concerns me. If we
> > add type annotations to numpy, does it become an error to have
> > numpy-stubs installed? That is, is this an all-or-nothing thing
> > where as soon as we start, numpy-stubs becomes unusable?
> >
> > Eric
> >
> > On Tue, 24 Mar 2020 at 17:28, Roman Yurchak <rth.yurc...@gmail.com>
> > wrote:
> >> Thanks for re-starting this discussion, Stephan! I think there is
> >> definitely significant interest in this topic:
> >> https://github.com/numpy/numpy/issues/7370 is the issue with the
> >> largest number of user likes in the issue tracker (FWIW).
> >>
> >> Having them in numpy, as opposed to a separate numpy-stubs
> >> repository, would indeed be ideal from a user perspective. When
> >> looking into it in the past, I was never sure how well in sync
> >> numpy-stubs was.
> >> Putting aside ndarray, as more challenging, even annotations for
> >> numpy functions and method parameters with built-in types would
> >> help, as a start.
> >>
> >> To add to the previously listed projects that would benefit from
> >> this, we are currently considering starting to use some (minimal)
> >> type annotations in scikit-learn.
> >>
> >> --
> >> Roman Yurchak
> >>
> >> On 24/03/2020 18:00, Stephan Hoyer wrote:
> >> > When we started numpy-stubs [1] a few years ago, putting type
> >> > annotations in NumPy itself seemed premature. We still supported
> >> > Python 2, which meant that we would need to use awkward comments
> >> > for type annotations.
> >> >
> >> > Over the past few years, using type annotations has become
> >> > increasingly popular, even in the scientific Python stack. For
> >> > example, off-hand I know that at least SciPy, pandas, and xarray
> >> > have at least part of their APIs type annotated. Even without
> >> > annotations for shapes or dtypes, it would be valuable to have
> >> > near-complete annotations for NumPy, the project at the bottom
> >> > of the scientific stack.
> >> >
> >> > Unfortunately, numpy-stubs never really took off. I can think of
> >> > a few reasons for that:
> >> > 1. Missing high-level guidance on how to write type annotations,
> >> > particularly for how (or if) to annotate particularly dynamic
> >> > parts of NumPy (e.g., consider __array_function__), and whether
> >> > we should prioritize strictness or faithfulness [2].
> >> > 2. We didn't have a good experience for new contributors. Due to
> >> > the relatively low level of interest in the project, when a
> >> > contributor would occasionally drop in, I often didn't even
> >> > notice their PR for a few weeks.
> >> > 3. Developing type annotations separately from the main codebase
> >> > makes them a little harder to keep in sync. This means that type
> >> > annotations couldn't serve their typical purpose of
> >> > self-documenting code.
> >> > Part of this may be necessary for NumPy (due to our use of C
> >> > extensions), but large parts of NumPy's user-facing APIs are
> >> > written in Python. We no longer support Python 2, so at least we
> >> > no longer need to worry about putting annotations in comments.
> >> >
> >> > We eventually could probably use a formal NEP (or several) on
> >> > how we want to use type annotations in NumPy, but I think a good
> >> > first step would be to think about how to start moving the
> >> > annotations from numpy-stubs into numpy proper.
> >> >
> >> > Any thoughts? Anyone interested in taking the lead on this?
> >> >
> >> > Cheers,
> >> > Stephan
> >> >
> >> > [1] https://github.com/numpy/numpy-stubs
> >> > [2] https://github.com/numpy/numpy-stubs/issues/12
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 162, Issue 27
> *************************************************
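As a taste of what the annotations under discussion might look like once moved into NumPy proper, here is a small annotated helper in the spirit of numpy-stubs (the function and its signature are illustrative inventions, not NumPy's actual stubs; per Roman's suggestion, it sticks to built-in and `np.ndarray` types, with no shape or dtype awareness):

```python
from typing import Tuple
import numpy as np

# Hypothetical annotated wrapper: annotations like these are the kind of
# modest "start" discussed above -- built-in types only, no dtype tracking.
def zeros_matrix(shape: Tuple[int, int], dtype: type = np.float32) -> np.ndarray:
    """Thin, type-annotated wrapper around np.zeros."""
    return np.zeros(shape, dtype=dtype)

m = zeros_matrix((2, 3))
print(m.shape, m.dtype)  # (2, 3) float32
```

Once a `py.typed` marker ships in the package (PEP 561), type checkers such as mypy pick up annotations like this directly from the installed NumPy rather than from separate stubs.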
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion