I'm considering trying out a alternative IntSet implementation using a 
space-optimized van Emde Boas tree. In that case, the space would scale 
linearly as you add elements, and could give reasonable out-of-memory 
errors should that situation arise. The limits in that case would be the 
limits of the integer type chosen (and an Int128 version probably wouldn't 
take up much more space than an Int32 version storing the same elements). I 
probably won't get to it for a few months, but it'll be interesting to see 
how well it works.

On Friday, 28 February 2014 11:36:23 UTC-6, Jeff Bezanson wrote:
>
> sizeof() is a bit tricky. Currently IntSet is considered not to have a 
> canonical binary representation, so sizeof() defaults to giving the 
> size of the struct. This is a bit sketchy, but it tends to be 
> convenient. 
>
> On Fri, Feb 28, 2014 at 11:49 AM, Ivar Nesje <[email protected]<javascript:>> 
> wrote: 
> > My point was to illustrate that the limit will be arbitrary chosen. It 
> is 
> > impossible to set a limit for how big numbers that should be allowed to 
> be 
> > stored in a IntSet. It depends on the application, and can only be 
> > determined by the programmer. Sorry for not stating that clear. You seem 
> to 
> > understand the issue better than my reasoning :P 
> > 
> > You should probably use a different storage mechanism that explicitly 
> > handles reading and writing the result unused parts of the array to disk 
> if 
> > your IntSet have trouble fitting in memory. Maybe you will get good 
> results 
> > if you just change the backing array in IntSet to a mmapped file? 
> > Unfortunately you have to copy all of the method definitions for IntSet, 
> > because we do not have inheritance from concrete types. 
> > 
> > PS: To get the size of the backing array, you have to use 
> sizeof(s.bits), 
> > sizeof(s) is the constant size of the IntSet struct. 
> > 
> > Regards Ivar 
> > 
> > kl. 16:22:01 UTC+1 fredag 28. februar 2014 skrev David P. Sanders 
> følgende: 
> >> 
> >> 
> >> 
> >> El viernes, 28 de febrero de 2014 08:41:37 UTC-6, Ivar Nesje escribió: 
> >>> 
> >>> The documentation states very clear that IntSet should only be used 
> for 
> >>> dense collections, and that Set, should be used for sparse 
> collections. 
> >> 
> >> 
> >> Agreed. Of course, this was just a toy example to test the limits of 
> >> IntSet. 
> >> In the real application that I am working towards, I want to think 
> about 
> >> systems of size at least 10^5 x 10^5. Mapping pairs (x,y) in this 
> system to 
> >> a single number gives up to 10^10, 
> >> which is what I was testing. 
> >> 
> >> 
> >>> 
> >>> 
> >>>> Construct a sorted set of the integers generated by the given 
> iterable 
> >>>> object, or an empty set. Implemented as a bit string, and therefore 
> designed 
> >>>> for dense integer sets. If the set will be sparse (for example 
> holding a 
> >>>> single very large integer), use Set instead. 
> >>> 
> >>> 
> >>> Do you happen to know a nice limit to how much memory IntSet should be 
> >>> allowed to use? 
> >> 
> >> 
> >> In this case, it seems to be using more memory than I have available on 
> my 
> >> machine (4GB on my laptop). 
> >> I guess my point is that normally I would expect that to give me an 
> >> out-of-memory error, rather than enter an infinite loop producing 
> garbage. 
> >> 
> >> 
> >>> 
> >>> On my laptop 100 MB would be more than I can afford 
> >> 
> >> 
> >> Not sure what you mean by that -- doesn't it rather depend on the 
> >> application? If I am doing a heavy computation on my laptop over night, 
> I am 
> >> happy for it to use all available memory. 
> >> 
> >> 
> >>> 
> >>> , but that would make IntSet unusable for bigger calculations on 
> bigger 
> >>> systems, so it should be no smaller than 10 GB. 
> >> 
> >> 
> >> I have another machine with a lot of memory (128 GB), so I certainly do 
> >> not want to impose an arbitrary restriction. 
> >> 
> >> 
> >> David. 
> >> 
> >>> 
> >>> 
> >>> Ivar 
> >>> 
> >>> kl. 15:16:33 UTC+1 fredag 28. februar 2014 skrev David P. Sanders 
> >>> følgende: 
> >>>> 
> >>>> 
> >>>> I am investigating possible data structures for an application. 
> >>>> Here is an "interesting" behaviour in IntSet, which is no doubt to do 
> >>>> with the implementation. 
> >>>> Maybe it should just throw an exception if someone tries to add a 
> really 
> >>>> large integer like this! 
> >>>> 
> >>>> 
> >>>> julia> s = IntSet() 
> >>>> IntSet() 
> >>>> 
> >>>> julia> push!(s, 100000) 
> >>>> IntSet(100000) 
> >>>> 
> >>>> julia> sizeof(s) 
> >>>> 24 
> >>>> 
> >>>> julia> push!(s, 1000000) 
> >>>> IntSet(100000, 1000000) 
> >>>> 
> >>>> julia> sizeof(s) 
> >>>> 24 
> >>>> 
> >>>> julia> push!(s, 10000000) 
> >>>> IntSet(100000, 1000000, 10000000) 
> >>>> 
> >>>> julia> push!(s, 100000000) 
> >>>> IntSet(100000, 1000000, 10000000, 100000000) 
> >>>> 
> >>>> julia> push!(s, 1000000000) 
> >>>> IntSet(100000, 1000000, 10000000, 100000000, 1000000000) 
> >>>> 
> >>>> julia> sizeof(s) 
> >>>> 24 
> >>>> 
> >>>> julia> push!(s, 10000000000) 
> >>>> IntSet(100000, 1000000, 10000000, 100000000, 1000000000, 1410065408, 
> >>>> 1410065408, 1410065408, 1410065408, 1410065408, 1410065408, 
> 1410065408, 
> >>>> 1410065408, 1410065408, 1410065408, 1410065408, 1410065408, 
> 1410065408, 
> >>>> 1410065408, 1410065408, 1410065408, 1410065408^CEvaluation succeeded, 
> but an 
> >>>> error occurred while showing value of type IntSet: 
> >>>> ERROR: interrupt 
> >>>>  in show at intset.jl:172 
> >>>>  in anonymous at show.jl:973 
> >>>>  in showlimited at show.jl:972 
> >>>>  in writemime at repl.jl:2 
> >>>>  in display at multimedia.jl:117 
> >>>>  in display at multimedia.jl:119 
> >>>>  in display at multimedia.jl:151 
>

Reply via email to