Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-08 Thread Magnus Thor Torfason
Thanks to Thomas, Martin, Jim and William. Your input was very informative, and thanks for the reference to Sedgewick. In the end, it does seem to me that all these algorithms require fast lookup by ID of nodes to access data, and that conditional on such fast lookup, algorithms are possible

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-04 Thread Magnus Thor Torfason
There are around 16M unique values. After accounting for equivalence, the number is much smaller (I don't know how much smaller, since my program has not completed yet :-) Yes, I meant that B and C are also equivalent. The original version was a typo. Best, Magnus On 11/1/2013 3:45 PM, jim

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-04 Thread Magnus Thor Torfason
On 11/1/2013 10:12 PM, Martin Morgan wrote: Do you mean that if A,B occur together and B,C occur together, then A,B and A,C are equivalent? Yes, that's what I meant, sorry, typo. I like your uid() function. It avoids the 20M times loop, and the issue of circular references can be solved by

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-04 Thread Magnus Thor Torfason
-project.org] On Behalf Of Magnus Thor Torfason Sent: Friday, November 01, 2013 8:23 AM To: r-help@r-project.org Subject: Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days Sure, I was attempting to be concise and boiling it down to what I saw as the root issue

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-04 Thread Thomas Lumley
On Sat, Nov 2, 2013 at 11:12 AM, Martin Morgan mtmor...@fhcrc.org wrote: On 11/01/2013 08:22 AM, Magnus Thor Torfason wrote: Sure, I was attempting to be concise and boiling it down to what I saw as the root issue, but you are right, I could have taken it a step further. So here goes. I

[R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Magnus Thor Torfason
Pretty much what the subject says: I used an env as the basis for a Hashtable in R, based on information that this is in fact the way environments are implemented under the hood. I've been experimenting with doubling the number of entries, and so far it has seemed to be scaling more or less

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread jim holtman
It would be nice if you followed the posting guidelines and at least showed the script that was creating your entries now so that we understand the problem you are trying to solve. A bit more explanation of why you want this would be useful. This gets to the second part of my tag line: Tell me

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Magnus Thor Torfason
Sure, I was attempting to be concise and boiling it down to what I saw as the root issue, but you are right, I could have taken it a step further. So here goes. I have a set of around 20M string pairs. A given string (say, A) can either be equivalent to another string (B) or not. If

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread William Dunlap
-project.org Subject: Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days Sure, I was attempting to be concise and boiling it down to what I saw as the root issue, but you are right, I could have taken it a step further. So here goes. I have a set

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Martin Morgan
On 11/01/2013 08:22 AM, Magnus Thor Torfason wrote: Sure, I was attempting to be concise and boiling it down to what I saw as the root issue, but you are right, I could have taken it a step further. So here goes. I have a set of around 20M string pairs. A given string (say, A) can
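
The problem described in this thread (grouping strings into equivalence classes, where A,B and B,C occurring together implies that A and C are also equivalent) is commonly handled with a union-find (disjoint-set) structure. The following is only a minimal sketch of that general technique, written in Python rather than R, and it is not the code that any poster in the thread used. The function name `equivalence_classes` and the sample pairs are assumptions made for illustration.

```python
def equivalence_classes(pairs):
    """Group strings into equivalence classes from pairwise equivalences.

    Hypothetical helper for illustration: `pairs` is an iterable of
    (a, b) tuples meaning "a is equivalent to b".
    """
    parent = {}  # maps each string to its parent in the union-find forest

    def find(x):
        # Register unseen strings as their own root.
        parent.setdefault(x, x)
        # Walk up to the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    # Collect the members of each class under its root.
    groups = {}
    for s in parent:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


# Sample data (assumed): A~B and B~C put A, B, C in one class.
classes = equivalence_classes([("A", "B"), ("B", "C"), ("D", "E")])
print(sorted(sorted(c) for c in classes))
```

Each pair is processed once, and each lookup relies on a hash-based dictionary, which matches the observation in the thread that these algorithms depend on fast lookup of nodes by ID.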