[go-nuts] Re: Large 2D slice performance question

Egon Thu, 01 Dec 2016 01:23:17 -0800

See whether this works better: 
https://gist.github.com/egonelbre/d94ea561c3e63db009718e227e506b5b


*There is a lot of room for improvements, (e.g. for your actual dataset 
increase defaultNameCount).*

*PS: Avoid csv package for writing files, use bufio and Write directly... 
csv does some extra stuff, that you probably won't need.*

btw. how many different names do you have?

+ Egon

On Thursday, 1 December 2016 02:51:49 UTC+2, Mandolyte wrote:
>
> Thanks for the discussion! Package with tester is at:
> https://github.com/mandolyte/TableRecursion
>
> While I can't share the data, I could take a sample set of paths for a 
> root node, reverse engineer the pairs, and obfuscate... I've done this sort 
> of thing before but it is a bit of work. So I'll treat that as a last 
> resort. :-)
>
> On Wednesday, November 30, 2016 at 5:36:41 PM UTC-5, Mandolyte wrote:
>>
>> The finite set idea might work, but the set is well over 300K. The 
>> strings (part numbers) are not regular.
>>
>> I could make a single pass over the "parent" column and record in a 
>> map[string]int the index of the first occurrence. Then I would avoid 
>> sort.Search() having to find it each time. Or use sort.Search() on the 
>> smaller 300K slice if map performance was a problem...
>>
>> There is a lot of repetition in the "branches" - perhaps some caching 
>> would be appropriate...
>>
>> thanks for the ideas!
>>
>> On Wednesday, November 30, 2016 at 9:31:56 AM UTC-5, adon...@google.com 
>> wrote:
>>>
>>> On Wednesday, 30 November 2016 03:37:55 UTC+2, Mandolyte wrote:
>>>>>
>>>>> I have a fairly large 2D slice of strings, about 115M rows. These are 
>>>>> (parent, child) pairs that, processed recursively, form a tree. I am 
>>>>> "materializing" all possible trees as well as determining for each root 
>>>>> node all child nodes for all levels for it.
>>>>>
>>>>  
>>> In addition to what Egon said:
>>>
>>> If the elements of the outer [][]string slice are all []string slices of 
>>> length 2, it would be much more efficient to use [2]string instead, that 
>>> is, fixed-sized arrays of two strings, since this would avoid a lot of 
>>> memory allocation and pointer indirection.  The type of the outer slice 
>>> would become [][2]string.
>>>
>>> Also, if all your strings are drawn from a finite set, you'll find many 
>>> operations (such as comparison or set membership tests) much more efficient 
>>> if you represent each string by its index in some side table.  Then instead 
>>> of [2]string you can use [2]uint32, or perhaps even a narrower integer 
>>> type, which will make your main array more compact by at least a factor of 
>>> 4.
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[go-nuts] Re: Large 2D slice performance question

Reply via email to