See whether this works better:

*There is a lot of room for improvements, (e.g. for your actual dataset 
increase defaultNameCount).*

*PS: Avoid csv package for writing files, use bufio and Write directly... 
csv does some extra stuff, that you probably won't need.*

btw. how many different names do you have?

+ Egon

On Thursday, 1 December 2016 02:51:49 UTC+2, Mandolyte wrote:
> Thanks for the discussion! Package with tester is at:
> While I can't share the data, I could take a sample set of paths for a 
> root node, reverse engineer the pairs, and obfuscate... I've done this sort 
> of thing before but it is a bit of work. So I'll treat that as a last 
> resort. :-)
> On Wednesday, November 30, 2016 at 5:36:41 PM UTC-5, Mandolyte wrote:
>> The finite set idea might work, but the set is well over 300K. The 
>> strings (part numbers) are not regular.
>> I could make a single pass over the "parent" column and record in a 
>> map[string]int the index of the first occurrence. Then I would avoid 
>> sort.Search() having to find it each time. Or use sort.Search() on the 
>> smaller 300K slice if map performance was a problem...
>> There is a lot of repetition in the "branches" - perhaps some caching 
>> would be appropriate...
>> thanks for the ideas!
>> On Wednesday, November 30, 2016 at 9:31:56 AM UTC-5, 
>> wrote:
>>> On Wednesday, 30 November 2016 03:37:55 UTC+2, Mandolyte wrote:
>>>>> I have a fairly large 2D slice of strings, about 115M rows. These are 
>>>>> (parent, child) pairs that, processed recursively, form a tree. I am 
>>>>> "materializing" all possible trees as well as determining for each root 
>>>>> node all child nodes for all levels for it.
>>> In addition to what Egon said:
>>> If the elements of the outer [][]string slice are all []string slices of 
>>> length 2, it would be much more efficient to use [2]string instead, that 
>>> is, fixed-sized arrays of two strings, since this would avoid a lot of 
>>> memory allocation and pointer indirection.  The type of the outer slice 
>>> would become [][2]string.
>>> Also, if all your strings are drawn from a finite set, you'll find many 
>>> operations (such as comparison or set membership tests) much more efficient 
>>> if you represent each string by its index in some side table.  Then instead 
>>> of [2]string you can use [2]uint32, or perhaps even a narrower integer 
>>> type, which will make your main array more compact by at least a factor of 
>>> 4.

You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
For more options, visit

Reply via email to