See whether this works better:
*There is a lot of room for improvements, (e.g. for your actual dataset
*PS: Avoid csv package for writing files, use bufio and Write directly...
csv does some extra stuff, that you probably won't need.*
btw. how many different names do you have?
On Thursday, 1 December 2016 02:51:49 UTC+2, Mandolyte wrote:
> Thanks for the discussion! Package with tester is at:
> While I can't share the data, I could take a sample set of paths for a
> root node, reverse engineer the pairs, and obfuscate... I've done this sort
> of thing before but it is a bit of work. So I'll treat that as a last
> resort. :-)
> On Wednesday, November 30, 2016 at 5:36:41 PM UTC-5, Mandolyte wrote:
>> The finite set idea might work, but the set is well over 300K. The
>> strings (part numbers) are not regular.
>> I could make a single pass over the "parent" column and record in a
>> map[string]int the index of the first occurrence. Then I would avoid
>> sort.Search() having to find it each time. Or use sort.Search() on the
>> smaller 300K slice if map performance was a problem...
>> There is a lot of repetition in the "branches" - perhaps some caching
>> would be appropriate...
>> thanks for the ideas!
>> On Wednesday, November 30, 2016 at 9:31:56 AM UTC-5, adon...@google.com
>>> On Wednesday, 30 November 2016 03:37:55 UTC+2, Mandolyte wrote:
>>>>> I have a fairly large 2D slice of strings, about 115M rows. These are
>>>>> (parent, child) pairs that, processed recursively, form a tree. I am
>>>>> "materializing" all possible trees as well as determining for each root
>>>>> node all child nodes for all levels for it.
>>> In addition to what Egon said:
>>> If the elements of the outer string slice are all string slices of
>>> length 2, it would be much more efficient to use string instead, that
>>> is, fixed-sized arrays of two strings, since this would avoid a lot of
>>> memory allocation and pointer indirection. The type of the outer slice
>>> would become string.
>>> Also, if all your strings are drawn from a finite set, you'll find many
>>> operations (such as comparison or set membership tests) much more efficient
>>> if you represent each string by its index in some side table. Then instead
>>> of string you can use uint32, or perhaps even a narrower integer
>>> type, which will make your main array more compact by at least a factor of
You received this message because you are subscribed to the Google Groups
To unsubscribe from this group and stop receiving emails from it, send an email
For more options, visit https://groups.google.com/d/optout.