On 10/27/16 2:40 AM, Era Scarecrow wrote:
On Tuesday, 25 October 2016 at 14:40:17 UTC, Steven Schveighoffer wrote:
I will note that, in addition to the other comments, this is going to
result in corruption. Simply put, the buffer that 'line' uses is
reused for each line, so the string data used as keys inside the
associative array is going to change. This will result in not finding
words already added when using the 'word in dictionary' check.

You need to use dictionary[word.idup] = newId; this will duplicate the
word into a GC-allocated string that will live as long as the AA uses it.
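
A minimal sketch of that fix, assuming the loop looks roughly like the usual byLine word-count example. The names buildDictionary, path and nextId are made up for illustration and are not from the original code:

```d
import std.algorithm.iteration : splitter;
import std.stdio : File;

/// Assigns an increasing id to each distinct whitespace-separated word.
uint[string] buildDictionary(string path)
{
    uint[string] dictionary;
    uint nextId = 0;

    // byLine reuses one internal buffer, so `line` and every slice of it
    // are only valid until the next iteration.
    foreach (line; File(path).byLine)
    {
        foreach (word; line.splitter)
        {
            // Looking up with the temporary char[] slice is fine.
            if (word !in dictionary)
            {
                // Storing needs a private immutable copy, made only
                // the first time a word is seen.
                dictionary[word.idup] = nextId++;
            }
        }
    }
    return dictionary;
}
```

The copy happens only on insertion, so repeated words cost a lookup and nothing more.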

 If there's a case where you have immutable data AND can reference it
(say... mmap files?) then referencing the string would work rather than
having to duplicate it.

It depends on the size of the file and the expected number of duplicate words. I'm assuming the number of distinct words is limited, so you are going to allocate far less data by duping on demand. In addition, you may incur penalties for accessing the string directly from the file: the OS may have swapped out that page and will have to re-read it from the file itself.

You could also read the entire file into a string and take the words as slices of that.
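
A rough sketch of the whole-file variant, assuming the file fits comfortably in memory and is valid UTF-8 (std.file.readText checks this). As before, buildDictionaryFromText, path and nextId are illustrative names only:

```d
import std.algorithm.iteration : splitter;
import std.file : readText;

/// Same idea as the byLine loop, but the keys are slices of one big string.
uint[string] buildDictionaryFromText(string path)
{
    // readText returns an immutable string, so slices of it never change.
    string text = readText(path);

    uint[string] dictionary;
    uint nextId = 0;

    foreach (word; text.splitter)
    {
        // `word` is already a string slice of `text`: no idup needed.
        if (word !in dictionary)
            dictionary[word] = nextId++;
    }
    return dictionary;
}
```

The trade-off is that the keys keep the entire file contents alive in the GC heap for as long as the AA exists, even if only a few distinct words are stored.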

-Steve
