What should the contract of asFormatString() be? It sounds like it is
meant to output the vector in order?

I do agree that this doesn't imply it has to be stored internally in
order. Note also that LinkedHashMap only preserves insertion order,
which is not necessarily key order.
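A quick way to see the difference (a minimal sketch; MapOrderDemo,
fill() and iterationOrder() are made-up names for illustration):
LinkedHashMap hands entries back in the order they were put in, and a
TreeMap would be needed to get ascending key order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the codebase.
public class MapOrderDemo {
    // Returns the keys in whatever order the map iterates them.
    static List<Integer> iterationOrder(Map<Integer, Double> map) {
        return new ArrayList<>(map.keySet());
    }

    // Puts the same entries into the map, deliberately not in key order.
    static Map<Integer, Double> fill(Map<Integer, Double> map) {
        map.put(7, 0.5);
        map.put(2, 1.5);
        map.put(5, 2.5);
        return map;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order: prints [7, 2, 5]
        System.out.println(iterationOrder(fill(new LinkedHashMap<>())));
        // TreeMap iterates in ascending key order: prints [2, 5, 7]
        System.out.println(iterationOrder(fill(new TreeMap<>())));
    }
}
```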


This is sort of unrelated, but on this subject: looking through the
code, I think there are some opportunities to cut down on runtime,
boxing/unboxing (i.e. object allocation), and memory usage. The code
does a good job of relying on standard APIs and language features to
keep it easy to write and read, though in some cases what actually
happens underneath is a bit inefficient.

Example: dot(). We should iterate over the vector with fewer entries
and probe the other one. We should also iterate over entries instead
of keys, since a) HashMap's key set is only a view derived from the
underlying entry structure, so iterating keys does no less work than
iterating entries, and b) having the value in hand saves a map lookup
per element.
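Roughly what I have in mind, as a sketch only: it assumes the sparse
vector is backed by a Map<Integer, Double> from index to non-zero
value, and SparseDot/dot() are hypothetical names, not our actual
classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; assumes a sparse vector stored as index -> non-zero value.
public class SparseDot {
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Walk the map with fewer entries and probe the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        // entrySet() yields key and value together, so there is no extra
        // get() on the map being iterated.
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> x = new HashMap<>();
        x.put(0, 1.0);
        x.put(3, 2.0);
        x.put(9, 4.0);
        Map<Integer, Double> y = new HashMap<>();
        y.put(3, 0.5);
        y.put(9, 0.25);
        // 2.0 * 0.5 + 4.0 * 0.25 = 2.0
        System.out.println(dot(x, y));
    }
}
```

Only y's two entries are visited here; x is probed twice instead of
being scanned.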

It is pretty early to think about optimization, so perhaps I should not
bring this up. But I love to tinker with stuff like this... :)


Perhaps I am paranoid from working on my own library, where I have
optimized and optimized and written custom data structures, and it is
*still* somehow too slow and easy to run out of memory on a couple
million data points. Mahout is going to be all about scalability, so I
bet we will have to do some work, and maybe rethink representations
here and there, to take this from "works" to "works at scale". But
first we must get it to work, of course; that's the right focus.
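For the "rethinking representations" part, one direction (purely a
hypothetical sketch; SortedSparseVector is not something in the
codebase) is to keep indices and values in sorted parallel primitive
arrays. That avoids a boxed Integer, a boxed Double and a map entry
object per element, and dot() becomes a single merge pass.

```java
import java.util.Arrays;

// Hypothetical layout: strictly increasing indices with values in parallel
// primitive arrays, so no Integer/Double objects are allocated per entry.
public class SortedSparseVector {
    final int[] indices;
    final double[] values;

    SortedSparseVector(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        for (int i = 1; i < indices.length; i++) {
            if (indices[i] <= indices[i - 1]) {
                throw new IllegalArgumentException("indices must be strictly increasing");
            }
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    // Merge-style dot product: one linear pass over both index arrays.
    double dot(SortedSparseVector other) {
        int i = 0, j = 0;
        double sum = 0.0;
        while (i < indices.length && j < other.indices.length) {
            int a = indices[i], b = other.indices[j];
            if (a == b) {
                sum += values[i++] * other.values[j++];
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Random access by binary search: O(log n) instead of a hash probe.
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    public static void main(String[] args) {
        SortedSparseVector x = new SortedSparseVector(new int[] {0, 3, 9}, new double[] {1.0, 2.0, 4.0});
        SortedSparseVector y = new SortedSparseVector(new int[] {3, 9}, new double[] {0.5, 0.25});
        System.out.println(x.dot(y)); // 2.0
        System.out.println(x.get(9)); // 4.0
        System.out.println(x.get(5)); // 0.0
    }
}
```

The trade-off is that inserting a new index in the middle costs O(n)
array shifting, so a layout like this suits vectors that are built
once and read many times better than ones updated in place.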

On Fri, Aug 8, 2008 at 3:11 AM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> If there is never a need to keep it in order except for the unit test, then
> yes, I agree with you. In that case, instead of sorting in asFormatString(),
> I'd rather have the unit test parse the string and then test it.
>
> On Fri, Aug 8, 2008 at 12:28 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Remember, the loop in question is for testing only.
>>
>> On Thu, Aug 7, 2008 at 11:37 PM, Shalin Shekhar Mangar <
>> [EMAIL PROTECTED]> wrote:
>>
>> > It maintains a doubly linked list through the Map.Entry objects. Additional
>> > memory will be used to keep two(?) additional references per entry. The
>> > cost in terms of asymptotic behavior is the same. However, iteration is
>> > faster because it depends on the size, whereas with HashMap, iteration is
>> > proportional to its capacity.
>> >
>> > In practice, it is almost as fast and should not be a problem.
>> >
>> > On Fri, Aug 8, 2008 at 11:56 AM, Ted Dunning <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> > > How much memory overhead does the linked hash map add?  How much speed
>> > > cost?
>> > >
>> > > That would solve the problem by making the order stable, but we
>> > > shouldn't slow down the code just to make the test easier to write,
>> > > especially when the additional code in the test is < 1 line of code.
>> > >
>> > > On Thu, Aug 7, 2008 at 11:12 PM, Shalin Shekhar Mangar <
>> > > [EMAIL PROTECTED]> wrote:
>> > >
>> > > > Or use a LinkedHashMap?
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>> >
>>
>>
>>
>> --
>> ted
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
