On Nov 15, 2005, at 8:20 PM, Dalibor Topic wrote:

Geir Magnusson Jr. wrote:

You get a list of files. You can go check them. Is how those matches were done significant? Can you tell me the algorithm your head uses? :)


Well, we could simply roll a die, if how the matches were done is not
significant. :)

If the algorithm is bad, we either find nothing or get lots of false positives. In either case we stop using the tool.

If the algorithm is good, we know what it would find if there's anything to find, so we can use it.

In either case, we can (I believe) do a reasonable job of figuring it out even if you don't have the source.

(I never had the source to Windows NT, but I knew it wasn't "good")



Ah - yes. That's the key. We would only compare against code that we were comfortable having someone look at. Specifically, I'm afraid of Sun code accidentally getting into our codebase, because the stuff is so prevalent in the Java community. It's in every Sun J2SE distro....


That should be easy enough: just grep for "confidential J2SE software
from Sun, play nice and play fair!", or whatever the copyright headers
on such Sun software say.
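
As a rough illustration of that kind of header check, here is a minimal sketch in Python. It assumes the telltale text sits near the top of each source file; the function name, the 40-line window, and the phrases passed in are all hypothetical, not the wording of any real Sun header.

```python
import os


def find_files_with_header(root, phrases, max_lines=40, extensions=(".java",)):
    """Return sorted paths under root whose first max_lines lines contain any phrase.

    Matching is a case-insensitive substring test, nothing smarter.
    """
    needles = [p.lower() for p in phrases]
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Read only the top of the file, where license headers usually live.
            with open(path, encoding="utf-8", errors="replace") as handle:
                head_lines = []
                for i, line in enumerate(handle):
                    if i >= max_lines:
                        break
                    head_lines.append(line)
            head = "".join(head_lines).lower()
            if any(needle in head for needle in needles):
                hits.append(path)
    return sorted(hits)
```

A check like this only catches files that still carry their original header; a file whose header was stripped or rewritten passes silently, which is the limitation discussed below.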

Stuff beyond that would probably go beyond uncovering simple
accidents, and would require quite a bit of cooperation from Sun to
disclose the pedigree of their implementation's code and equivalent
cooperation from the contributors.


Why?  The source is available under the JRL.

Let me give you another scenario:

Purely hypothetically speaking, Sun's implementation may include third
party software, and changes to such software. Contributions from others
may include the same (open source, for the sake of argument) software
and similar changes in order to meet specific, common goals defined by
common specs.


For example, Sun's code includes apache code :)

How do we determine for sure who wrote what when, and who copied what
from whom, if that was OK then, and if the contributor has the right to contribute his changes? In case of conflicting opinions, what do we do?
Or even worse, if code comes from a now defunct and dead open source
project from 1997 [1], with no one around any more, the web site and
archives wiped out, what do we do? :)


I presume that the answer isn't "stick head in sand".

Let me ask you this - if the above software did exist and it made it into Harmony's SVN, would you prefer that

a) we knew about it and could explain the decision to include it

b) we were surprised at some future date


I guess the point I'm trying to get across is that the best we can do
with our resources are very simple, almost trivial checks like checking
if the copyright headers are sane.

I believe we can do better than that.


Anything beyond that gets very, very joyously complicated very quickly,
without permanent active assistance from everyone, including the
copyright holders of the proprietary implementations. Whether copyright
holders of proprietary implementations would be pleased to dedicate
resources for Harmony's potential regular inquiries about their code's
pedigree, I don't know. I'm not sure a pull model scales well in this
case. :)

I agree that it *can* get very complicated in the hypotheticals, but I'd bet that the majority of what we'd find - if we'd even find anything at all - would be due to simple misunderstandings and mistakes. I'd sleep better knowing that we at least tried. One of our best defenses in the event something went wrong would be a demonstrable, good faith effort to do reasonable oversight.

geir


cheers,
dalibor topic

[1] Ueber-hypothetically, a BSD-ish licensed fork of Kaffe from back
then. It used to have a BSD-ish license, back in the day, and it got
forked quite a bit, afaict from the mailing list archives. All of that
was before my days, and quite a few of those forks are ... resting. :)


--
Geir Magnusson Jr                                  +1-203-665-6437
[EMAIL PROTECTED]

