On Nov 15, 2005, at 4:28 PM, Dalibor Topic wrote:
Geir Magnusson Jr. wrote:
On Nov 14, 2005, at 3:18 PM, Dalibor Topic wrote:
On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:
Leo Simons wrote:
Rant below. Decided not to tone it down.
Leo++
+1 from me, too. Sounds like an excellent way to shoot oneself to
Slashdot with headlines like "Apache foundation rejects code from IBM,
claims it was stolen from FSF!". Political suicide, should it ever
happen, as it'd force the ASF to play arbiter in disputes that don't
exist.
I don't understand this. I'm suggesting we use a tool internally to
help us *find* problems, both at contribution time as well as ongoing
to ensure that inappropriate 3rd party code doesn't come in during the
regular flow of activity. We'd then examine any issues raised, and
make a judgement based on that.
OK. I'm uncomfortable delegating such a potentially sensitive issue to a
proprietary black box, as in the worst case that leaves us with little
chance to explore why the black box oracle came up with a wrong or right
analysis.
I'm confused as I don't understand how you are thinking of this.
First, we mention Blackduck as an example of tools that we might use
in a specific case of contribution, suggesting that contributors do a
similar thing before contributing if they choose. There is no
requirement.
Second, there's no analysis from BD and its ilk, no "thumbs up" or
"thumbs down" - it's simply "these files seem to be like those files"
and we humans then go look and judge.
We're not turning over any decision making to anyone.
Checking code pedigree makes sense. It just needs to be transparent.
You get a list of files. You can go check them. Is how those
matches were done significant? Can you tell me the algorithm your
head uses? :)
Suppose a contribution had code from the FSF. (IBM's doesn't. Period.)
Yeah, I didn't mean to imply it had, just as an ugly worst case
scenario. I can come up with an even worse one, actually, in which a
hypothetical IBM contribution had traces of Microsoft's VM code.
Microsoft should be scarier than the FSF to most people on this list, I
guess, as the FSF has an interest in working together with us, whereas
Microsoft's interests probably aren't aligned with open source J2SE.
I think that's a safe assumption :)
Would you prefer that we don't find it until much later, like after a
release? Or if we do find it, just accept it to avoid having to commit
"political suicide" by pointing it out to the contributor?
It'd be fine as long as nothing bad is found, or the cases flagged by
the black box oracle are actual issues. I'm trying to view it from a
worst-case perspective.
We can determine them, because the "oracle" is a really fancy grep,
which just shows files that have similarity. We then have to verify.
The trouble would start if we end up having a false positive.
How do we figure out that we have a false positive, without access to,
say, the database, the source code of the oracle, or the complete
legal history of some bit of proprietary code including the merges,
transactions, copyright transfers and relicensing operations, etc?
Ah - yes. That's the key. We would only compare against code that
we were comfortable having someone look at. Specifically, I'm afraid
of Sun code accidentally getting into our codebase, because the stuff
is so prevalent in the Java community. It's in every Sun J2SE
distro....
Such a 'discovery' process could take quite a bit of time, provided all
parties involved (including the makers of the black box oracle) would
have any business interest in participating (in absence of an actual
legal case). If, say, Microsoft takes their time to talk to Apache about
the legal history of Microsoft's VM (what'd be in it for Microsoft,
after all? :), where does it leave a contribution that'd be flagged as
potentially infringing on Microsoft's code?
I'd guesstimate a resolution could take a few years, as a worst case. Is
any contribution that stays in limbo for a few years going to be
relevant once a claim is finally shown to be false?
That's where the 'political suicide' scenario I mentioned comes in, as
it could force us to act as an arbiter in determining how trustworthy
either IBM, Black Duck or Microsoft are, based on little more than a
black box. Not a position I'd like to find myself in, in particular if
it all turns out to be just a software glitch.[1] :)
I see. I think that there are some assumptions here that you made,
that I wasn't ever thinking of. We need to have the code we compare
against accessible by someone in the community willing to look at
it. We have people that don't care if they glimpse JRL code (and by
the way things are working out, Sun won't care if people are exposed
to JRL code as long as they don't make copies...)
So that's the kind of thing we want to compare against: open source
(kaffe, GNU classpath, etc) and code like Sun's for which there are
no limits on retention after exposure.
If we find code stolen from *any* copyright holder, we will definitely
reject the code.
+1
Because there is a complete implementation under a non-opensource
license that has been very, very widely distributed, it behooves us
to take what steps we can to ensure that we don't accidentally
incorporate it into our codebase.
+1, too.
We just need to make sure that the steps we take are equally transparent
to everyone involved (and the outsiders), as the rest of the process is,
in my opinion. A black box oracle doesn't have its place in such a
process.
Agreed.
cheers,
dalibor topic
[1] Yeah, I know, I'm assuming that the Black Duck software is not
perfect and error free without having ever seen it. It's a worst case
scenario, though, so I am taking some freedoms with things that can go
wrong. :)
Freedom (TM)
:)
geir
--
Geir Magnusson Jr +1-203-665-6437
[EMAIL PROTECTED]