Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Geir Magnusson Jr. Tue, 15 Nov 2005 14:24:35 -0800


On Nov 15, 2005, at 4:28 PM, Dalibor Topic wrote:

Geir Magnusson Jr. wrote:


On Nov 14, 2005, at 3:18 PM, Dalibor Topic wrote:

On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:

Leo Simons wrote:

Rant below. Decided not to tone it down.



Leo++


+1 from me, too. sounds like an excellent way to shoot oneself to
slashdot with headlines like "Apache foundation rejects code from IBM,
claims it was stolen from FSF!". Political suicide, should it ever
happen, as it'd force the ASF to play arbiter in disputes that don't
exist.



I don't understand this.  I'm suggesting we use a tool internally to
help us *find* problems, both at contribution time as well as ongoing
to ensure that inappropriate 3rd party code doesn't come in during the
regular flow of activity.  We'd then examine any issues raised, and
make a judgement based on that.

OK. I'm uncomfortable delegating such a potentially sensitive issue to a
proprietary black box, as in the worst case that leaves us with little
chance to explore why the black box oracle came up with a wrong or right
analysis.


I'm confused as I don't understand how you are thinking of this.

First, we mention Blackduck as an example of tools that we might use in a specific case of contribution, suggesting that contributors do a similar thing before contributing if they choose. There is no requirement.

Second, there's no analysis from BD and its ilk, no "thumbs up" or "thumbs down" - it's simply "these files seem to be like those files" and we humans then go look and judge.


We're not turning over any decision making to anyone.


Checking code pedigree makes sense. It just needs to be transparent.

You get a list of files. You can go check them. Is how those matches were done significant? Can you tell me the algorithm your head uses? :)

Suppose a contribution had code from the FSF. (IBM's doesn't. Period)


Yeah, I didn't mean to imply it had, just as an ugly worst case
scenario. I can come up with an even worse one, actually, in which a
hypothetical IBM contribution had traces of Microsoft's VM code.

Microsoft should be scarier than the FSF to most people on this list, I
guess, as the FSF has an interest in working together with us, whereas
Microsoft's interests probably aren't aligned with open source J2SE.


I think that's a safe assumption :)

Would you prefer that we don't find it until much later, like after a
release? Or if we do find it, just accept it to avoid having to commit
"political suicide" by pointing it out to the contributor?

It'd be fine as long as nothing bad is found, or the cases flagged by
the black box oracle are actual issues. I'm trying to view it from a
worst-case perspective.

We can determine them, because the "oracle" is a really fancy grep, which just shows files that have similarity. We then have to verify.


The trouble would start if we end up having a false positive.

How do we figure out that we have a false positive, without either
access to say, the database, the source code of the oracle, the complete
legal history of some bit of proprietary code including the merges,
transactions, copyright transfers and relicensing operations, etc?

Ah - yes. That's the key. We would only compare against code that we were comfortable having someone look at. Specifically, I'm afraid of Sun code accidentally getting into our codebase, because the stuff is so prevalent in the Java community. It's in every Sun J2SE distro....

Such a 'discovery' process could take quite a bit of time, provided all
parties involved (including the makers of the black box oracle) would
have any business interest in participating (in absence of an actual
legal case). If, say, Microsoft takes their time to talk to Apache about
the legal history of Microsoft's VM, (what'd be in it for Microsoft,
after all? :) where does it leave a contribution that'd be flagged as
potentially infringing on Microsoft's code?

I'd guesstimate a resolution could take a few years, as a worst case. Is
any contribution that stays in limbo for a few years going to be
relevant after a claim is shown to be false after a few years?

That's where the 'political suicide' scenario I mentioned comes in, as
it could force us to act as an arbiter in determining how trustworthy
either IBM, Black Duck or Microsoft are, based on little more than a
black box. Not a position I'd like to find myself in, in particular if
it all turns out to be just a software glitch.[1] :)

I see. I think that there are some assumptions here that you made, that I wasn't ever thinking of. We need to have the code we compare against accessible by someone in the community willing to look at it. We have people that don't care if they glimpse JRL code (and by the way things are working out, Sun won't care if people are exposed to JRL code as long as they don't make copies...)

So that's the kind of things we want to compare against : open source (kaffe, GNU classpath, etc) and code like Sun's for which there are no limits on retention after exposure.

If we find code stolen from *any* copyright holder, we will definitely
reject the code.

+1

Because there is a complete implementation under a
non-opensource license that has been very, very widely distributed, it
behooves us to take what steps we can to ensure that we don't
accidentally incorporate it into our codebase.

+1, too.

We just need to make sure that the steps we take are equally transparent to everyone involved (and the outsiders), as the rest of the process is, in my opinion. A black box oracle doesn't have its place in such a process.


Agreed.

cheers,
dalibor topic

[1] Yeah, I know, I'm assuming that the Black Duck software is not
perfect and error free without having ever seen it. It's a worst case
scenario, though, so I am taking some freedoms with things that can go
wrong. :)



Freedom (TM)

:)

geir

--
Geir Magnusson Jr                                  +1-203-665-6437
[EMAIL PROTECTED]
