Re: Don't include buildroot in list of duplicate files printed

2013-08-26 Thread Jeffrey Johnson

On Aug 26, 2013, at 2:24 PM, Per Øyvind Karlsen wrote:

 This patch will strip away the buildroot prefix for duplicate files listed, 
 providing greater consistency with behaviour otherwise.
 

While the approach is sensible, the deeper flaw is that duplicate
file checks added by Alexey Tourbin around the time of rpm-5.1.9
release do not scale.

Far deeper changes than cosmetically stripping a buildroot path are
needed, likely with Bloom filters attached to all the binary packages
produced in a build, so that

/**
 * Return intersection of two Bloom filters.
 * @retval a  Bloom filter
 * @param b   Bloom filter
 * @return  0 on success, -1 if {m,k} disagree or NULL pointers.
 */
int rpmbfIntersect(rpmbf a, const rpmbf b)
/*@modifies a @*/;

can be used to detect duplicate (and similarly shared/conflicting) files.
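
To make the idea concrete, here is a minimal sketch of a Bloom filter with an intersect operation that follows the contract in the comment above (0 on success, -1 if {m,k} disagree or on NULL pointers). Everything named toybf* is hypothetical and written only for illustration; it is not the rpmbf implementation, and the FNV-1a double hashing is an assumption chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy Bloom filter for illustration only; NOT the rpmbf implementation. */
typedef struct toybf_s {
    size_t m;        /* number of bits in the filter */
    unsigned k;      /* number of probes per key */
    uint32_t *bits;  /* (m + 31) / 32 words */
} *toybf;

static size_t toybfWords(size_t m) { return (m + 31) / 32; }

toybf toybfNew(size_t m, unsigned k)
{
    toybf bf;
    if (m == 0 || k == 0) return NULL;
    if ((bf = malloc(sizeof(*bf))) == NULL) return NULL;
    bf->m = m;
    bf->k = k;
    bf->bits = calloc(toybfWords(m), sizeof(*bf->bits));
    if (bf->bits == NULL) { free(bf); return NULL; }
    return bf;
}

void toybfFree(toybf bf) { if (bf) { free(bf->bits); free(bf); } }

/* Seeded FNV-1a; an assumed hash, picked only because it is short. */
static uint32_t toyHash(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Bit index of probe i for a path, using double hashing. */
static size_t toyProbe(const toybf bf, const char *path, unsigned i)
{
    uint32_t h1 = toyHash(path, 0);
    uint32_t h2 = toyHash(path, 0x9e3779b9u) | 1u;
    return (size_t)(h1 + (uint32_t)i * h2) % bf->m;
}

/* Add a path. Returns 0 on success, -1 on NULL pointers. */
int toybfAdd(toybf bf, const char *path)
{
    unsigned i;
    if (bf == NULL || path == NULL) return -1;
    for (i = 0; i < bf->k; i++) {
        size_t ix = toyProbe(bf, path, i);
        bf->bits[ix >> 5] |= (uint32_t)1 << (ix & 31);
    }
    return 0;
}

/* Returns 1 if path may be present, 0 if definitely absent, -1 on NULL. */
int toybfChk(const toybf bf, const char *path)
{
    unsigned i;
    if (bf == NULL || path == NULL) return -1;
    for (i = 0; i < bf->k; i++) {
        size_t ix = toyProbe(bf, path, i);
        if (!(bf->bits[ix >> 5] & ((uint32_t)1 << (ix & 31))))
            return 0;
    }
    return 1;
}

/* a &= b. Returns 0 on success, -1 if {m,k} disagree or NULL pointers. */
int toybfIntersect(toybf a, const toybf b)
{
    size_t i;
    if (a == NULL || b == NULL || a->m != b->m || a->k != b->k) return -1;
    for (i = 0; i < toybfWords(a->m); i++)
        a->bits[i] &= b->bits[i];
    return 0;
}
```

With one filter per binary package, a path present in both packages has all k of its bits set in both filters, so it still tests positive after the intersection. Checking one package's paths against the intersected filter therefore yields a candidate list that contains every true duplicate; the candidates still need an exact comparison because Bloom filters can report false positives.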

Build a kernel package (which has many more paths than most packages)
and time the additional checks added by Alexey Tourbin if you wish
to see the scaling problem in the existing implementation.

Or try building tests/millionfile-insanity.spec to measure the cost of
the added checks.

Note that the checks added by Alexey are useful: my only objection is
that the implementation did not (and does not) scale with lots of files.

73 de Jeff

 --
 Regards,
 Per Øyvind
 rpm-5.4.9-strip-buildroot-away-from-duplicate-files-list.patch

__
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org


Re: Don't include buildroot in list of duplicate files printed

2013-08-26 Thread Per Øyvind Karlsen
2013/8/26 Jeffrey Johnson n3...@me.com


 On Aug 26, 2013, at 2:24 PM, Per Øyvind Karlsen wrote:

  This patch will strip away the buildroot prefix for duplicate files
 listed, providing greater consistency with behaviour otherwise.
 

 While the approach is sensible, the deeper flaw is that duplicate
 file checks added by Alexey Tourbin around the time of rpm-5.1.9
 release do not scale.

 Far deeper changes than cosmetically stripping a buildroot path are
 needed, likely with Bloom filters attached to all the binary packages
 produced in a build, so that

 /**
  * Return intersection of two Bloom filters.
  * @retval a  Bloom filter
  * @param b   Bloom filter
  * @return  0 on success, -1 if {m,k} disagree or NULL pointers.
  */
 int rpmbfIntersect(rpmbf a, const rpmbf b)
 /*@modifies a @*/;

 can be used to detect duplicate (and similarly shared/conflicting) files.

 Build a kernel package (which has many more paths than most packages)
 and time the additional checks added by Alexey Tourbin if you wish
 to see the scaling problem in the existing implementation.

 Or try building tests/millionfile-insanity.spec to measure the cost of
 the added checks.

 Note that the checks added by Alexey are useful: my only objection is
 that the implementation did not (and does not) scale with lots of files.

I can't remember whether I've actually run into any problems related
to this myself, but on a related note, the unpackaged subdirectory check
very often gives false positives.
I have a patch that adds support for terminating the build on unpackaged
subdirectories. Since the check isn't reliable, I've left it disabled by
default, but I'll attach the patch nonetheless.

--
Regards,
Per Øyvind


rpm-5.4.10-unpackaged_subdirs_terminate_build.patch
Description: Binary data


Re: Don't include buildroot in list of duplicate files printed

2013-08-26 Thread Jeffrey Johnson
I'm unconvinced that directories need to be treated differently than files.

There has been
%unpackaged_files_terminate_build
since forever. I still believe that the macros need to be removed
and the behavior needs to be made MANDATORY (if desired).

Alexey Tourbin also implemented some silent fixes to add
subdirs that were not specified, which is why you found it
more convenient to add a new macro for directories rather
than files.

The core issue in need of resolving is when directories should
(or should not) be added to packaging.

I believe that all directories mentioned on every path (including /)
should be added to every package because that is the logical
end-point if _ANY_ directory is added to packaging. One can
quite easily design a packaging system that _NEVER_ includes any
directory, but rather creates directories as needed when mentioned
in file paths, and manages only files not directories, setting directory
permissions as side effects and lazily removing empty directories.
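
As a rough illustration of "all directories mentioned on every path (including /)", the sketch below lists every directory implied by one absolute file path. The function name impliedDirs and its newline-separated output format are hypothetical, chosen only to keep the example short; it does not normalize paths beyond skipping repeated slashes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write every directory implied by an absolute file path into buf,
 * one per line, shortest first (so "/" comes first).
 * Returns the number of directories written, or -1 if the path is
 * not absolute, an argument is NULL, or buf is too small.
 */
int impliedDirs(const char *path, char *buf, size_t buflen)
{
    size_t used = 0;
    int n = 0;
    const char *p;

    if (path == NULL || buf == NULL || buflen == 0 || path[0] != '/')
        return -1;
    buf[0] = '\0';

    for (p = path; (p = strchr(p, '/')) != NULL; p++) {
        /* The directory is the text before this slash; root is "/" itself. */
        size_t len = (p == path) ? 1 : (size_t)(p - path);

        if (p != path && p[-1] == '/')
            continue;               /* skip the second slash of "//" */
        if (used + len + 2 > buflen)
            return -1;              /* need room for '\n' and NUL */
        memcpy(buf + used, path, len);
        used += len;
        buf[used++] = '\n';
        buf[used] = '\0';
        n++;
    }
    return n;
}
```

For /usr/share/doc/README this produces /, /usr, /usr/share and /usr/share/doc. A files-only packaging system could use the same walk at install time to create missing directories, and walk it in reverse at erase time to remove the ones left empty.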

Meanwhile RPM permits both approaches (files only, or every subdir
component) without a clear consensus on what the Right Thing To Do
actually is.

OTOH, I personally think that ignoring any/every file not mentioned
explicitly (or implicitly with glob patterns), with an explicit warning
listing all the unpackaged files, is better behavior. It's certainly useful
to cut-n-paste a bunch of paths into a %files manifest and then
adjust to whatever level of macro madness one wishes, a task
I do all the time when creating packages.
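
A minimal sketch of that warn-and-continue behavior, assuming both lists are already sorted with strcmp() and that the buildroot is given without a trailing slash. The names stripBuildRoot, listUnpackaged and warnUnpackaged are hypothetical, as is the warning text; none of this is taken from the rpm sources.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return path without the buildroot prefix, or path unchanged if the
 * prefix does not match. Assumes buildroot has no trailing slash. */
const char *stripBuildRoot(const char *path, const char *buildroot)
{
    size_t n = strlen(buildroot);
    if (n > 0 && strncmp(path, buildroot, n) == 0 && path[n] == '/')
        return path + n;
    return path;
}

/*
 * Collect installed-but-unpackaged paths.
 * found[]    : paths under the buildroot, sorted
 * packaged[] : paths from the %files manifest, sorted, no buildroot
 * out[]      : caller-provided array with room for nfound entries
 * Returns the number of entries stored in out[].
 */
size_t listUnpackaged(const char **found, size_t nfound,
                      const char **packaged, size_t npackaged,
                      const char *buildroot, const char **out)
{
    size_t i = 0, j = 0, nout = 0;

    while (i < nfound) {
        const char *f = stripBuildRoot(found[i], buildroot);
        int cmp = (j < npackaged) ? strcmp(f, packaged[j]) : -1;

        if (cmp == 0)
            i++;                /* packaged: nothing to report */
        else if (cmp > 0)
            j++;                /* manifest entry sorts earlier: advance */
        else {
            out[nout++] = f;    /* not in the manifest */
            i++;
        }
    }
    return nout;
}

/* Print a warning rather than failing the build. */
void warnUnpackaged(const char **out, size_t nout)
{
    size_t i;
    if (nout == 0) return;
    fprintf(stderr, "warning: unpackaged file(s) found:\n");
    for (i = 0; i < nout; i++)
        fprintf(stderr, "   %s\n", out[i]);
}
```

Because the buildroot prefix is stripped before reporting, the printed paths can be pasted straight into a %files manifest.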

All of the rather pugly macro enablers/disablers were ordered by
Red Hat control phreak manglement because a FULLSTOP
build failure is all that most package monkeys understand
about proper packaging.

YMMV and my personal packaging checks are clearly different than yours
if we disagree on mileage.

