Attaching a cleaner patch... Removed a couple of commented-out lines...


Regards
Prabhu



On Monday 19 September 2011 10:24 PM, Prabhu Gnana Sundar wrote:
Thanks Kamesh for your valuable review...


On Monday 12 September 2011 06:25 PM, Kamesh Jayachandran wrote:

I like this patch. We need it.

Thank you...

I found that these bogus mergeinfo entries were causing '404 file not found' errors,
which sometimes caused the merge to fail or run longer than necessary.

A few comments.


1.
> opts, args = my_getopt(sys.argv[1:], "h?fRc", ["help", "fix", "recursive", "commit"])

If you do not want to support 'commit', you can remove it as well.

Yeah... I removed it now..


2. I ran it against a working copy of https://ctf.open.collab.net/svn/repos/svnedge/trunk/console
and got the following output

[kamesh@kamesh console]$ python  /tmp/mergeinfo-sanitizer.py .
The mergeinfo has to be sanitized/reverse-merged for the following as:
/branches/CTF_MODE/SvnEdge: 515-766,


[kamesh@kamesh console]$ python  /tmp/mergeinfo-sanitizer.py --fix .
[kamesh@kamesh console]$ svn di
Property changes on: .
___________________________________________________________________
Modified: svn:mergeinfo
   Reverse-merged /branches/CTF_MODE/SvnEdge:r512-515

Based on my analysis of the above repo, I could see that /branches/CTF_MODE/SvnEdge:r512-515 is bogus and hence needs to be removed. Anyway, your script handles it correctly with --fix.

I would suggest changing the output of the default invocation (i.e. no --fix):

Changed the output as suggested...


<snip>
Bogus mergeinfo summary:
Target 1: .
/branches/CTF_MODE/SvnEdge: 512-515,
/merge/source2: X-Y,
..........

Target 2: subtree_path
/merge/source3: A-B
....

Target 3: sub_sub_tree
/merge/source4: C-D
......

Run with --fix to fix this in your working copy and commit yourself.
</snip>


3. Your script seemed to sanitize to the depth of 'immediates' by default. What is the rationale behind that?

No rationale behind it; I was using that as the default for my own development purposes.
Now fixed it.
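For reference, the depth literals the script compares against correspond to the svn_depth_t enumeration; the numeric values in this sketch are assumed from svn_types.h, so verify them against your bindings:

```python
# Assumed svn_depth_t values (from svn_types.h; verify against your bindings).
SVN_DEPTH_EMPTY = 0       # just the target itself
SVN_DEPTH_FILES = 1       # the target and its immediate file children
SVN_DEPTH_IMMEDIATES = 2  # the target and its immediate file/dir children
SVN_DEPTH_INFINITY = 3    # the target and all of its descendants

DEPTH_NAMES = {
    SVN_DEPTH_EMPTY: "empty",
    SVN_DEPTH_FILES: "files",
    SVN_DEPTH_IMMEDIATES: "immediates",
    SVN_DEPTH_INFINITY: "infinity",
}

def depth_name(depth):
    """Map a raw depth literal to its readable name."""
    return DEPTH_NAMES.get(depth, "unknown")
```

With the named constants in hand, a check like `if depth == 3:` reads instead as `if depth == core.svn_depth_infinity:`.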


4. I ran your script against https://svn.apache.org/repos/asf/subversion/trunk

And got the following output
<snip>
The path /subversion/branches/diff-optimizations/subversion/libsvn_subr/adler32.c is not found
The path /subversion/branches/diff-optimizations/subversion/include/private/svn_adler32.h is not found

The mergeinfo has to be sanitized/reverse-merged for the following as:
/subversion/trunk/subversion/libsvn_subr/svn_temp_serializer.c:
/subversion/branches/tree-conflicts/subversion/include/svn_string.h: 872524-873154,868290-872329,
/subversion/trunk/subversion/include/private/svn_temp_serializer.h:
/subversion/branches/diff-optimizations/subversion/libsvn_subr/adler32.c:
/subversion/branches/tree-conflicts/subversion/libsvn_diff/diff.h: 872524-873154,868290-872329,
/subversion/trunk/subversion/libsvn_fs_fs/temp_serializer.h:
/subversion/branches/svn-patch-improvements: 918520-934609,
/subversion/trunk/subversion/libsvn_fs_fs/temp_serializer.c:
/subversion/branches/tree-conflicts/subversion/libsvn_subr/hash.c: 872524-873154,868290-872329,
/subversion/branches/diff-optimizations/subversion/include/svn_string.h: 1031270-1037352,
/subversion/branches/diff-optimizations-bytes/subversion/include/private/svn_adler32.h: 1054361-1067789,
/subversion/branches/svn-patch-improvements/subversion/include/svn_string.h: 918520-934609,
/subversion/branches/diff-optimizations/subversion/libsvn_subr/hash.c: 1031270-1037352,
/subversion/branches/tree-conflicts: 872524-873154,868290-872329,
/subversion/branches/diff-optimizations/subversion/include/private/svn_adler32.h:
/subversion/branches/diff-optimizations-bytes/subversion/libsvn_subr/adler32.c: 1054361-1067789,
/subversion/branches/svn-patch-improvements/subversion/libsvn_diff/diff.h: 918520-934609,
/subversion/branches/diff-optimizations: 1031270-1037352,
/subversion/branches/svn-patch-improvements/subversion/libsvn_subr/hash.c: 918520-934609,
/subversion/branches/integrate-cache-item-serialization/subversion/libsvn_fs_fs/temp_serializer.h: 1068738-1068739,
/subversion/branches/integrate-cache-item-serialization/subversion/libsvn_subr/svn_temp_serializer.c: 1068723-1068739,
/subversion/branches/integrate-cache-item-serialization/subversion/libsvn_fs_fs/temp_serializer.c: 1068738-1068739,
/subversion/trunk/subversion/libsvn_diff/diff.h: 1037352-1054248,
/subversion/trunk/subversion/libsvn_subr/adler32.c: 1054276-1054360,
/subversion/trunk/subversion/include/private/svn_adler32.h: 1054276-1054360,
/subversion/branches/integrate-cache-item-serialization/subversion/include/private/svn_temp_serializer.h: 1068723-1068739,
/subversion/trunk/subversion/libsvn_diff/util.c: 1037352-1054251,
</snip>

The 'path not found' error needs to be handled, and the error should go to STDERR.

Now it is being handled and sent to stderr...

5. In main() you can call check_local_modifications before get_original_mergeinfo().

That sounds really good, though I don't see 'propget' as a costly operation.

6. In get_original_mergeinfo()

> if depth == 3:
Use the named constant rather than the arbitrary value.

>  if depth == 3:
>    for entry in mergeinfo_catalog:
> pathwise_mergeinfo = pathwise_mergeinfo + mergeinfo_catalog[entry] + "\n"


I think you are doing something wrong with the above concatenation of the mergeinfos of different targets.

>  else:
>    pathwise_mergeinfo = mergeinfo_catalog[wcpath]

>  return core.svn_mergeinfo_parse(pathwise_mergeinfo, temp_pool)

Thanks for pointing this out, Kamesh... Fixed it in the attached patch...
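The problem with that concatenation can be shown without the svn bindings at all. The simplified parser and sample catalog below are hypothetical, but they illustrate why joining the raw mergeinfo strings of different targets loses data while a per-target dict does not:

```python
def parse_mergeinfo(text):
    """Very simplified mergeinfo parser: 'source: ranges' lines -> dict."""
    info = {}
    for line in text.strip().splitlines():
        source, _, ranges = line.partition(":")
        info[source.strip()] = ranges.strip()
    return info

# Two distinct targets that each merged from the same source branch.
mergeinfo_catalog = {
    ".": "/branches/feature: 10-20",
    "subtree": "/branches/feature: 30-40",
}

# Buggy: concatenating the raw strings makes the later line for the
# same source silently overwrite the earlier target's ranges.
concatenated = mergeinfo_catalog["."] + "\n" + mergeinfo_catalog["subtree"]
buggy = parse_mergeinfo(concatenated)  # the "10-20" range is lost

# Correct: keep one parsed dict per target, as the fixed patch does.
per_target = dict((t, parse_mergeinfo(m))
                  for t, m in mergeinfo_catalog.items())
```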


7. In get_new_location_segments():

>      for revision_range in parsed_original_mergeinfo[path]:
>        try:
>          ra.get_location_segments(ra_session, "", revision_range.end,
> revision_range.end, revision_range.start + 1, receiver)
>        except svn.core.SubversionException:
>          try:
> ra.get_location_segments(ra_session, "", core.SVN_INVALID_REVNUM, > revision_range.end, revision_range.start + 1, receiver)

Not sure why you run location segments against HEAD upon exception.

I was just thinking that the source path could even be present outside the revision ranges. But it looks like I was doing something weird there... So I fixed it by removing the part that runs
the location segments against HEAD...



>          except svn.core.SubversionException:
>            print " The path %s is not found " % path

8. Function parse_args is not used.

Removed it. It was mistakenly left in...


9. Function 'receiver' could have a more meaningful name.

Changed the name ...



10. You need to organize your functions so that they appear in some logical order. For example, I started my review from 'main()', which is in the middle of the file, and had to move above and below that function to review further.

Organized :)



Attached the recent patch with this mail... Please share your thoughts...


Thanks and regards
Prabhu

#!/usr/bin/env python
import svn
import sys
import os
import getopt
import hashlib
import pickle
import getpass
import re
from svn import client, core, ra, wc

## This script first fetches the mergeinfo of the working copy and tries
## to fetch the location segments for the source paths in the respective
## revisions present in the mergeinfo. With the obtained location segments
## result, it creates a new mergeinfo. The depth is infinity by default.
## The script stops if there are any local modifications in the
## working copy.

try:
  my_getopt = getopt.gnu_getopt
except AttributeError:
  my_getopt = getopt.getopt
mergeinfo = {}

def usage():
  sys.stderr.write(""" Usage: %s WCPATH

Analyze the mergeinfo property of the given WCPATH.
Look for the existence of each merge source's location at its recorded
merge ranges. If a non-existent merge source is found, fix the mergeinfo.

Valid options:
 -f   [--fix]      : set the svn:mergeinfo property (does not commit the change)
 -h   [--help]     : display the usage

""" % os.path.basename(sys.argv[0]) )


##
# This function would 'svn propset' the new mergeinfo to the working copy
##
def set_new_mergeinfo(wcpath, newmergeinfo, ctx):
  client.propset3("svn:mergeinfo", newmergeinfo, wcpath, core.svn_depth_empty,
                   0, core.SVN_INVALID_REVNUM, None, None, ctx)


##
# Returns the md5 hash of the file
##
def md5_of_file(f, block_size = 2**20):  # read in 1 MiB blocks
  md5 = hashlib.md5()
  while True:
    data = f.read(block_size)
    if not data:
      break
    md5.update(data)
  return md5.digest()



def hasher(hash_file, newmergeinfo_file):
  new_mergeinfo = core.svn_mergeinfo_to_string(mergeinfo)
  with open(newmergeinfo_file, "a") as buffer_file:
    pickle.dump(new_mergeinfo, buffer_file)

  with open(newmergeinfo_file, "rb") as buffer_file:
    hash_of_buffer_file = md5_of_file(buffer_file)

  with open(hash_file, "w") as f:
    pickle.dump(hash_of_buffer_file, f)


def location_segment_callback(segment, pool):
  if segment.path is not None:
    source_path = '/' + segment.path
    path_ranges = mergeinfo.get(source_path, [])
    range = svn.core.svn_merge_range_t()
    range.start = segment.range_start - 1
    range.end = segment.range_end
    range.inheritable = 1
    path_ranges.append(range)
    mergeinfo[source_path] = path_ranges

##
# This function does the authentication in an interactive way
##
def prompt_func_ssl_unknown_cert(realm, failures, cert_info, may_save, pool):
  print "The certificate details are as follows:"
  print "--------------------------------------"
  print "Issuer     : " + str(cert_info.issuer_dname)
  print "Hostname   : " + str(cert_info.hostname)
  print "ValidFrom  : " + str(cert_info.valid_from)
  print "ValidUpto  : " + str(cert_info.valid_until)
  print "Fingerprint: " + str(cert_info.fingerprint)
  print ""
  ssl_trust = core.svn_auth_cred_ssl_server_trust_t()
  if may_save:
    choice = raw_input( "accept (t)temporarily   (p)permanently: ")
  else:
    choice = raw_input( "(r)Reject or accept (t)temporarily: ")
  if choice[:1] in ("t", "T"):
    ssl_trust.may_save = False
    ssl_trust.accepted_failures = failures
  elif choice[:1] in ("p", "P"):
    ssl_trust.may_save = True
    ssl_trust.accepted_failures = failures
  else:
    ssl_trust = None
  return ssl_trust

def prompt_func_simple_prompt(realm, username, may_save, pool):
  username = raw_input("username: ")
  password = getpass.getpass(prompt="password: ")
  simple_cred = core.svn_auth_cred_simple_t()
  simple_cred.username = username
  simple_cred.password = password
  simple_cred.may_save = False
  return simple_cred

##
# This function tries to authenticate(if needed) and fetch the
# location segments for the available mergeinfo and create a new
# mergeinfo dictionary
##
def get_new_location_segments(parsed_original_mergeinfo, repo_root,
                              wcpath, ctx):

    for path in parsed_original_mergeinfo:
      full_url = repo_root + path
      ra_callbacks = ra.callbacks_t()
      ra_callbacks.auth_baton = core.svn_auth_open([
                                   core.svn_auth_get_ssl_server_trust_file_provider(),
                                   core.svn_auth_get_simple_prompt_provider(prompt_func_simple_prompt, 2),
                                   core.svn_auth_get_ssl_server_trust_prompt_provider(prompt_func_ssl_unknown_cert),
                                   svn.client.get_simple_provider(),
                                   svn.client.get_username_provider()
                                    ])
      try:
        ctx.config = core.svn_config_get_config(None)
        ra_session = ra.open(full_url, ra_callbacks, None, ctx.config)

        for revision_range in parsed_original_mergeinfo[path]:
          try:
            ra.get_location_segments(ra_session, "", revision_range.end,
                                     revision_range.end, revision_range.start + 1, location_segment_callback)
          except svn.core.SubversionException:
            sys.stderr.write(" Could not find location segments for %s \n" % path)
      except Exception, e:
        sys.stderr.write("Could not connect to %s: %s\n" % (full_url, e))


def sanitize_mergeinfo(parsed_original_mergeinfo, repo_root, wcpath,
                       ctx, hash_file, newmergeinfo_file, temp_pool):
  full_mergeinfo = {}
  for entry in parsed_original_mergeinfo:
    get_new_location_segments(parsed_original_mergeinfo[entry], repo_root, wcpath, ctx)
    full_mergeinfo.update(parsed_original_mergeinfo[entry])
    
  hasher(hash_file, newmergeinfo_file)
  diff_mergeinfo = core.svn_mergeinfo_diff(full_mergeinfo,
                                           mergeinfo, 1, temp_pool)
  print "The bogus mergeinfo summary:"
  for pathsource in diff_mergeinfo:
    if pathsource is not None:
      str_diff_info = core.svn_mergeinfo_to_string(pathsource)
      parsed_diff_info = core.svn_mergeinfo_parse(str_diff_info, temp_pool)
      for bogus_mergeinfo_path in parsed_diff_info:
        if bogus_mergeinfo_path in mergeinfo:
          sys.stdout.write(bogus_mergeinfo_path + ": ")
          for revision_range in mergeinfo[bogus_mergeinfo_path]:
            sys.stdout.write(str(revision_range.start + 1) + "-" + str(revision_range.end) + ",")
          print ""

##
# This function tries to 'propset' the new mergeinfo into the working copy.
# It reads the new mergeinfo from the .newmergeinfo file and verifies its
# hash against the hash in the .hashfile
##
def fix_sanitized_mergeinfo(parsed_original_mergeinfo, repo_root, wcpath,
                            ctx, hash_file, newmergeinfo_file, temp_pool):
  check_local_modifications(wcpath, temp_pool)
  old_hash = ''
  new_hash = ''
  try:
    with open(hash_file, "r") as f:
      old_hash = pickle.load(f)
  except IOError:
    get_new_location_segments(parsed_original_mergeinfo, repo_root, wcpath, ctx)
    hasher(hash_file, newmergeinfo_file)
    try:
      with open(hash_file, "r") as f:
        old_hash = pickle.load(f)
    except IOError:
      hasher(hash_file, newmergeinfo_file)
  try:
    with open(newmergeinfo_file, "r") as f:
      new_hash = md5_of_file(f)
  except IOError:
    if not mergeinfo:
      get_new_location_segments(parsed_original_mergeinfo, repo_root, wcpath, ctx)
    hasher(hash_file, newmergeinfo_file)
    with open(newmergeinfo_file, "r") as f:
      new_hash = md5_of_file(f)
  if old_hash == new_hash:
    with open(newmergeinfo_file, "r") as f:
      newmergeinfo = pickle.load(f)
    set_new_mergeinfo(wcpath, newmergeinfo, ctx)
    if os.path.exists(newmergeinfo_file):
      os.remove(newmergeinfo_file)
      os.remove(hash_file)
  else:
    print "The hashes do not match; the mergeinfo may have been tweaked unexpectedly."


##
# This function checks the working copy for any local modifications
##
def check_local_modifications(wcpath, temp_pool):
  has_local_mod = wc.svn_wc_revision_status(wcpath, None, 0, None, temp_pool)
  if has_local_mod.modified:
    print """The working copy has local modifications. Please revert them or clean
the working copy before running the script."""
    sys.exit(1)

def get_original_mergeinfo(wcpath, revision, depth, ctx, temp_pool):
  propget_list = client.svn_client_propget3("svn:mergeinfo", wcpath,
                                            revision, revision, depth, None,
                                            ctx, temp_pool)

  mergeinfo_catalog = propget_list[0]
  mergeinfo_catalog_dict = {}
  for entry in mergeinfo_catalog:
    mergeinfo_catalog_dict[entry] = core.svn_mergeinfo_parse(mergeinfo_catalog[entry], temp_pool)
  return mergeinfo_catalog_dict


def main():
  try:
    opts, args = my_getopt(sys.argv[1:], "h?f", ["help", "fix"])
  except getopt.GetoptError, e:
    sys.stderr.write("Error: %s\n" % e)
    usage()
    sys.exit(1)

  if len(args) == 1:
    wcpath = os.path.abspath(args[0])
  else:
    usage()
    sys.exit(1)

  fix = 0
  current_path = os.getcwd()
  hash_file = os.path.join(current_path, ".hashfile")
  newmergeinfo_file = os.path.join(current_path, ".newmergeinfo")

  temp_pool = core.svn_pool_create()
  ctx = client.svn_client_create_context(temp_pool)
  depth = core.svn_depth_infinity
  revision = core.svn_opt_revision_t()
  revision.kind = core.svn_opt_revision_unspecified

  for opt, values in opts:
    if opt == "--help" or opt in ("-h", "-?"):
      usage()
      sys.exit(0)
    elif opt == "--fix" or opt == "-f":
      fix = 1

  # Check for any local modifications in the working copy
  check_local_modifications(wcpath, temp_pool)

  parsed_original_mergeinfo = get_original_mergeinfo(wcpath, revision,
                                                     depth, ctx, temp_pool)

  repo_root = client.svn_client_root_url_from_path(wcpath, ctx, temp_pool)

  core.svn_config_ensure(None)

  if fix == 0:
    sanitize_mergeinfo(parsed_original_mergeinfo, repo_root, wcpath, ctx,
                       hash_file, newmergeinfo_file, temp_pool)
  if fix == 1:
    fix_sanitized_mergeinfo(parsed_original_mergeinfo, repo_root, wcpath,
                            ctx, hash_file, newmergeinfo_file, temp_pool)


if __name__ == "__main__":
  try:
    main()
  except KeyboardInterrupt:
    sys.stderr.write("\nThe script was interrupted and stopped manually.\n")
