Liming, one PEP 8 thing: can you change the file read/write to use the with statement?
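To make the request concrete, here is a minimal sketch of the with-statement form. The names rewrite_file and transform are only placeholders for illustration, not something from the patch:

```python
import os
import tempfile


def rewrite_file(path, transform):
    """Read `path` as bytes, apply `transform`, and write the result back."""
    # Each with block closes the file on exit, even if read() or write() raises.
    with open(path, 'rb') as fd:
        content = fd.read()
    # Run the transform before reopening, so a failure leaves the file untouched.
    new_content = transform(content)
    with open(path, 'wb') as fd:
        fd.write(new_content)


if __name__ == "__main__":
    # Throwaway demonstration file; not part of the patch.
    handle, demo_path = tempfile.mkstemp()
    os.close(handle)
    with open(demo_path, 'wb') as fd:
        fd.write(b"a\tb")
    rewrite_file(demo_path, lambda data: data.replace(b"\t", b"  "))
    with open(demo_path, 'rb') as fd:
        print(fd.read())
    os.remove(demo_path)
```

This should behave the same on Python 2.7 and 3.x, since the files are opened in binary mode in both cases.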
Other small thoughts. I think that filelist should be changed to a set, as order is not important. Maybe wrap the re.sub function with your own so all the .encode() calls are in one location? As we move to Python 3 we will have fewer changes to make.

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Liming Gao
> Sent: Sunday, May 20, 2018 9:52 PM
> To: edk2-devel@lists.01.org
> Subject: [edk2] [RFC] Formalize source files to follow DOS format
>
> FormatDosFiles.py is added to clean up dos source files. It bases on
> the rules defined in EDKII C Coding Standards Specification.
> 5.1.2 Do not use tab characters
> 5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
> 5.1.7 All files must end with CRLF
> No trailing white space in one line. (To be added in spec)
>
> The source files in edk2 project with the below postfix are dos format.
> .h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
> .txt .bat .py
>
> The package maintainer can use this script to clean up all files in his
> package. The prefer way is to create one patch per one package.
>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Liming Gao <liming....@intel.com>
> ---
>  BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
>  create mode 100644 BaseTools/Scripts/FormatDosFiles.py
>
> diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
> new file mode 100644
> index 0000000..c3a5476
> --- /dev/null
> +++ b/BaseTools/Scripts/FormatDosFiles.py
> @@ -0,0 +1,93 @@
> +# @file FormatDosFiles.py
> +# This script format the source files to follow dos style.
> +# It supports Python2.x and Python3.x both.
> +#
> +# Copyright (c) 2018, Intel Corporation. All rights reserved.<BR>
> +#
> +# This program and the accompanying materials
> +# are licensed and made available under the terms and conditions of the BSD License
> +# which accompanies this distribution. The full text of the license may be found at
> +# http://opensource.org/licenses/bsd-license.php
> +#
> +# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +# WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +#
> +
> +#
> +# Import Modules
> +#
> +import argparse
> +import os
> +import os.path
> +import re
> +import sys
> +
> +"""
> +difference of string between python2 and python3:
> +
> +there is a large difference of string in python2 and python3.
> +
> +in python2,there are two type string,unicode string (unicode type) and 8-bit string (str type).
> +    us = u"abcd",
> +    unicode string,which is internally stored as unicode code point.
> +    s = "abcd",s = b"abcd",s = r"abcd",
> +    all of them are 8-bit string,which is internally stored as bytes.
> +
> +in python3,a new type called bytes replace 8-bit string,and str type is regarded as unicode string.
> +    s = "abcd", s = u"abcd", s = r"abcd",
> +    all of them are str type,which is internally stored unicode code point.
> +    bs = b"abcd",
> +    bytes type,which is interally stored as bytes
> +
> +in python2 ,the both type string can be mixed use,but in python3 it could not,
> +which means the pattern and content in re match should be the same type in python3.
> +in function FormatFile,it read file in binary mode so that the content is bytes type,so the pattern should also be bytes type.
> +As a result,I add encode() to make it compitable among python2 and python3.
> +
> +difference of encode,decode in python2 and python3:
> +the builtin function str.encode(encoding) and str.decode(encoding) are used for convert between 8-bit string and unicode string.
> +
> +in python2
> +    encode convert unicode type to str type.decode vice versa.default encoding is ascii.
> +    for example: s = us.encode()
> +    but if the us is str type,the code will also work.it will be firstly convert to unicode type,
> +    in this situation,the call equals s = us.decode().encode().
> +
> +in python3
> +    encode convert str type to bytes type,decode vice versa.default encoding is utf8.
> +    fpr example:
> +    bs = s.encode(),only str type has encode method,so that won't be used wrongly.decode is the same.
> +
> +in conclusion:
> +    this code could work the same in python27 and python36 environment as far as the re pattern satisfy ascii character set.
> +
> +"""
> +def FormatFiles():
> +    parser = argparse.ArgumentParser()
> +    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
> +    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
> +    args = parser.parse_args()
> +    filelist = []
> +    for dirpath, dirnames, filenames in os.walk(args.path[0]):
> +        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
> +            filelist.append(os.path.join(dirpath, filename))
> +    for file in filelist:
> +        fd = open(file, 'rb')
> +        content = fd.read()
> +        fd.close()
> +        # Convert the line endings to CRLF
> +        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
> +        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
> +        # Add a new empty line if the file is not end with one
> +        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
> +        # Remove trailing white spaces
> +        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
> +        # Replace '\t' with two spaces
> +        content = re.sub('\t'.encode(), '  '.encode(), content)
> +        fd = open(file, 'wb')
> +        fd.write(content)
> +        fd.close()
> +        print(file)
> +
> +if __name__ == "__main__":
> +    sys.exit(FormatFiles())
> \ No newline at end of file
> --
> 2.8.0.windows.1
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
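
For the two smaller suggestions at the top of this mail (a set for the file list, and one wrapper around re.sub), something along these lines might work. This is only a sketch: sub_bytes, collect_files and to_dos are hypothetical names, and the substitutions are copied from the quoted patch rather than re-derived.

```python
import os
import re


def sub_bytes(pattern, repl, content, flags=0):
    """Hypothetical wrapper: the only place that calls .encode().

    `pattern` and `repl` are native str; `content` is bytes read in 'rb' mode.
    """
    return re.sub(pattern.encode(), repl.encode(), content, flags=flags)


def collect_files(root, extensions):
    """Return matching paths under `root` as a set, since order is not important."""
    return {
        os.path.join(dirpath, filename)
        for dirpath, _dirnames, filenames in os.walk(root)
        for filename in filenames
        if filename.endswith(tuple(extensions))
    }


def to_dos(content):
    """Apply the same five substitutions as the quoted patch, via the wrapper."""
    # Convert the line endings to CRLF
    content = sub_bytes(r'([^\r])\n', r'\1\r\n', content)
    content = sub_bytes(r'^\n', r'\r\n', content, flags=re.MULTILINE)
    # Add a CRLF at the end of the file if it does not end with one
    content = sub_bytes(r'([^\r\n])$', r'\1\r\n', content)
    # Remove trailing white space
    content = sub_bytes(r'[ \t]+(\r\n)', r'\1', content, flags=re.MULTILINE)
    # Replace each tab with two spaces
    content = sub_bytes('\t', '  ', content)
    return content
```

With this shape, a later Python 3 only cleanup would touch sub_bytes alone, and the patterns stay plain str literals. The default .encode() is fine here only as long as the patterns stay ASCII, as the docstring in the patch already notes.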