Hi All,

I am closing this vote and will take everything we have learned to create a new release candidate. I would like to thank everyone for their votes and insights!
Cheers,
Hans

On Mon, Dec 21, 2020 at 12:59 PM Matt Casters <[email protected]> wrote:

> Well, we're apparently still carrying around old archived kettle code which
> hasn't been ported to Hop yet. I'm in favor of getting rid of it since
> it's still available elsewhere.
> Same goes for the old samples. So that should clear out most of the
> ignored code and files.
>
> https://issues.apache.org/jira/browse/HOP-2335 : remove archive-samples
> https://issues.apache.org/jira/browse/HOP-2336 : Remove the archive-pipeline-transforms folder
>
> On the other hand we'll be building up integration tests since we do want
> to do things better than before.
> These tests will indeed use very old FoxPro files to check if these .dbf
> files are still being read as they should. You'd be surprised how many of
> those are still around.
>
> https://issues.apache.org/jira/browse/HOP-2325 : .properties files
> https://issues.apache.org/jira/browse/HOP-2326 : .sh and .bat files
> https://issues.apache.org/jira/browse/HOP-2327 : .xml files
>
> Those 3 cover over 4000 files so that's that.
>
> So your logic makes a lot of sense. We'll continue to exclude files like
> SVG and indeed Hop Pipelines and Workflows (all XML variants but considered
> binary files).
>
> Cheers,
>
> Matt
>
>
> On Sun, Dec 20, 2020 at 7:32 PM Julian Hyde <[email protected]> wrote:
>
> > > On Dec 20, 2020, at 1:18 AM, Matt Casters <[email protected]> wrote:
> > >
> > > Thank you very much Julian.
> > > I mainly wonder where on earth that font comes from since we're not
> > > using it anywhere.
> >
> > Yeah, fonts have a habit of sneaking in. :)
> >
> > > As for rat exclusions: are there any particular file formats besides
> > > .java files that need an Apache license header? We'd be happy to add
> > > them elsewhere.
> > > The shell scripts perhaps as they support comments?
> > > We could even add them to the SVG files even though it will probably
> > > blow up memory consumption unless we code the comments out of the file
> > > loads somehow.
> > > Perhaps it's easier to just look at other projects and ask which files
> > > need a header?
> >
> > My preference is to put a header on pretty much any file that can have a
> > header. Which in my experience is pretty much all text files, except those
> > used as test inputs or reference logs. For example, in .md files you can
> > add the header inside comments that do not appear in the generated HTML.
> > Shell scripts, pom files, properties files, etc. all support comments, so
> > we should add headers.
> >
> > I agree, I would not put a header on SVG files because they are treated as
> > de facto binaries and they need to be small.
> >
> > I suggest that for 0.60 we pare down the RAT exclusions to the absolute
> > minimum. RAT is a powerful tool if we are not holding it back! I ran RAT
> > with the -debug flag and I saw lots of Java files being excluded, and that
> > was concerning.
> >
> > Binary files are always a problem. They are just as susceptible to
> > copyright and licensing issues but are more difficult to audit. One
> > strategy is to audit them one by one and add an exclusion line for each
> > individual file. I know that’s a big task, so definitely not for 0.50.
> >
> > By the way, I ran a command to find out what kinds of files are in Hop.
> > The results are interesting.
> > There’s even one FoxPro file in there!:
> >
> > $ git ls-files -z | xargs -0 file -b | sort | uniq -c
> > 2827 ASCII text
> >    9 ASCII text, with CRLF, LF line terminators
> >   47 ASCII text, with CRLF line terminators
> >    3 ASCII text, with CR line terminators
> >   16 ASCII text, with no line terminators
> >  428 ASCII text, with very long lines
> >    2 Big-endian UTF-16 Unicode text, with no line terminators
> >    7 Bourne-Again shell script, ASCII text executable
> >    1 Bourne-Again shell script, ASCII text executable, with very long lines
> >    2 bzip2 compressed data, block size = 900k
> >    2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 10.0, Code page: 1252, Author: Matthias Hietland Heie, Last Saved By: Sergio Ribeiro, Name of Creating Application: Microsoft Excel, Create Time/Date: Fri Nov 17 14:48:53 2017, Last Saved Time/Date: Tue Jun 18 09:34:04 2019, Security: 0
> >    2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 10.0, Code page: 1252, Author: Sergio Ribeiro, Last Saved By: Sergio Ribeiro, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 11 09:41:24 2018, Last Saved Time/Date: Tue Sep 11 10:20:56 2018, Security: 0
> >    2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 10.0, Code page: 1252, Author: Sergio Ribeiro, Last Saved By: Sergio Ribeiro, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 11 09:41:24 2018, Last Saved Time/Date: Tue Sep 11 10:55:49 2018, Security: 0
> >    2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Author: JB, Revision Number: 3, Total Editing Time: 02:08, Create Time/Date: Thu Oct 27 19:46:23 2011, Last Saved Time/Date: Thu Feb 20 09:00:44 2014
> >    2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.0, Code page: 0
> >    1 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.0, Code page: 1252, Author: Jens Bleuel, Last Saved By: Jens Bleuel, Name of Creating Application: Microsoft Excel, Create Time/Date: Wed Aug 23 15:46:56 2006, Last Saved Time/Date: Wed Aug 23 15:56:14 2006, Security: 0
> >    1 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Matt Casters, Last Saved By: Matt Casters, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 7 16:08:18 2010, Last Saved Time/Date: Tue Sep 7 16:15:32 2010, Security: 0
> >    2 Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Last Saved By: Jens Bleuel, Name of Creating Application: Microsoft Excel, Create Time/Date: Thu Oct 17 06:27:31 1996, Last Saved Time/Date: Tue Nov 28 15:07:48 2006, Security: 0
> >    5 C source, ASCII text
> >    7 C++ source, ASCII text
> >   25 CSV text
> >    1 data
> >    3 DOS batch file, ASCII text
> >    1 Embedded OpenType (EOT), icomoon family
> >    1 Embedded OpenType (EOT), OpenSansLight family
> >    1 Embedded OpenType (EOT), OpenSansRegular family
> >   28 empty
> >    9 exported SGML document, ASCII text
> >    1 FoxBase+/dBase III DBF, 279 records * 52, update-date 106-7-25, codepage ID=0xf, at offset 161 1st record " 1das ist doch keine leistung 44.00hw * 2Meister 48"
> >    2 GIF image data, version 89a, 16 x 16
> >    1 GIF image data, version 89a, 9 x 9
> >    1 gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 703
> >    2 gzip compressed data, was "default.csv", last modified: Wed Aug 26 08:50:54 2015, from Unix, original size modulo 2^32 67
> >   30 HTML document, ASCII text
> >    1 HTML document, ASCII text, with very long lines
> >    2 HTML document, UTF-8 Unicode text
> >    1 ISO-8859 text
> >    1 ISO-8859 text, with CR line terminators
> >    3 ISO-8859 text, with very long lines
> > 3179 Java source, ASCII text
> >    1 Java source, ASCII text, with CRLF, LF line terminators
> >    1 Java source, ASCII text, with very long lines
> >   13 Java source, UTF-8 Unicode text
> >    1 JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, progressive, precision 8, 400x400, components 3
> >   29 JSON data
> >    2 Little-endian UTF-16 Unicode text, with CRLF line terminators
> >    2 Little-endian UTF-16 Unicode text, with no line terminators
> >   10 Microsoft Excel 2007+
> >    1 Microsoft OOXML
> >    1 MS Windows icon resource - 1 icon, 32x32, 24 bits/pixel
> >    2 MS Windows icon resource - 1 icon, 32x32, 32 bits/pixel
> >    3 Non-ISO extended-ASCII text, with no line terminators
> >    5 OpenDocument Spreadsheet
> >    1 PNG image data, 1244 x 686, 8-bit/color RGB, non-interlaced
> >    1 PNG image data, 1460 x 816, 8-bit/color RGB, non-interlaced
> >    2 PNG image data, 15 x 15, 8-bit/color RGBA, non-interlaced
> >    1 PNG image data, 1680 x 1050, 8-bit/color RGB, non-interlaced
> >    3 PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced
> >    2 PNG image data, 22 x 22, 8-bit/color RGB, non-interlaced
> >    1 PNG image data, 403 x 138, 8-bit/color RGB, non-interlaced
> >    4 PNG image data, 4702 x 1702, 8-bit/color RGB, non-interlaced
> >    4 PNG image data, 5010 x 1990, 8-bit/color RGB, non-interlaced
> >    1 PNG image data, 551 x 626, 8-bit/color RGB, non-interlaced
> >    1 PNG image data, 642 x 368, 8-bit/color RGBA, non-interlaced
> >    1 PNG image data, 972 x 464, 8-bit/color RGB, non-interlaced
> >    3 ReStructuredText file, ASCII text
> >    1 ReStructuredText file, ASCII text, with very long lines
> >    2 SAS
> >  654 SVG Scalable Vector Graphics image
> >    1 TIFF image data, big-endian, direntries=16, height=16, bps=0, compression=none, PhotometricIntepretation=RGB, orientation=upper-left, width=16
> >    1 TrueType Font data, 11 tables, 1st "OS/2", 14 names, Macintosh, type 1 string, icomoon
> >    1 TrueType Font data, 18 tables, 1st "FFTM", 26 names, Macintosh
> >    1 TrueType Font data, 18 tables, 1st "FFTM", 30 names, Macintosh
> >    2 Unicode text, UTF-32, big-endian
> >    2 Unicode text, UTF-32, little-endian
> >  385 UTF-8 Unicode text
> >    2 UTF-8 Unicode text, with no line terminators
> >   40 UTF-8 Unicode text, with very long lines
> >    2 UTF-8 Unicode (with BOM) text, with no line terminators
> >    1 Visual FoxPro DBF, 2 records * 205, update-date 15-10-20, at offset 129 1st record "value11 "
> >    1 Web Open Font Format, TrueType, length 1168, version 1.0
> >    1 Web Open Font Format, TrueType, length 67528, version 1.10
> >    1 Web Open Font Format, TrueType, length 69392, version 1.10
> >  958 XML 1.0 document, ASCII text
> >    1 XML 1.0 document, ASCII text, with CRLF, LF line terminators
> >   82 XML 1.0 document, ASCII text, with very long lines
> >    1 XML 1.0 document, ASCII text, with very long lines, with no line terminators
> >    1 XML 1.0 document, UTF-8 Unicode text
> >    2 XML 1.0 document, UTF-8 Unicode text, with very long lines
> >    1 XML 1.0 document, UTF-8 Unicode (with BOM) text
> >    2 Zip data (MIME type "application/vnd.pentaho.reporting.classic"?)
> >
> > Julian
> >
>
> --
> Neo4j Chief Solutions Architect
> *✉ *[email protected]
> ☎ +32486972937
>
