Ok, I ran the script and created 211 bugs (in the ~10270 - 10500 range). As far as I can tell, all the bugs were created successfully....the script now picks up 0 tests without bugs. Ojan, I sent you the review of test_expectations: http://codereview.chromium.org/73026/show Two things I noticed: 1. The script didn't recognize layout tests in chrome/ and not in LayoutTests/ -- it's only a handful, so updating those by hand shouldn't take very long. 2. The script re-arranged some of the flags, as expected. If it is important that we keep the prior ordering of the flags, let me know.
Also, I'd love to get this script reviewed and checked in, if anyone would be willing to do a Java review. Regards, Glenn On Mon, Apr 13, 2009 at 10:19 AM, Ojan Vafai <[email protected]> wrote: > On Mon, Apr 13, 2009 at 9:35 AM, Pam Greene <[email protected]> wrote: > >> On Mon, Apr 13, 2009 at 9:09 AM, Glenn Wilson <[email protected]>wrote: >> >>> Yes, the intention is to have something maintainable that gets run as >>> part of the regular testing/merging/etc. process -- which means eventually >>> rewriting in python. I wrote this in Java because we had a java lib for >>> automatically creating bugs in the issue tracker, and I wanted to get it up >>> and running quickly. We should have a python API for doing the same thing. >>> >>> I can easily change the script so it marks DEFERs as P3s. However, >>> currently, the script doesn't alter any entries in the file that already >>> have bugs -- meaning something like "BUG1234 DEFER : ..." or "WONTFIX DEFER >>> : ..." won't change. Is this the correct behavior? >>> >> >> The end goal is to take DEFER out and let the bug priorities run the show. >> But if we're not entirely confident that this will work (both the script and >> the human tracking process afterward), it's easy enough to strip the "DEFER" >> modifiers later. >> > > Makes sense to me. Lets leave in the defer modifiers, but mark any *new* > bugs for deferred tests as P3s. The current behavior of leaving existing > bugs unaltered seems correct to me. > > BTW, doing it in Java for expediency sake is totally reasonable. > > Ojan > > >> >> - Pam >> >> >>> >>> I'm planning on running the script later today with the most recent >>> version of test_expectations.txt -- there will be 200+ new bugs. Let me >>> know if I should hold off on doing so. Ojan, I'll send you the review for >>> test_expecations.txt. >>> >>> Regards, >>> Glenn >>> >>> >>> >>> On Fri, Apr 10, 2009 at 4:36 PM, Ojan Vafai <[email protected]> wrote: >>> >>>> I hate to ask this, but any chance we could rewrite it in python? It >>>> would be a lot less code in python (one script file) and would match the >>>> rest of our codebase (which currently does all scripting in python). I'm >>>> mainly worried about usability and maintainability here. >>>> That all said, I think you should go ahead and run this script as a one >>>> off so we can get the test lists having bug IDs ASAP as that's kind of >>>> urgent to us being able to track the current state of layout tests. Then we >>>> can worry about checking in a python version later if we decide it's >>>> necessary. >>>> >>>> Pam, Darin, Jon, Mark: You might want to confirm that my request below >>>> makes sense with respect to future handling/triaging of layout tests bugs. >>>> >>>> Also for the initial one-off version, can you make any tests that are >>>> currently marked as DEFER be P3s? Then when you spit out the results to the >>>> test_expectations file, I don't think there's any need to include the DEFER >>>> anymore as we'll rely entirely on bug priorities to decide which tests need >>>> fixing for the next release (i.e. all P1 tests and maybe some P2 tests). >>>> That way there will be less initial triaging to do in order to make sure we >>>> keep the tree shippable. Eventually we'll have to go through all the P3s in >>>> the "Untriaged" state and up some of them to P2s, but there is no pressing >>>> need to do so right away. >>>> >>>> Ojan >>>> >>>> On Fri, Apr 10, 2009 at 3:42 PM, Glenn Wilson <[email protected]>wrote: >>>> >>>>> (Ack...resending) >>>>> Ok, I have the CL ready, if anyone with Java readability would be >>>>> willing to do a review, please let me know. >>>>> more inline... >>>>> >>>>> On Fri, Apr 10, 2009 at 2:07 PM, Pam Greene <[email protected]> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Thu, Apr 9, 2009 at 6:49 PM, Ojan Vafai <[email protected]> wrote: >>>>>> >>>>>>> At a quick glance, this looks great. I didn't look over every bug, >>>>>>> but the ones I did look at look good. >>>>>> >>>>>> >>>>>> Yep. You'll want some sort of default description for the ones that >>>>>> have none. >>>>>> >>>>> >>>>> I ran the script on a small test file...an example bug is here: >>>>> http://code.google.com/p/chromium/issues/detail?id=9991 >>>>> >>>>> There's at least a small bit of indication of where it came from. I >>>>> also added the line numbers from test_expectations.txt that generated the >>>>> bug. Additionally, I also wrote the script to then add the created bug >>>>> numbers to the file. This ends up re-arranging some of the flags (DEBUG >>>>> DEFER ... etc. --> DEFER DEBUG ...), but all data is retained. >>>>> >>>>> >>>>>> 200+ bugs is certainly too many, but that's no reason not to file >>>>>> them. (Sorry, couldn't resist. Seriously, yes, definitely file them. >>>>>> Better >>>>>> in the issue tracker than getting lost.) >>>>>> >>>>> >>>>> There are probably some improvements to be had to better group some of >>>>> the bugs, but it's probably not an order of magnitude improvement. I >>>>> just >>>>> didn't want to create tons of bugs that were not useful. >>>>> >>>>> >>>>>> >>>>>> >>>>>>> It would be great to check in a version of this script that we could >>>>>>> run when a number of tests fail (e.g. when doing a bad webkit merge). >>>>>>> That >>>>>>> way, we can add them all to the local test_expectations.txt file and >>>>>>> have it >>>>>>> spit out the new results. >>>>>>> >>>>>> >>>>>> Fixing the file we have and moving forward are slightly different use >>>>>> cases, but yes. In the long run, we shouldn't need any script to fix an >>>>>> existing bug-less expectations file, only the part that adds bugs for >>>>>> newly >>>>>> added tests. I'm not sure whether that part should be controlled by >>>>>> adding >>>>>> the tests to the file and having the script file bugs, or making it a >>>>>> fully >>>>>> interactive "app": give it a list of files and a description, and it both >>>>>> changes the file and creates a bug. >>>>>> >>>>> >>>>> There may be a short-term solution and a long-term solution. The short >>>>> term solution seems to be for someone (assumedly the merger) to add the >>>>> failing tests to the file, run the script, then commit the file. In the >>>>> long term, this could be one step, or perhaps be driven right from a "roll >>>>> DEPS to new version of WebKit" script...right? >>>>> >>>>> >>>>>> >>>>>>> Really, it would be great if the script filed bugs and then just >>>>>>> modified test_expectations.txt directly (without committing it). Also, >>>>>>> the >>>>>>> script should remove any comments it moves into bug descriptions. We >>>>>>> should >>>>>>> get to a point where all the comments about layout tests are in the bug >>>>>>> descriptions themselves instead of in this file. >>>>>>> >>>>>> >>>>>> I disagree with this last part. I'd prefer a brief description to >>>>>> remain in the file, with any details in the bug. Certainly that's needed >>>>>> for >>>>>> WONTFIX bugs, where we may not have a bug since there's no work to be >>>>>> tracked, but it's helpful for others too. I've found it frustrating and >>>>>> time-consuming to track down when I see big blocks of failures with no >>>>>> explanations at all. Think of it like the svn checkin comment: enough to >>>>>> have an idea what's going on right there where you need it, with more >>>>>> detail >>>>>> in the bug for when you're really digging. >>>>>> >>>>> >>>>> Yep, I modified the script so that it will change test_expectations.txt >>>>> to have the bug number (you'd just need to commit it afterwards). It's >>>>> pretty easy to for the script to clip out the comments, but I'm reluctant >>>>> to, because the script may be over/under zealous in what comments it >>>>> associates with a layout test. Maybe I can add that behavior as another >>>>> command-line flag. >>>>> >>>>> >>>>>> >>>>>> - Pam >>>>>> >>>>>> >>>>>>> I think it would be good to get the script checked in first and then >>>>>>> run it on the existing test_expectations.txt file. >>>>>>> >>>>>>> Ojan >>>>>>> >>>>>>> >>>>>>> On Thu, Apr 9, 2009 at 6:42 PM, Glenn Wilson <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Pam & Ojan, >>>>>>>> I wrote a script that would extract all of the layout tests from >>>>>>>> test_expectations.txt that we haven't marked as WONTFIX and don't have >>>>>>>> a bug >>>>>>>> number. I also tried a simple heuristic to get the context of the >>>>>>>> layout >>>>>>>> test via nearby comments....it's not perfect, and I'll have to change >>>>>>>> some >>>>>>>> of them by hand, but many of the merge comments are getting picked up. >>>>>>>> >>>>>>>> I've also hooked up our library for creating demetrius bugs, so >>>>>>>> getting bugs made for these should be a matter of running the script >>>>>>>> with >>>>>>>> different arguments (I hope). >>>>>>>> >>>>>>>> What are your thoughts? Is these as descriptive/accurate as we >>>>>>>> need? Is 200+ bugs too many? >>>>>>>> >>>>>>>> Thanks! >>>>>>>> Glenn >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> >> >> >> > > > > --~--~---------~--~----~------------~-------~--~----~ Chromium Developers mailing list: [email protected] View archives, change email options, or unsubscribe: http://groups.google.com/group/chromium-dev -~----------~----~----~----~------~----~------~--~---
